rust-llm
A Rust continuation of sleepy-llm — a ground-up inference engine for Apple Silicon. Target: beat MLX performance.
Core idea: Same architecture as the Zig project, rebuilt in Rust. Skip the Python/MLX overhead and the underperforming multi-platform engines. Write a focused inference engine with hand-tuned Metal Shading Language kernels, mmap model weights into unified memory, and dispatch directly to the GPU via MTLCommandBuffer.
Why Rust: The Zig project validated the kernel-fusion and memory architecture. Rust adds serde, metal-rs, and a mature ecosystem for safetensors parsing and tokenizer handling, without sacrificing the zero-copy, direct-to-Metal design. If Zig's comptime was the right tool for shape checking, Rust's const generics and type system are the equivalent here.
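To illustrate the const-generics point, here is a minimal sketch of how matrix dimensions can be carried in the type so that a shape mismatch is rejected at compile time. The `Matrix` type and its methods are hypothetical and not taken from `src/tensor/`; a real tensor type would presumably wrap a Metal buffer rather than a stack array.

```rust
/// A row-major matrix whose dimensions are part of its type.
/// Hypothetical illustration only; not this crate's tensor API.
#[derive(Debug, Clone, PartialEq)]
pub struct Matrix<const R: usize, const C: usize> {
    data: [[f32; C]; R],
}

impl<const R: usize, const C: usize> Matrix<R, C> {
    pub fn new(data: [[f32; C]; R]) -> Self {
        Self { data }
    }

    pub fn get(&self, r: usize, c: usize) -> f32 {
        self.data[r][c]
    }

    /// (R x C) * (C x K) -> (R x K).
    /// The right-hand side must have exactly C rows, so a mismatched
    /// inner dimension fails to type-check instead of failing at runtime.
    pub fn matmul<const K: usize>(&self, rhs: &Matrix<C, K>) -> Matrix<R, K> {
        let mut out = [[0.0f32; K]; R];
        for i in 0..R {
            for j in 0..K {
                let mut acc = 0.0f32;
                for k in 0..C {
                    acc += self.data[i][k] * rhs.data[k][j];
                }
                out[i][j] = acc;
            }
        }
        Matrix { data: out }
    }
}
```

With `b` of shape 2x1, a call such as `b.matmul(&b)` would not compile, because the inner dimensions (1 and 2) differ.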
Status: Early architecture. Ported from Zig design docs. Building toward Qwen3.5-4B support with Multi-Token Prediction (MTP).
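MTP heads are commonly used for self-speculative decoding: the MTP layer drafts upcoming tokens and the main model verifies them in one batched forward pass. The sketch below shows only the greedy acceptance rule, assuming the main model's argmax token is already available for every drafted position plus one extra. `accept_draft` is a hypothetical name, not part of this crate's `src/inference/` module, and sampling-based (non-greedy) verification is not covered.

```rust
/// Greedy verification step for MTP-style speculative decoding.
///
/// `draft`:  tokens proposed by the MTP head.
/// `target`: the main model's argmax token at each of the
///           `draft.len() + 1` positions, from one batched forward pass.
///
/// Returns the tokens to append: the longest prefix of `draft` that agrees
/// with `target`, followed by the main model's own token at the first
/// disagreement (or its extra "bonus" token if every draft was accepted).
pub fn accept_draft(draft: &[u32], target: &[u32]) -> Vec<u32> {
    assert_eq!(
        target.len(),
        draft.len() + 1,
        "need one target token per draft position plus one"
    );
    // Count how many leading draft tokens match the main model.
    let accepted = draft
        .iter()
        .zip(target)
        .take_while(|(d, t)| d == t)
        .count();
    let mut out = draft[..accepted].to_vec();
    // `accepted <= draft.len()`, so this index is always in bounds.
    out.push(target[accepted]);
    out
}
```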
Stack: Rust (latest stable), Metal 3, MSL, metal-rs. No Python. No Vulkan. No MLX dependency.
Model format: Safetensors (initially). We use MLX-optimized safetensors for fair baseline comparison against MLX. GGUF may be added later for broader compatibility.
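A safetensors file starts with an 8-byte little-endian `u64` giving the header length N, followed by N bytes of UTF-8 JSON describing each tensor's dtype, shape, and `data_offsets`, followed by the raw tensor bytes. The sketch below splits a byte slice along that layout; in a zero-copy loader the slice would be the mmapped file. `split_safetensors` is a hypothetical helper, and JSON parsing of the header (e.g. with serde) is left out.

```rust
/// Splits a safetensors blob into (JSON header, tensor-data region).
/// Hypothetical helper; the header is returned unparsed.
pub fn split_safetensors(bytes: &[u8]) -> Result<(&str, &[u8]), String> {
    if bytes.len() < 8 {
        return Err("file is shorter than the 8-byte length prefix".to_string());
    }
    // First 8 bytes: little-endian u64 header length.
    let mut prefix = [0u8; 8];
    prefix.copy_from_slice(&bytes[..8]);
    let header_len = u64::from_le_bytes(prefix);

    // Reject lengths that overflow usize or run past the end of the file.
    let header_end = usize::try_from(header_len)
        .ok()
        .and_then(|n| n.checked_add(8))
        .filter(|&end| end <= bytes.len())
        .ok_or_else(|| format!("header length {header_len} exceeds file size {}", bytes.len()))?;

    let header = std::str::from_utf8(&bytes[8..header_end])
        .map_err(|e| format!("header is not valid UTF-8: {e}"))?;
    // Everything after the header is the tensor byte buffer that
    // `data_offsets` in the header index into.
    Ok((header, &bytes[header_end..]))
}
```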
Test model: Qwen3.5-4B with verified MTP layers (15 MTP tensors confirmed, mtp_num_hidden_layers: 1 in config).
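A load-time sanity check along these lines can catch checkpoints whose MTP weights were dropped during conversion. This is a minimal sketch: `MtpConfig` and `check_mtp` are hypothetical, only `mtp_num_hidden_layers` comes from the model config, and the assumption that every MTP tensor name starts with `mtp.` is a naming convention to confirm against the actual checkpoint.

```rust
/// Minimal stand-in for the relevant part of the model's config.json.
/// Hypothetical struct; only the field name comes from the model config.
pub struct MtpConfig {
    pub mtp_num_hidden_layers: usize,
}

/// Counts MTP tensors and fails if the config declares MTP layers but the
/// checkpoint contains none.
pub fn check_mtp<'a, I>(config: &MtpConfig, tensor_names: I) -> Result<usize, String>
where
    I: IntoIterator<Item = &'a str>,
{
    // Assumed naming convention: every MTP tensor name starts with "mtp.".
    let found = tensor_names
        .into_iter()
        .filter(|name| name.starts_with("mtp."))
        .count();
    if config.mtp_num_hidden_layers > 0 && found == 0 {
        return Err(format!(
            "config declares {} MTP layer(s) but no `mtp.` tensors were found",
            config.mtp_num_hidden_layers
        ));
    }
    Ok(found)
}
```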
Build
cargo build --release
Test
cargo test
Lint
cargo clippy --all-targets --all-features && cargo fmt --check
Architecture
- src/metal/: Metal GPU backend (context, buffers, pipelines, kernels)
- src/tensor/: Generic tensor system with static shape checking
- src/safetensors/: Safetensors parser and zero-copy loader
- src/models/: Model implementations (qwen3_5 reference)
- src/inference/: Inference engine, sampling, scheduling, MTP
- src/platform/: Apple Silicon feature detection
- src/tests/: End-to-end tests
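As an illustration of the sampling step that `src/inference/` covers, here is a minimal sketch of temperature sampling over raw logits. The function name is hypothetical and not this crate's API; the uniform draw `u` is passed in so the function stays deterministic, whereas a real engine would take it from its RNG. Top-k and top-p filtering are omitted.

```rust
/// Picks a token id from raw logits.
/// `temperature <= 0.0` selects greedy argmax; otherwise logits are scaled by
/// 1/temperature, softmaxed, and sampled by inverse CDF with `u` in [0, 1).
pub fn sample(logits: &[f32], temperature: f32, u: f32) -> usize {
    assert!(!logits.is_empty(), "logits must be non-empty");
    if temperature <= 0.0 {
        // Greedy: index of the largest logit (the first one wins on ties).
        return logits
            .iter()
            .enumerate()
            .fold((0, f32::NEG_INFINITY), |best, (i, &x)| {
                if x > best.1 { (i, x) } else { best }
            })
            .0;
    }
    // Subtract the max before exponentiating for numerical stability.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let weights: Vec<f32> = logits
        .iter()
        .map(|&x| ((x - max) / temperature).exp())
        .collect();
    let total: f32 = weights.iter().sum();
    // Walk the unnormalized CDF until it passes u * total.
    let target = u.clamp(0.0, 1.0) * total;
    let mut cumulative = 0.0f32;
    for (i, &w) in weights.iter().enumerate() {
        cumulative += w;
        if target < cumulative {
            return i;
        }
    }
    // Rounding can leave `target` just past the last bucket.
    weights.len() - 1
}
```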