Activity - llama.cpp - Sleepy Git

sleepy/llama.cpp

2 Active Pull Requests

11 Active Issues

1 Merged Pull Request
1 Proposed Pull Request
3 Closed Issues
11 New Issues

Excluding merges, 447 authors have pushed 2089 commits to master and 2091 commits to all branches. On master, 2055 files have changed and there have been 714212 additions and 322031 deletions.

1 Pull request merged by 1 user

Merged #38 [metal] extend bin op fusion to MUL/SUB/DIV chains (#28) 2026-04-30 21:03:15 +02:00

1 Pull request proposed by 1 user

Proposed #39 [metal] wire contiguous Q4_0 kernel into dispatch (#29) 2026-04-30 22:39:54 +02:00

3 Issues closed by 1 user

Closed #33 IQ4_XS tg4096 anomaly (45 vs 76 tok/s on 4B) 2026-04-30 20:17:37 +02:00

Closed #27 Eliminate zero-ops (VIEW/RESHAPE/TRANSPOSE/PERMUTE) 2026-04-30 20:17:28 +02:00

Closed #28 Reduce GPU dispatch count (1151 per tick) 2026-04-30 20:17:15 +02:00

11 Issues created by 1 user

Opened #27 Eliminate zero-ops (VIEW/RESHAPE/TRANSPOSE/PERMUTE) 2026-04-30 18:11:33 +02:00

Opened #28 Reduce GPU dispatch count (1151 per tick) 2026-04-30 18:11:34 +02:00

Opened #29 Port contiguous weight reads to Q4_0 MUL_MAT kernel 2026-04-30 18:11:34 +02:00

Opened #30 Investigate GET_ROWS overhead (678 MB/tick at 9B) 2026-04-30 18:11:35 +02:00

Opened #31 Investigate CPY overhead (159 MB/tick at 9B) 2026-04-30 18:11:35 +02:00

Opened #32 KV cache IO scaling with context length 2026-04-30 18:11:35 +02:00

Opened #33 IQ4_XS tg4096 anomaly (45 vs 76 tok/s on 4B) 2026-04-30 18:11:36 +02:00

Opened #34 Profile graph fusion effectiveness 2026-04-30 18:11:36 +02:00

Opened #35 Profile concurrent encoding effectiveness 2026-04-30 18:11:37 +02:00

Opened #36 Compare llama.cpp and MLX dispatch structure 2026-04-30 18:11:37 +02:00

Opened #37 Implement MXFP4 GGUF converter 2026-04-30 18:11:37 +02:00