Activity - llama.cpp - Sleepy Git

sleepy/llama.cpp

2 Active Pull Requests

11 Active Issues

1 Merged Pull Request
1 Proposed Pull Request
3 Closed Issues
11 New Issues

Excluding merges, 2 authors have pushed 2 commits to master and 4 commits to all branches. On master, 7 files have changed and there have been 811 additions and 19 deletions.

1 Pull request merged by 1 user

Merged #38 [metal] extend bin op fusion to MUL/SUB/DIV chains (#28) 2026-04-30 21:03:15 +02:00

1 Pull request proposed by 1 user

Proposed #39 [metal] wire contiguous Q4_0 kernel into dispatch (#29) 2026-04-30 22:39:54 +02:00

3 Issues closed by 1 user

Closed #33 IQ4_XS tg4096 anomaly (45 vs 76 tok/s on 4B) 2026-04-30 20:17:37 +02:00

Closed #27 Eliminate zero-ops (VIEW/RESHAPE/TRANSPOSE/PERMUTE) 2026-04-30 20:17:28 +02:00

Closed #28 Reduce GPU dispatch count (1151 per tick) 2026-04-30 20:17:15 +02:00

11 Issues created by 1 user

Opened #27 Eliminate zero-ops (VIEW/RESHAPE/TRANSPOSE/PERMUTE) 2026-04-30 18:11:33 +02:00

Opened #28 Reduce GPU dispatch count (1151 per tick) 2026-04-30 18:11:34 +02:00

Opened #29 Port contiguous weight reads to Q4_0 MUL_MAT kernel 2026-04-30 18:11:34 +02:00

Opened #30 Investigate GET_ROWS overhead (678 MB/tick at 9B) 2026-04-30 18:11:35 +02:00

Opened #31 Investigate CPY overhead (159 MB/tick at 9B) 2026-04-30 18:11:35 +02:00

Opened #32 KV cache IO scaling with context length 2026-04-30 18:11:35 +02:00

Opened #33 IQ4_XS tg4096 anomaly (45 vs 76 tok/s on 4B) 2026-04-30 18:11:36 +02:00

Opened #34 Profile graph fusion effectiveness 2026-04-30 18:11:36 +02:00

Opened #35 Profile concurrent encoding effectiveness 2026-04-30 18:11:37 +02:00

Opened #36 Compare llama.cpp and MLX dispatch structure 2026-04-30 18:11:37 +02:00

Opened #37 Implement MXFP4 GGUF converter 2026-04-30 18:11:37 +02:00