feat: add model comparisons and sanitize session files

- Rename gamma to glm5 and model to minimax-m2.7
- Add model_comparison/ directory with head-to-head analyses
- Sanitize all session.jsonl files: remove absolute paths and usernames
- Remove __pycache__ artifacts
- Add .gitignore
2026-04-23 11:16:01 +02:00
commit 8e72eef09c
62 changed files with 18469 additions and 0 deletions
@@ -0,0 +1,19 @@
+Implement an efficient KV-cache system for autoregressive transformer inference from scratch.
+
+Requirements:
+1. Support incremental decoding (one token at a time).
+2. Avoid recomputing attention for past tokens.
+3. Handle:
+   - multi-head attention
+   - batching with variable sequence lengths
+4. Provide:
+   - data structure layout (memory format)
+   - update logic per step
+   - attention computation using cached keys/values
+
+Additionally:
+- Analyze memory growth over long sequences.
+- Propose at least two optimizations (e.g., paged attention, chunking, compression).
+- Explain how this would map to GPU execution.
+
+Do not use any frameworks.
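
For orientation, the core of what the prompt above asks for can be sketched in framework-free Python. This is an illustrative sketch only, not the solution stored in this repository. The names `KVCache`, `append`, `attend`, and `kv_cache_bytes` are hypothetical, and nested lists stand in for what would be one contiguous preallocated buffer in a real implementation. It covers a single layer with layout [batch][head][position][head_dim], a per-sequence length counter for variable-length batches, and a memory estimate. The optimizations and the GPU mapping are not covered.

```python
import math


class KVCache:
    """Single-layer cache laid out as [batch][head][position][head_dim] (hypothetical sketch)."""

    def __init__(self, batch, num_heads, head_dim, max_len):
        self.num_heads = num_heads
        self.head_dim = head_dim
        self.max_len = max_len

        def alloc():
            # Preallocate every slot so a decode step never resizes the cache.
            return [[[[0.0] * head_dim for _ in range(max_len)]
                     for _ in range(num_heads)]
                    for _ in range(batch)]

        self.k = alloc()
        self.v = alloc()
        # Number of valid positions per sequence; this handles variable lengths.
        self.lengths = [0] * batch

    def append(self, b, k_heads, v_heads):
        """Store the new token's per-head key/value vectors for sequence b."""
        t = self.lengths[b]
        if t >= self.max_len:
            raise IndexError("cache is full for this sequence")
        for h in range(self.num_heads):
            self.k[b][h][t] = list(k_heads[h])
            self.v[b][h][t] = list(v_heads[h])
        self.lengths[b] = t + 1

    def attend(self, b, q_heads):
        """Attention for one query token over the cached positions of sequence b."""
        n = self.lengths[b]
        if n == 0:
            raise ValueError("no cached tokens for this sequence")
        scale = 1.0 / math.sqrt(self.head_dim)
        out = []
        for h in range(self.num_heads):
            q = q_heads[h]
            keys = self.k[b][h]
            vals = self.v[b][h]
            # Scaled dot products against cached keys only; nothing is recomputed.
            scores = [scale * sum(qi * ki for qi, ki in zip(q, keys[t]))
                      for t in range(n)]
            # Numerically stable softmax: subtract the maximum score first.
            m = max(scores)
            weights = [math.exp(s - m) for s in scores]
            total = sum(weights)
            out.append([sum(weights[t] * vals[t][d] for t in range(n)) / total
                        for d in range(self.head_dim)])
        return out


def kv_cache_bytes(batch, layers, num_heads, head_dim, seq_len, bytes_per_elem=2):
    """Cache size in bytes: keys plus values, growing linearly with seq_len."""
    return 2 * batch * layers * num_heads * head_dim * seq_len * bytes_per_elem


cache = KVCache(batch=2, num_heads=1, head_dim=2, max_len=4)
cache.append(0, [[1.0, 0.0]], [[1.0, 2.0]])
first = cache.attend(0, [[1.0, 0.0]])
cache.append(0, [[1.0, 0.0]], [[3.0, 4.0]])
second = cache.attend(0, [[1.0, 0.0]])
```

Each decode step costs one `append` plus one `attend` over the cached positions, so per-step work is linear in the current length rather than quadratic. With the assumed example figures of 32 layers, 32 heads, head dimension 128, 4096 tokens, and 2-byte elements, `kv_cache_bytes` gives 2 GiB per sequence.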