feat: add model comparisons and sanitize session files
- Rename gamma to glm5 and model to minimax-m2.7
- Add model_comparison/ directory with head-to-head analyses
- Sanitize all session.jsonl files: remove absolute paths and usernames
- Remove __pycache__ artifacts
- Add .gitignore
@@ -0,0 +1,19 @@
Implement an efficient KV-cache system for autoregressive transformer inference from scratch.

Requirements:
1. Support incremental decoding (one token at a time).
2. Avoid recomputing attention for past tokens.
3. Handle:
- multi-head attention
- batching with variable sequence lengths
4. Provide:
- data structure layout (memory format)
- update logic per step
- attention computation using cached keys/values

Additionally:
- Analyze memory growth over long sequences.
- Propose at least two optimizations (e.g., paged attention, chunking, compression).
- Explain how this would map to GPU execution.

Do not use any frameworks.