Implement an efficient KV-cache system for autoregressive transformer inference from scratch.
Requirements:
- Support incremental decoding (one token at a time).
- Avoid recomputing attention for past tokens.
- Handle:
  - multi-head attention
  - batching with variable sequence lengths
- Provide (a minimal sketch follows this list):
  - data structure layout (memory format)
  - update logic per step
  - attention computation using cached keys/values
Additionally:
- Analyze memory growth over long sequences (see the worked example after this list).
- Propose at least two optimizations (e.g., paged attention, chunking, compression); a paged-attention sketch appears at the end.
- Explain how this would map to GPU execution.
Do not use any frameworks.