
Implement an efficient KV-cache system for autoregressive transformer inference from scratch.

Requirements:

  1. Support incremental decoding (one token at a time).
  2. Avoid recomputing attention for past tokens.
  3. Handle:
    • multi-head attention
    • batching with variable sequence lengths
  4. Provide:
    • data structure layout (memory format)
    • update logic per step
    • attention computation using cached keys/values

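To make the expected shape of an answer concrete, here is a minimal single-head, single-sequence sketch of the three required pieces (layout, per-step update, attention over the cache). It is an illustrative assumption, not the required solution: the class and method names (`KVCache`, `step`) are invented, and it uses numpy only for the matrix math.

```python
import numpy as np

class KVCache:
    """Hypothetical single-head KV cache: one preallocated buffer per tensor."""

    def __init__(self, max_len, head_dim):
        # Layout: [max_len, head_dim] buffers, so each step is an in-place
        # row write rather than a reallocation.
        self.k = np.zeros((max_len, head_dim), dtype=np.float32)
        self.v = np.zeros((max_len, head_dim), dtype=np.float32)
        self.len = 0  # number of valid cached positions

    def step(self, k_t, v_t, q_t):
        # Update logic: append this step's key/value at the next free row.
        self.k[self.len] = k_t
        self.v[self.len] = v_t
        self.len += 1
        # Attention: the new query attends over all cached keys/values;
        # past tokens are never re-encoded.
        scores = self.k[:self.len] @ q_t / np.sqrt(q_t.shape[-1])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ self.v[:self.len]
```

A full answer would extend this with a heads axis and per-sequence lengths (or padding masks) for variable-length batching.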
Additionally:

  • Analyze memory growth over long sequences.
  • Propose at least two optimizations (e.g., paged attention, chunking, compression).
  • Explain how this would map to GPU execution.
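For the memory-growth part, the cache grows linearly in sequence length: two tensors (K and V) per layer, each of shape [batch, heads, seq_len, head_dim]. A back-of-envelope calculation with illustrative numbers (a hypothetical 32-layer, 32-head, head-dim-128 model in fp16; these dimensions are assumptions, not from the text) shows why this matters:

```python
def kv_cache_bytes(n_layers, n_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # K and V each store [batch, n_heads, seq_len, head_dim] per layer.
    return 2 * n_layers * batch * n_heads * seq_len * head_dim * bytes_per_elem

per_token = kv_cache_bytes(32, 32, 128, 1, 8)     # growth per decoded token
at_4k = kv_cache_bytes(32, 32, 128, 4096, 8)      # total at 4096 tokens
print(per_token / 2**20, "MiB/token;", at_4k / 2**30, "GiB at 4k")
```

Under these assumptions the cache grows by 4 MiB per token across the batch and reaches 16 GiB at 4096 tokens, which is the motivation for the optimizations requested above.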

Do not use any frameworks.
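As one example of the kind of optimization an answer might propose, here is a framework-free sketch of a paged layout: the cache is split into fixed-size blocks and each sequence keeps a block table mapping logical token positions to physical blocks, so memory is allocated page-by-page instead of as one contiguous max-length buffer. All names (`PagedAllocator`, `SequenceCache`, `BLOCK_SIZE`) are hypothetical.

```python
BLOCK_SIZE = 16  # tokens per physical block (illustrative)

class PagedAllocator:
    """Pool of physical block ids shared across sequences."""
    def __init__(self, n_blocks):
        self.free = list(range(n_blocks))

    def alloc(self):
        return self.free.pop()

class SequenceCache:
    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []  # logical block index -> physical block id
        self.length = 0

    def append_token(self):
        # Grab a new physical block only when the current one fills up,
        # so a short sequence never pays for a max-length allocation.
        if self.length % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.alloc())
        self.length += 1

    def physical_slot(self, pos):
        # Translate a logical token position to (physical block, offset);
        # the attention kernel would gather K/V rows through this mapping.
        return self.block_table[pos // BLOCK_SIZE], pos % BLOCK_SIZE
```

The same indirection is what makes block-level sharing (e.g. for a common prompt prefix) possible, since multiple block tables can point at one physical block.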