feat: add model comparisons and sanitize session files

- Rename gamma to glm5 and model to minimax-m2.7
- Add model_comparison/ directory with head-to-head analyses
- Sanitize all session.jsonl files: remove absolute paths and usernames
- Remove __pycache__ artifacts
- Add .gitignore
This commit is contained in:
2026-04-23 11:16:01 +02:00
commit 8e72eef09c
62 changed files with 18469 additions and 0 deletions
+21
View File
@@ -0,0 +1,21 @@
Design and implement a high-performance fused softmax + top-k kernel in CUDA (or CUDA-like pseudocode).
Requirements:
- Input: logits [B, T, V]
- Output:
- top-k indices per (B, T)
- top-k probabilities (after softmax)
Constraints:
1. Do NOT materialize the full softmax matrix in global memory.
2. Must be numerically stable (log-sum-exp).
3. Minimize global memory reads/writes.
4. Use shared memory where appropriate.
5. Handle large V (e.g., 50k+) efficiently.
Deliver:
- Kernel pseudocode or CUDA code
- Memory access pattern explanation
- Warp-level optimization strategy
- Complexity analysis (bandwidth vs compute bound)
- Comparison to naive implementation