feat: add model comparisons and sanitize session files

- Rename gamma to glm5 and model to minimax-m2.7
- Add model_comparison/ directory with head-to-head analyses
- Sanitize all session.jsonl files: remove absolute paths and usernames
- Remove __pycache__ artifacts
- Add .gitignore
2026-04-23 11:16:01 +02:00
commit 8e72eef09c
62 changed files with 18469 additions and 0 deletions
@@ -0,0 +1,21 @@
+Design and implement a high-performance fused softmax + top-k kernel in CUDA (or CUDA-like pseudocode).
+
+Requirements:
+- Input: logits [B, T, V]
+- Output:
+    - top-k indices per (B, T)
+    - top-k probabilities (after softmax)
+
+Constraints:
+1. Do NOT materialize the full softmax matrix in global memory.
+2. Must be numerically stable (log-sum-exp).
+3. Minimize global memory reads/writes.
+4. Use shared memory where appropriate.
+5. Handle large V (e.g., 50k+) efficiently.
+
+Deliver:
+- Kernel pseudocode or CUDA code
+- Memory access pattern explanation
+- Warp-level optimization strategy
+- Complexity analysis (bandwidth vs compute bound)
+- Comparison to naive implementation
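
For orientation, constraints 1 and 2 of the prompt above can be met together by a single streaming pass that keeps a running maximum, a running rescaled sum, and a small top-k buffer. The following is a minimal CPU reference sketch in Python, not the CUDA kernel the prompt asks for and not taken from any of the sessions in this commit. The function name `fused_softmax_topk` and the `tile` parameter are hypothetical, and the sketch assumes one row of finite logits.

```python
import heapq
import math


def fused_softmax_topk(logits, k, tile=4):
    """Top-k softmax over one row without storing the full softmax.

    Hypothetical reference helper; assumes all logits are finite.
    """
    m = -math.inf  # running maximum of the logits seen so far
    s = 0.0        # running sum of exp(x - m)
    heap = []      # min-heap of (logit, index), at most k entries

    for start in range(0, len(logits), tile):
        chunk = logits[start:start + tile]
        new_m = max(m, max(chunk))
        # Rescale the old sum to the new maximum, then add this tile.
        s = s * math.exp(m - new_m) + sum(math.exp(x - new_m) for x in chunk)
        m = new_m
        for offset, x in enumerate(chunk):
            item = (x, start + offset)
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)

    # Only the k survivors are ever normalized into probabilities.
    top = sorted(heap, reverse=True)
    indices = [i for _, i in top]
    probs = [math.exp(x - m) / s for x, _ in top]
    return indices, probs


indices, probs = fused_softmax_topk([1.0, 3.0, 2.0, 0.5, 4.0, -1.0], k=2)
print(indices, probs)
```

Each logit is read once, and only `k` probabilities are written per row. In a GPU version, one plausible mapping is for each thread or warp to hold its own `(m, s)` pair and candidate list, and to merge partial results with the same rescaling rule used in the loop.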