feat: add model comparisons and sanitize session files
- Rename gamma to glm5 and model to minimax-m2.7
- Add model_comparison/ directory with head-to-head analyses
- Sanitize all session.jsonl files: remove absolute paths and usernames
- Remove __pycache__ artifacts
- Add .gitignore
@@ -0,0 +1,21 @@
Design and implement a high-performance fused softmax + top-k kernel in CUDA (or CUDA-like pseudocode).

Requirements:
- Input: logits [B, T, V]
- Output:
  - top-k indices per (B, T)
  - top-k probabilities (after softmax)

Constraints:
1. Do NOT materialize the full softmax matrix in global memory.
2. Must be numerically stable (log-sum-exp).
3. Minimize global memory reads/writes.
4. Use shared memory where appropriate.
5. Handle large V (e.g., 50k+) efficiently.

Deliver:
- Kernel pseudocode or CUDA code
- Memory access pattern explanation
- Warp-level optimization strategy
- Complexity analysis (bandwidth vs compute bound)
- Comparison to naive implementation
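
For checking a kernel's output against the constraints above, a CPU reference can be handy. The following is a minimal sketch, not part of the original prompt: it handles one row of V logits in a single pass, keeping a running maximum and a rescaled running sum (the usual "online softmax" trick) together with a size-k min-heap, so the full probability vector is never stored. The function name `fused_softmax_topk` and its signature are hypothetical, and finite logits are assumed.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def fused_softmax_topk(logits: Sequence[float], k: int) -> Tuple[List[int], List[float]]:
    """Single pass over one row of finite logits; returns (indices, probabilities)."""
    if not 0 < k <= len(logits):
        raise ValueError("k must satisfy 0 < k <= V")

    running_max = -math.inf  # largest logit seen so far
    running_sum = 0.0        # sum of exp(x - running_max) over logits seen so far
    heap: List[Tuple[float, int]] = []  # min-heap of (logit, index), at most k entries

    for i, x in enumerate(logits):
        if x > running_max:
            # New maximum: rescale the accumulated sum, then add exp(0) for x itself.
            running_sum = running_sum * math.exp(running_max - x) + 1.0
            running_max = x
        else:
            running_sum += math.exp(x - running_max)

        # Track the k largest logits; softmax is monotonic, so these are
        # also the k largest probabilities.
        if len(heap) < k:
            heapq.heappush(heap, (x, i))
        elif x > heap[0][0]:
            heapq.heapreplace(heap, (x, i))

    # Sort by descending logit, breaking ties by ascending index.
    top = sorted(heap, key=lambda t: (-t[0], t[1]))
    indices = [i for _, i in top]
    # Only the k survivors are normalized, using the final max and sum.
    probs = [math.exp(x - running_max) / running_sum for x, _ in top]
    return indices, probs
```

Applied to every (B, T) position, this reads each logit exactly once, which is the same access count a fused kernel would aim for. A GPU version would presumably split V across threads and merge per-thread (max, sum, top-k) partials, which this sequential sketch does not attempt to model.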