llm_programming_tests/minimax-m2.7/fuse/PROMPT.md
sleepy 8e72eef09c feat: add model comparisons and sanitize session files
- Rename gamma to glm5 and model to minimax-m2.7
- Add model_comparison/ directory with head-to-head analyses
- Sanitize all session.jsonl files: remove absolute paths and usernames
- Remove __pycache__ artifacts
- Add .gitignore
2026-04-23 11:16:01 +02:00

Design and implement a high-performance fused softmax + top-k kernel in CUDA (or CUDA-like pseudocode).

Requirements:

  • Input: logits [B, T, V]
  • Output:
    • top-k indices per (B, T) position, shape [B, T, K]
    • top-k probabilities (after softmax), shape [B, T, K]

Constraints:

  1. Do NOT materialize the full softmax matrix in global memory.
  2. Must be numerically stable (subtract the row maximum before exponentiating, as in log-sum-exp).
  3. Minimize global memory reads/writes.
  4. Use shared memory where appropriate.
  5. Handle large V (e.g., 50k+) efficiently.
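One way to read constraints 1 and 2 together is a single streaming pass per (B, T) row that keeps a running maximum, a rescaled running sum of exponentials, and a small top-k buffer, so the full softmax row is never stored. The following is a minimal host-side C++ sketch of that idea, meant as a reference to check a CUDA kernel against, not as the kernel itself. The names `TopK` and `fused_softmax_topk_row` are hypothetical, and the sketch assumes finite logits and a small K held in a sorted array.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical result type: top-k entries of one row, sorted by descending probability.
struct TopK {
    std::vector<int> indices;
    std::vector<float> probs;
};

// Single pass over one row of logits (one (b, t) position) of length V.
// Assumes all logits are finite and 0 < K <= V.
TopK fused_softmax_topk_row(const float* logits, int V, int K) {
    assert(K > 0 && K <= V);
    const float neg_inf = -std::numeric_limits<float>::infinity();

    float running_max = neg_inf;
    float running_sum = 0.0f;  // sum of exp(x - running_max) over elements seen so far
    std::vector<float> top_val(K, neg_inf);
    std::vector<int> top_idx(K, -1);

    for (int i = 0; i < V; ++i) {
        const float x = logits[i];

        // Online softmax update: rescale the old sum whenever the max increases.
        if (x > running_max) {
            running_sum = running_sum * std::exp(running_max - x) + 1.0f;
            running_max = x;
        } else {
            running_sum += std::exp(x - running_max);
        }

        // Top-k by raw logit (softmax is monotonic, so the ordering is the same).
        // Insertion into a small sorted array; ties keep the earlier index.
        if (x > top_val[K - 1]) {
            int j = K - 1;
            while (j > 0 && top_val[j - 1] < x) {
                top_val[j] = top_val[j - 1];
                top_idx[j] = top_idx[j - 1];
                --j;
            }
            top_val[j] = x;
            top_idx[j] = i;
        }
    }

    // Only the K winners are ever converted to probabilities.
    TopK out;
    out.indices = top_idx;
    out.probs.resize(K);
    for (int j = 0; j < K; ++j) {
        out.probs[j] = std::exp(top_val[j] - running_max) / running_sum;
    }
    return out;
}
```

In a CUDA version, each thread would typically keep its own (max, sum, top-k) state over a strided slice of the row, and those partial states would then be merged across the warp and the block. The merge rule for two (max, sum) pairs is the same rescaling used in the loop above.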

Deliver:

  • Kernel pseudocode or CUDA code
  • Memory access pattern explanation
  • Warp-level optimization strategy
  • Complexity analysis (bandwidth vs compute bound)
  • Comparison to naive implementation
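For the complexity analysis and the naive comparison, a rough global-memory traffic count is a useful starting point. The C++ sketch below works under stated assumptions, not measurements: the naive pipeline is modeled as a softmax kernel that reads the logits once and writes the full probability tensor, followed by a top-k kernel that reads it back. The fused kernel is modeled as one read of the logits. Both write K (index, probability) pairs per row, with 4-byte floats and 4-byte indices. The function names are hypothetical.

```cpp
#include <cstdint>

// Bytes moved through global memory by the modeled naive two-kernel pipeline.
// Assumption: softmax = 1 read + 1 write of [B, T, V]; top-k = 1 more read.
std::uint64_t naive_bytes(std::uint64_t B, std::uint64_t T,
                          std::uint64_t V, std::uint64_t K) {
    const std::uint64_t rows = B * T;
    const std::uint64_t tensor = rows * V * 4;  // one full pass over float32 data
    const std::uint64_t outputs = rows * K * 8; // 4-byte index + 4-byte probability
    return 3 * tensor + outputs;
}

// Bytes moved by the modeled fused kernel: one read of the logits plus the outputs.
std::uint64_t fused_bytes(std::uint64_t B, std::uint64_t T,
                          std::uint64_t V, std::uint64_t K) {
    const std::uint64_t rows = B * T;
    return rows * V * 4 + rows * K * 8;
}
```

Under this model, a single row with V = 50,000 and K = 10 moves 600,080 bytes in the naive pipeline and 200,080 bytes in the fused one, roughly a 3x reduction. A naive softmax that makes separate passes for the max, the sum, and the normalization would widen the gap further.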