llm_programming_tests/minimax-m2.7/fuse/PROMPT.md at 8e72eef09cd391a59292f22fbe1d830d2a2d39f5

sleepy 8e72eef09c feat: add model comparisons and sanitize session files

- Rename gamma to glm5 and model to minimax-m2.7
- Add model_comparison/ directory with head-to-head analyses
- Sanitize all session.jsonl files: remove absolute paths and usernames
- Remove __pycache__ artifacts
- Add .gitignore

2026-04-23 11:16:01 +02:00

676 B

Design and implement a high-performance fused softmax + top-k kernel in CUDA (or CUDA-like pseudocode).

Requirements:

Input: logits [B, T, V]
Output:
- top-k indices per (B, T)
- top-k probabilities (after softmax)

Constraints:

- Do NOT materialize the full softmax matrix in global memory.
- Must be numerically stable (log-sum-exp).
- Minimize global memory reads/writes.
- Use shared memory where appropriate.
- Handle large V (e.g., 50k+) efficiently.
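
For orientation, the constraints above can be illustrated with a minimal CPU reference in C++. This is only a sketch under stated assumptions, not the requested kernel: it processes a single (B, T) row, keeps a running maximum and a rescaled running sum (the "online" form of the log-sum-exp trick), and tracks the k largest logits in a small min-heap, so the full probability vector is never stored. The function name `fused_softmax_topk_row` and its signature are hypothetical, and finite logits are assumed. A CUDA version would typically split the row across threads and merge the per-thread (max, sum, top-k) states, which this sketch does not attempt.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical reference helper (not part of the original prompt).
// Returns the top-k (index, probability) pairs of softmax(logits[0..V)),
// sorted by descending probability, using a single pass over the row.
// Assumes all logits are finite.
std::vector<std::pair<int, float>> fused_softmax_topk_row(const float* logits,
                                                          int V, int k) {
    k = std::min(k, V);
    float m = -std::numeric_limits<float>::infinity();  // running max
    float s = 0.0f;                                      // running sum of exp(x - m)

    // Min-heap ordered by logit value: heap.front() is the weakest candidate.
    auto cmp = [](const std::pair<float, int>& a, const std::pair<float, int>& b) {
        return a.first > b.first;
    };
    std::vector<std::pair<float, int>> heap;
    heap.reserve(k);

    for (int i = 0; i < V; ++i) {
        const float x = logits[i];
        if (x > m) {
            // New maximum: rescale the accumulated sum to the new reference.
            s = s * std::exp(m - x) + 1.0f;
            m = x;
        } else {
            s += std::exp(x - m);
        }
        if (static_cast<int>(heap.size()) < k) {
            heap.emplace_back(x, i);
            std::push_heap(heap.begin(), heap.end(), cmp);
        } else if (k > 0 && x > heap.front().first) {
            // Replace the weakest candidate with the new, larger logit.
            std::pop_heap(heap.begin(), heap.end(), cmp);
            heap.back() = {x, i};
            std::push_heap(heap.begin(), heap.end(), cmp);
        }
    }

    // Sort candidates by descending logit, then normalize only these k values.
    std::sort(heap.begin(), heap.end(), cmp);
    std::vector<std::pair<int, float>> out;
    out.reserve(heap.size());
    for (const auto& c : heap) {
        out.emplace_back(c.second, std::exp(c.first - m) / s);
    }
    return out;
}
```

Because softmax is monotonic in the logits, selecting the top-k by raw logit and normalizing only those k values gives the same result as a full softmax followed by top-k, while reading each logit once.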

Deliver:

- Kernel pseudocode or CUDA code
- Memory access pattern explanation
- Warp-level optimization strategy
- Complexity analysis (bandwidth vs compute bound)
- Comparison to naive implementation
