deep_pro_judge/analysis/minimax-m2.7_vs_qwen36_summary.md
# Round 1 Summary: MiniMax-M2.7 vs Qwen3.6-27B

## Overall Scoreboard

| Task | MiniMax-M2.7 | Qwen3.6-27B | Winner | Margin |
|---|---|---|---|---|
| KV Cache | 64/100 | 91/100 | qwen36 | +27 |
| Backwards Pass | 76/100 | 92/100 | qwen36 | +16 |
| Fused Softmax+TopK | 58/100 | 88/100 | qwen36 | +30 |
| **Average** | **66** | **90** | qwen36 | **+24** |

**Clear winner: Qwen3.6-27B**, dominant across all three tasks.


## Task 1: KV Cache System

| Dimension | MiniMax-M2.7 | Qwen3.6-27B |
|---|---|---|
| Correctness | 55 | 92 |
| Completeness | 75 | 95 |
| Code Quality | 60 | 88 |
| Depth of Analysis | 78 | 90 |
| Optimizations | 72 | 90 |
| GPU Mapping | 75 | 88 |
| Tests/Demos | 30 | 95 |
| **Overall** | **64** | **91** |

### MiniMax-M2.7 Critical Issues

- **Inverted causal mask:** masks the wrong triangle, allowing attention to future tokens
- **Broken batched caching:** all batch elements share one `kv_cache` dict keyed only by layer, not by batch item
- **Prefill doesn't store KV:** the KV tensors computed during prefill are never written to the persistent cache
- **No tests:** only a 3-step hardcoded demo with zero assertions
- **1,720-line monolith:** everything crammed into one file
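
To make the first bug concrete, a correct causal mask keeps the lower triangle: query position i may attend only to key positions j <= i. A minimal NumPy sketch (illustrative, not code from either submission):

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Boolean mask, True where attention is allowed.

    Row i is the query position; it may attend to key positions
    j <= i (the lower triangle, diagonal included). Masking the
    upper triangle instead -- the inverted-mask bug -- would let
    every token attend to future tokens.
    """
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def apply_causal_mask(scores: np.ndarray) -> np.ndarray:
    """Set disallowed positions to -inf before softmax."""
    mask = causal_mask(scores.shape[-1])
    return np.where(mask, scores, -np.inf)

m = causal_mask(4)
assert m[0].tolist() == [True, False, False, False]   # token 0 sees only itself
assert m[3].tolist() == [True, True, True, True]      # last token sees everything
```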

### Qwen3.6-27B Strengths

- **10 passing demos** with numerical validation (cached-attention diff < 1e-5, chunked-prefill diff = 4.56e-10)
- **Modular 7-file architecture:** clean separation of concerns
- **Correct variable-length batching:** proper causal and length masks
- **3 working optimizations:** paged attention, int8 quantization, chunked prefill (all tested)
- **Quantitative analysis:** arithmetic-intensity calculations, per-GPU context limits, comparisons against real models (Llama, Mistral, GPT-4)
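
The style of numerical validation credited here (comparing incrementally cached attention against full recomputation at every step) can be sketched in a few lines of single-head NumPy. This is an illustrative sketch, not Qwen3.6-27B's actual demo code:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 8, 16                              # decode steps, head dimension
q_all = rng.standard_normal((T, D))
k_all = rng.standard_normal((T, D))
v_all = rng.standard_normal((T, D))

def attend(q, K, V):
    """One query attending over all keys/values seen so far."""
    s = (K @ q) / np.sqrt(D)
    w = np.exp(s - s.max())               # numerically stable softmax
    w /= w.sum()
    return w @ V

# Incremental decode: append this step's K/V to a growing cache.
K_cache = np.empty((0, D)); V_cache = np.empty((0, D))
cached = []
for t in range(T):
    K_cache = np.vstack([K_cache, k_all[t:t+1]])
    V_cache = np.vstack([V_cache, v_all[t:t+1]])
    cached.append(attend(q_all[t], K_cache, V_cache))

# Reference: recompute attention from scratch at every step.
full = [attend(q_all[t], k_all[:t+1], v_all[:t+1]) for t in range(T)]

diff = max(np.abs(c - f).max() for c, f in zip(cached, full))
assert diff < 1e-5                        # caching must not change the math
```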

## Task 2: Layer Norm Backward Pass

| Dimension | MiniMax-M2.7 | Qwen3.6-27B |
|---|---|---|
| Correctness | 85 | 95 |
| Completeness | 80 | 95 |
| Code Quality | 70 | 90 |
| Numerical Stability | 75 | 95 |
| Gradient Check | 80 | 90 |
| Complexity Analysis | 80 | 90 |
| GPU Fusion | 85 | 85 |
| Tests/Benchmarks | 60 | 95 |
| **Overall** | **76** | **92** |

### MiniMax-M2.7 Weaknesses

- **Over-caching:** stores 10 cache items when only 3 tensors are needed
- **No edge-case tests:** nothing for zero input, D=1, or large offsets
- **No concrete stability demo:** discusses catastrophic cancellation but never demonstrates it
- **Monolithic 750-line file:** everything mixed together
- **Fragile gradient check:** modifies the input in place without making a copy
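
A robust finite-difference gradient check perturbs a fresh copy of the input on every probe instead of mutating it in place. A NumPy sketch that verifies the standard compact layer-norm backward formula (no affine parameters; illustrative, not MiniMax-M2.7's code):

```python
import numpy as np

EPS = 1e-5

def layer_norm(x):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)        # biased variance, as in LayerNorm
    return (x - mu) / np.sqrt(var + EPS)

def num_grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x.

    Each probe perturbs a *copy* of x, so the caller's array is never
    mutated -- avoiding the fragile in-place pattern noted above.
    """
    g = np.zeros_like(x)
    for i in np.ndindex(x.shape):
        xp = x.copy(); xp[i] += h
        xm = x.copy(); xm[i] -= h
        g[i] = (f(xp) - f(xm)) / (2 * h)
    return g

rng = np.random.default_rng(1)
x = rng.standard_normal((2, 5))
dy = rng.standard_normal((2, 5))          # a fixed upstream gradient
loss = lambda t: (layer_norm(t) * dy).sum()

# Compact analytic backward: dx = (dy - mean(dy) - x_hat*mean(dy*x_hat)) / std
mu = x.mean(-1, keepdims=True)
std = np.sqrt(x.var(-1, keepdims=True) + EPS)
x_hat = (x - mu) / std
dx = (dy - dy.mean(-1, keepdims=True)
         - x_hat * (dy * x_hat).mean(-1, keepdims=True)) / std

assert np.abs(dx - num_grad(loss, x)).max() < 1e-6
```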

### Qwen3.6-27B Strengths

- **Minimal cache:** only 4 items (`x_hat`, `std_inv`, `gamma`, `D`), exactly what's needed
- **Concrete stability demo:** shows the naive variance formula failing at offset = 1e8 while the two-pass version stays exact
- **3-file separation:** core + tests + benchmarks
- **Edge-case tests:** zero input, D=1, large D (1024), large mean, scale invariance
- **Alternative-derivation cross-check:** an independent step-by-step chain-rule derivation verifies the compact formula (< 1e-10 error)
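
A stability demo of this kind fits in a few lines. The sketch below (an illustration of the same failure mode, not Qwen3.6-27B's code) uses the same offset = 1e8 setup: the one-pass formula E[x^2] - E[x]^2 cancels catastrophically, while the two-pass formula subtracts the mean first:

```python
import numpy as np

def var_naive(x):
    """One-pass textbook formula E[x^2] - E[x]^2 (cancellation-prone)."""
    return np.mean(x * x) - np.mean(x) ** 2

def var_two_pass(x):
    """Subtract the mean first, then average the squared deviations."""
    d = x - np.mean(x)
    return np.mean(d * d)

# A tiny spread riding on a huge offset: the squares are ~1e16, so
# float64 cannot resolve a true variance of ~6.7e-5 by subtraction.
x = 1e8 + np.array([0.0, 0.01, 0.02])
true_var = 2e-4 / 3

assert abs(var_two_pass(x) - true_var) / true_var < 1e-4   # still accurate
assert abs(var_naive(x) - true_var) / true_var > 0.5       # precision destroyed
```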

## Task 3: Fused Softmax + TopK CUDA

| Dimension | MiniMax-M2.7 | Qwen3.6-27B |
|---|---|---|
| Correctness | 40 | 95 |
| Completeness | 65 | 90 |
| Code Quality | 60 | 85 |
| CUDA Depth | 65 | 92 |
| Memory Design | 55 | 90 |
| Complexity Analysis | 60 | 88 |
| Naive Comparison | 55 | 88 |
| **Overall** | **58** | **88** |

### MiniMax-M2.7 Critical Issues

- **Broken inter-warp top-k merge:** only ~100 of 256 threads contribute to the final merge; the other 156 threads' results are silently discarded, producing an incorrect top-k
- **Compilation-stopping typo:** `topp_prob` instead of `topk_prob`
- **Misleading bandwidth claims:** a claimed "4x reduction" counts only one of the kernel's three passes
- **Zero testing infrastructure:** no benchmark harness, no CPU reference, no correctness verification

### Qwen3.6-27B Strengths

- **Two kernel versions:** v1, plus an optimized v2 with vectorized `float4` loads
- **Correct warp-by-warp merge:** properly collects all 4096 candidates
- **Shared-memory min-heap:** O(log K) insertions
- **Complete benchmark harness:** CPU reference plus correctness tests
- **Honest 3-pass bandwidth analysis:** correctly identifies the kernel as compute-bound (`expf` throughput)
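
A CPU reference of the kind such a harness compares against pairs a numerically stable softmax with an exact top-k; the `heapq` min-heap below mirrors the O(log K)-insertion structure mentioned above. A Python sketch (illustrative, not the submission's code):

```python
import heapq
import numpy as np

def softmax_topk_ref(row: np.ndarray, k: int):
    """CPU reference: stable softmax over one row, then exact top-k.

    Subtracting the row max before exp mirrors what a correct kernel
    must do before expf; the size-k min-heap gives O(log k) insertions,
    the same structure as a shared-memory heap on the GPU.
    """
    e = np.exp(row - row.max())
    probs = e / e.sum()

    heap = []                             # min-heap of (prob, index) pairs
    for idx, p in enumerate(probs):
        if len(heap) < k:
            heapq.heappush(heap, (p, idx))
        elif p > heap[0][0]:              # beats the current k-th best
            heapq.heapreplace(heap, (p, idx))
    return probs, sorted(heap, reverse=True)   # descending by probability

rng = np.random.default_rng(0)
probs, top = softmax_topk_ref(rng.standard_normal(4096), k=8)
assert abs(probs.sum() - 1.0) < 1e-9
assert [i for _, i in top] == list(np.argsort(probs)[::-1][:8])
```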

## What Separated These Two

| Factor | MiniMax-M2.7 | Qwen3.6-27B |
|---|---|---|
| Correctness | Buggy in all 3 tasks | Correct in all 3 |
| Testing | None / minimal | Comprehensive, with assertions |
| Analysis depth | High-level / conceptual | Quantitative, with real numbers |
| Code organization | Monolithic | Modular and focused |
| Engineering rigor | Claims untested | Every claim validated |

**The decisive pattern:** MiniMax-M2.7 was conceptually broad but weak in execution: it mentioned many optimizations and ideas, yet delivered buggy, untested code. Qwen3.6-27B was narrower in scope but executed flawlessly, with every claim backed by working, validated code.