intel-gpu-llm-diagnosis/repos/patch/phase2-sycl-kernel/README.md
sleepy 6ad84d543c feat: phased patch system for Intel Arc GPU performance fixes
3-model council (GLM-5.1, Minimax-M2.7, Kimi k2p5) analyzed Intel Arc GPU
performance issues and produced patches for llama.cpp:

Phase 1 - SYCL Sync: Enable graph execution by default (GGML_SYCL_DISABLE_GRAPH)
Phase 2 - SYCL Kernel: Fix VER_GEN12/13 thresholds, tune DMMV_X/MMV_Y
Phase 3 - Vulkan Intel: Arc 140T device-ID Xe2 override

Includes:
- Phased apply script (apply-phase.sh [1|2|3|all])
- Master apply script with --status/--reverse/--dry-run
- Per-phase READMEs with testing checklists
- Council deliberation logs (gitignored in logs/)

Verified: all patches apply/reverse cleanly via git apply.
Static verification: VER_GEN arithmetic and DMMV_X divisibility pass.
2026-04-15 14:53:40 +02:00


# Phase 2 — SYCL Kernel Tuning
**Depends on:** Phase 1 (should be applied and tested first)
## 0001-fix-ver-gen-thresholds.patch
Fixes `VER_GEN12` (1,000,000 → 1,200) and `VER_GEN13` (1,001,030 → 1,300).
The original `VER_GEN12` value was an unreachable placeholder: no Intel Arc GPU
reports a compute capability anywhere near 1,000,000 (the A770 reports cc≈1255),
so every Arc GPU failed the `VER_GEN12` check and fell through to the NVIDIA
Ampere tuning path in all MMQ kernels. After this patch, Intel discrete GPUs
take the `VER_GEN12` path.
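
The effect of the threshold change can be checked with a small model of the selection logic. This is a minimal sketch, not the llama.cpp dispatch code: the threshold values and the A770 compute capability come from this README, while `select_path` and the `"fallback"` label are hypothetical names used only for illustration.

```python
# Threshold values quoted in this README (before and after patch 0001).
OLD_THRESHOLDS = {"VER_GEN13": 1_001_030, "VER_GEN12": 1_000_000}
NEW_THRESHOLDS = {"VER_GEN13": 1_300, "VER_GEN12": 1_200}

A770_CC = 1255  # approximate compute capability quoted above


def select_path(cc, thresholds):
    """Return the highest threshold name that cc meets, else 'fallback'.

    Assumption: the kernels pick a tuning path by comparing the compute
    capability against each threshold, highest first.
    """
    ordered = sorted(thresholds.items(), key=lambda kv: kv[1], reverse=True)
    for name, minimum in ordered:
        if cc >= minimum:
            return name
    return "fallback"  # stands in for the non-Intel tuning path


print("before patch:", select_path(A770_CC, OLD_THRESHOLDS))
print("after patch: ", select_path(A770_CC, NEW_THRESHOLDS))
```

With the old values the A770 meets neither threshold; with the new values it meets `VER_GEN12` but not `VER_GEN13`.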
## 0002-tune-dmmv-xy-for-arc.patch
Changes `presets.hpp`: `DMMV_X` 32→64, `MMV_Y` 1→2.
This doubles the data processed per thread in DMMV kernels and doubles the rows
per work-group. Common model widths in the 4096 to 14336 range are multiples of
64, so the `ncols % GGML_SYCL_DMMV_X == 0` assertion still holds for them.
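
The divisibility claim can be spot-checked before building. The sketch below assumes a handful of typical hidden and feed-forward widths; only the endpoints 4096 and 14336 come from this README, the rest of `SAMPLE_WIDTHS` is an illustrative choice, and `indivisible_widths` is a hypothetical helper.

```python
DMMV_X_OLD = 32
DMMV_X_NEW = 64

# Illustrative widths; substitute the column counts of the models you run.
SAMPLE_WIDTHS = [4096, 5120, 8192, 11008, 14336]


def indivisible_widths(widths, dmmv_x):
    """Return the widths that would trip `ncols % GGML_SYCL_DMMV_X == 0`."""
    return [ncols for ncols in widths if ncols % dmmv_x != 0]


bad = indivisible_widths(SAMPLE_WIDTHS, DMMV_X_NEW)
if bad:
    print("not divisible by", DMMV_X_NEW, ":", bad)
else:
    print("all sample widths divisible by", DMMV_X_NEW)
```

Any width the helper reports would hit the assertion at runtime, so models with unusual tensor shapes are worth checking individually.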
## 0003-tune-dmmv-xy-common-hpp.patch
Applies the same changes as 0002 to `common.hpp`, which carries duplicate definitions of both constants.
### Expected impact
An estimated 5-15% additional performance improvement on top of Phase 1. This figure is a projection, not a measurement.
### ⚠️ Needs Benchmarking
DMMV_X=64 and MMV_Y=2 were chosen analytically, not empirically. If MMV_Y=2
causes register spills (check with `GGML_SYCL_DEBUG=1`), revert 0002 and 0003,
then retry with DMMV_X=64 and MMV_Y left at 1.
### Testing checklist
- [ ] Build succeeds
- [ ] Unit tests pass
- [ ] Dense model inference produces correct output
- [ ] No assertion failures (`ncols % GGML_SYCL_DMMV_X == 0`)
- [ ] Benchmark comparison vs Phase 1 only
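
For the benchmark comparison, the relative gain over Phase 1 can be computed from two throughput readings. This is a minimal sketch: `improvement_percent` and `within_expected` are hypothetical helpers, the 5-15% window is the estimate from this README, and the two throughput values are placeholders rather than measurements.

```python
def improvement_percent(baseline_tps, candidate_tps):
    """Relative throughput change of candidate over baseline, in percent."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return (candidate_tps - baseline_tps) / baseline_tps * 100.0


def within_expected(percent, low=5.0, high=15.0):
    """True if the gain falls inside the estimated 5-15% window."""
    return low <= percent <= high


# Placeholder tokens/s values; replace with your own Phase 1 and Phase 2 runs.
phase1_tps = 100.0
phase2_tps = 110.0

gain = improvement_percent(phase1_tps, phase2_tps)
print(f"gain over Phase 1: {gain:.1f}% (in expected window: {within_expected(gain)})")
```

A gain below the window, or a regression, is a signal to try the DMMV_X=64, MMV_Y=1 fallback described above.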