6ad84d543c
3-model council (GLM-5.1, Minimax-M2.7, Kimi k2p5) analyzed Intel Arc GPU performance issues and produced patches for llama.cpp:

Phase 1 — SYCL Sync: enable graph execution by default (GGML_SYCL_DISABLE_GRAPH)
Phase 2 — SYCL Kernel: fix VER_GEN12/13 thresholds, tune DMMV_X/MMV_Y
Phase 3 — Vulkan Intel: Arc 140T device-ID Xe2 override

Includes:
- Phased apply script (apply-phase.sh [1|2|3|all])
- Master apply script with --status/--reverse/--dry-run
- Per-phase READMEs with testing checklists
- Council deliberation logs (gitignored in logs/)

Verified: all patches apply/reverse cleanly via git apply. Static verification: VER_GEN arithmetic and DMMV_X divisibility pass.
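The apply/reverse verification mentioned above boils down to `git apply`'s `--check` and `-R` flags. The sketch below reproduces that workflow in a throwaway repository; the repo contents and the patch name `0001.patch` are illustrative stand-ins for llama.cpp and the Phase 1 patch, but the `git apply` invocations are the real ones a `--dry-run`/`--reverse` script would use.

```shell
# Illustration of the apply/reverse check (paths and file contents are
# hypothetical; a scratch repo stands in for llama.cpp).
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
printf '#define GGML_SYCL_DISABLE_GRAPH 1\n' > config.h
git add config.h
git -c user.email=ci@example.com -c user.name=ci commit -qm "add config"

# Produce a patch by editing the tracked file, then restore it.
printf '#define GGML_SYCL_DISABLE_GRAPH 0\n' > config.h
git diff > ../0001.patch
git checkout -q -- config.h

git apply --check ../0001.patch    && echo "applies cleanly"   # dry run
git apply ../0001.patch                                        # apply
git apply --check -R ../0001.patch && echo "reverses cleanly"  # dry run, reversed
git apply -R ../0001.patch                                     # reverse
```

`git apply --check` validates without touching the working tree, which is exactly what a `--dry-run`/`--status` mode needs; `-R` (`--reverse`) covers clean removal.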
Phase 1 — SYCL Synchronization
0001-enable-sycl-graph-by-default.patch
Changes GGML_SYCL_DISABLE_GRAPH default from 1 (disabled) to 0 (enabled).
What it does
- Enables SYCL graph execution for single-GPU dense LLM inference
- Enables async memory operations (tied to graph support in upstream code)
- Eliminates 8 blocking `.wait()` calls in reorder functions (Q4_0, Q8_0, Q4_K, Q6_K)
What it does NOT affect
- MoE models (MUL_MAT_ID) — `check_graph_compatibility()` auto-disables graphs
- CONCAT operations — auto-disabled
- Multi-GPU setups — always disabled
- Users can override with `GGML_SYCL_DISABLE_GRAPH=1`
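The toggle's semantics after the patch can be sketched in plain shell (this models the values only, not llama.cpp code; `state` is a hypothetical helper):

```shell
# Models the patched default: unset or 0 -> graphs enabled; 1 -> disabled.
state() {
    # ${1:-0} mirrors the new fallback of 0 (the old default was 1).
    if [ "${1:-0}" = "1" ]; then
        echo "SYCL graphs: disabled"
    else
        echo "SYCL graphs: enabled"
    fi
}

state "${GGML_SYCL_DISABLE_GRAPH:-}"   # unset -> "SYCL graphs: enabled"
state 1                                # opt-out -> "SYCL graphs: disabled"
```

In other words, users who want the pre-patch behavior export `GGML_SYCL_DISABLE_GRAPH=1`; everyone else gets graphs by default.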
Expected impact
10-30% token generation speedup on single-GPU dense LLM inference.
Testing checklist
- Build succeeds with `-DGGML_SYCL=ON`
- `GGML_SYCL_DEBUG=1` shows "SYCL-GRAPH" messages for dense models
- Dense model inference produces correct output
- MoE model falls back gracefully (logs "disabling SYCL graphs")
- `GGML_SYCL_DISABLE_GRAPH=1` restores old behavior