ggml-webgpu: compute pass batching and removing profiling overhead (#21873)

* Update register tiling matmul to use f32 accumulation

* Fix profiling code

* Fix register tiling matmul for Chrome, I'm blaming Dawn

* Update batch tuning value for iOS

* Fix compile error

* Fix use of new load function

* Move to a single query set for GPU profiling

* Move to batching compute passes when not profiling

* Refactor build_multi

* Remove iOS throttling now that we're batching compute passes
Reese Levine
2026-04-16 01:12:19 -07:00
committed by GitHub
parent 8612ed18b7
commit 82677a6ede