Profile concurrent encoding effectiveness #35
Problem
Setting GGML_METAL_CONCURRENCY_DISABLE=1 drops tg128 from 53.83 to 51.80, so concurrent command buffer encoding yields only about a 4% improvement. Many ops are marked concurrent but may not actually benefit from pipelining.
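The figure of about 4% can be checked with a short calculation. This is a minimal sketch: the two throughput values come from the numbers above, the helper name `relative_gain` is hypothetical, and the values are assumed to be tokens per second as reported for tg128.

```python
def relative_gain(enabled: float, disabled: float) -> float:
    """Fractional speedup of the concurrent run over the non-concurrent run."""
    return (enabled - disabled) / disabled

# tg128 throughput with concurrency enabled (default) and with
# GGML_METAL_CONCURRENCY_DISABLE=1; units assumed to be tokens/s.
tg128_enabled = 53.83
tg128_disabled = 51.80

gain = relative_gain(tg128_enabled, tg128_disabled)
print(f"concurrent encoding gain: {gain:.1%}")
```

Measuring against the disabled run as the baseline gives the gain attributable to concurrency. Dividing by the enabled value instead would report the loss from turning it off, which is slightly smaller.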
Data (9B Q4_0)
Graph topology
The default n_cb=1 creates 2 command buffers: the main thread encodes the first 183 nodes and the async thread encodes the remaining 1650 nodes (1833 nodes in total). With concurrency disabled, all nodes are encoded sequentially.
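The split described above can be written out as node index ranges to see how uneven the two command buffers are. This is only an illustrative sketch: the node counts come from the description above, while `split_ranges` is a hypothetical helper and not a function from ggml.

```python
def split_ranges(n_main: int, n_async: int) -> list[tuple[int, int]]:
    """Return half-open [start, end) node ranges for the two command buffers."""
    total = n_main + n_async
    return [(0, n_main), (n_main, total)]

n_main = 183    # nodes encoded by the main thread
n_async = 1650  # nodes encoded by the async thread

ranges = split_ranges(n_main, n_async)
total_nodes = n_main + n_async
main_share = n_main / total_nodes

for name, (start, end) in zip(("main", "async"), ranges):
    print(f"{name:5s}: nodes [{start}, {end}) -> {end - start} nodes")
print(f"main thread share: {main_share:.1%}")
```

The main thread gets roughly a tenth of the graph, so most of the encoding work sits in the single async command buffer.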
Questions