llama.cpp/ggml/src
Jeff Bolz 61bde8e21f vulkan: Reduce temporary memory usage for TOP_K (#17623)
- Compute row size for the temp buffer based on the output of the first pass.
- Update the shader addressing math to use the output row size.
- Pass the output row size as "ncols_output"; the value previously passed as "ncols_output" is now "k".

For the common case of K=40 and src0=(200000,1,1,1), this reduces the temporary buffer
from about 3.2MB to 500KB.
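
The sizing change can be illustrated with a small arithmetic sketch. This is not the code from the commit: the element size (a 4-byte value plus a 4-byte index), the use of two alternating temporary buffers, and the first-pass block size of 256 columns are all assumptions, chosen only because they land close to the figures quoted above. All names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: each temporary element is a (value, index) pair of 4-byte fields.
constexpr std::size_t kElemBytes = 2 * sizeof(std::uint32_t);
// Assumption: two temporary buffers are alternated between passes.
constexpr std::size_t kNumTempBuffers = 2;
// Assumption: number of input columns reduced by one first-pass workgroup.
constexpr std::size_t kFirstPassBlock = 256;

constexpr std::size_t ceil_div(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

// Old sizing: every temporary row is as wide as the input row.
constexpr std::size_t temp_bytes_input_sized(std::size_t ncols, std::size_t nrows) {
    return kNumTempBuffers * nrows * ncols * kElemBytes;
}

// Row width after the first pass: each workgroup keeps only k candidates.
constexpr std::size_t first_pass_output_cols(std::size_t ncols, std::size_t k) {
    return ceil_div(ncols, kFirstPassBlock) * k;
}

// New sizing: temporary rows only need to hold the first-pass output.
constexpr std::size_t temp_bytes_output_sized(std::size_t ncols, std::size_t nrows,
                                              std::size_t k) {
    return kNumTempBuffers * nrows * first_pass_output_cols(ncols, k) * kElemBytes;
}
```

Under these assumptions, ncols=200000, nrows=1 and k=40 give 3,200,000 bytes for the input-sized buffers and 500,480 bytes for the output-sized ones, in line with the approximate 3.2MB and 500KB figures in the commit message.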
2025-12-02 19:22:04 +01:00