Files
llama.cpp/ggml
Aman Gupta b94050e896 CUDA: use LRU based eviction for cuda graphs (#21611)
* CUDA: use a ring-buffer for cuda graphs

* bump limit to 128

* use LRU eviction

* better naming

* do periodic clean-up
2026-04-17 23:24:21 +08:00
..
2024-07-13 18:12:39 +02:00