llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)

* llama: automatically fit args to free memory

llama-fit-params tool

* fix CI

* hints for bug reports, ensure no reallocation

* fix segfault with Vulkan

* add llama-fit-params to CI

* fix CI

* fix CI

* fix CI

* minor adjustments

* fix assignment of 1 dense layer

* fix logger not being reset on model load failure

* remove --n-gpu-layers hint on model load failure

* fix llama-fit-params verbosity

* fix edge case

* fix typo [no ci]
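The commit message says that parameters the user did not set are now fitted automatically to free memory to maximize GPU utilization. It does not describe the algorithm. As a rough illustration of the idea, here is a minimal C++ sketch under two assumptions: every layer needs the same number of bytes, and each device keeps a fixed reserve for other buffers. It greedily fills devices with layers and picks the GPU layer count only when the user has not set one. DeviceBudget, fit_layers and resolve_n_gpu_layers are hypothetical names, not llama.cpp APIs, and this is not the implementation from this commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-device memory budget (NOT a llama.cpp type).
struct DeviceBudget {
    int64_t free_bytes;     // memory currently free on the device
    int64_t reserved_bytes; // assumed fixed overhead kept free for other buffers
};

// Greedily assign layers to devices in order, filling each device before
// moving to the next. Assumes all layers have the same size (a simplification).
// Layers that fit on no device are left for the CPU.
std::vector<int> fit_layers(int n_layers, int64_t bytes_per_layer,
                            const std::vector<DeviceBudget> & devices) {
    assert(n_layers >= 0);
    std::vector<int> assigned(devices.size(), 0);
    int remaining = n_layers;
    for (size_t i = 0; i < devices.size() && remaining > 0; ++i) {
        const int64_t usable = devices[i].free_bytes - devices[i].reserved_bytes;
        if (usable <= 0 || bytes_per_layer <= 0) {
            continue; // nothing usable on this device
        }
        const int64_t fits = usable / bytes_per_layer; // whole layers only
        assigned[i] = static_cast<int>(std::min<int64_t>(fits, remaining));
        remaining -= assigned[i];
    }
    return assigned;
}

// Only choose the GPU layer count when the user left it unset;
// an explicit user value is returned unchanged.
int resolve_n_gpu_layers(std::optional<int> user_value, int n_layers,
                         int64_t bytes_per_layer,
                         const std::vector<DeviceBudget> & devices) {
    if (user_value) {
        return *user_value;
    }
    int total = 0;
    for (int n : fit_layers(n_layers, bytes_per_layer, devices)) {
        total += n;
    }
    return total;
}
```

For example, take 32 layers of 200 MiB each and two devices with 4096 MiB and 2048 MiB free, with 512 MiB reserved on each. fit_layers assigns 17 and 7 layers, so resolve_n_gpu_layers picks 24 when the value is unset and returns 10 unchanged if the user passed 10. A greedy per-device fill is only one simple strategy. A real fitter would likely also have to account for the KV cache and compute buffers, whose size depends on settings such as the context length.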
Author: Johannes Gäßler
Date: 2025-12-15 09:24:59 +01:00
Committed by: GitHub
Parent: 4aced7a631
Commit: b1f3a6e5db
26 changed files with 1075 additions and 63 deletions
@@ -37,4 +37,5 @@ else()
         add_subdirectory(cvector-generator)
         add_subdirectory(export-lora)
     endif()
+    add_subdirectory(fit-params)
 endif()