llama.cpp/ggml
nullname 8af1f5f430 ggml-hexagon: flash-attn opt (#19025)
* Optimize the flash attention kernel by improving score computation and the online softmax update

* wip

* Refactor online softmax update in flash attention kernel for improved performance

* Optimize flash attention kernel by replacing float array with HVX_Vector for score computation

* wip
2026-01-23 22:02:07 -08:00