f3dd7b8e680e37342649024c0ee282c5d60b83da
3 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
cff777f226 |
hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822)
* hexagon: disable repack buffers if host buffers are disabled, improved handling of env vars * hexagon: add support for OP_CPY fp16/fp32 -> fp16/fp32 Factore out all hvx_copy functions into hvx-copy.h header and reduced code duplication. Update HTP ops infra to support OP_CPY * hexagon: cleanup and refactor hex/hvx/htp headers and helper libs hex is basically all scalar/core platform stuff (L2, DMA, basic utils) hvx is all hvx related utils, helpers, etc htp is higher level stuff like Ops, etc hvx-utils library got a nice round of cleanup and refactoring to reduce duplication use hvx_vec_store_a where possible * hexagon: refactor HVX sigmoid functions to hvx-sigmoid.h Moved sigmoid and tanh vector functions from hvx-utils.h to a new header hvx-sigmoid.h. Implemented aligned and unaligned variants for sigmoid array processing using a macro pattern similar to hvx-copy.h. Updated act-ops.c to use the new aligned variant hvx_sigmoid_f32_aa. Removed unused hvx-sigmoid.c. * hexagon: factor out hvx-sqrt.h * hexagon: mintor update to hvx-utils.h * hexagon: remove spurios log * hexagon: factor out and optimize hvx_add/sub/mul * hexagon: remove _opt variants of add/sub/mul as they simply fully aligned versions * hexagon: refactor reduction functions to hvx-reduce.h Moved `hvx_self_max_f32` and `hvx_self_sum_f32` from `hvx-utils.h`/`.c` to `hvx-reduce.h`. Renamed them to `hvx_reduce_max_f32` and `hvx_reduce_sum_f32`. Added aligned (`_a`) and unaligned (`_u`) variants and used macros to unify logic. Updated `softmax-ops.c` to use the new functions. * hexagon: refactor the rest of arithmetic functions to hvx-arith.h Moved `hvx_sum_of_squares_f32`, `hvx_min_scalar_f32`, and `hvx_clamp_scalar_f32` from `hvx-utils.c/h` to `hvx-arith.h`. Implemented aligned/unaligned variants (`_aa`, `_au`, etc.) and used macros to reduce code duplication. Updated `hvx_min_scalar_f32` and `hvx_clamp_scalar_f32` to use `dst, src, ..., n` argument order. Updated call sites in `act-ops.c`. Refactor Hexagon HVX arithmetic functions (min, clamp) to hvx-arith.h Moved `hvx_min_scalar_f32` and `hvx_clamp_scalar_f32` from `hvx-utils.c/h` to `hvx-arith.h`. Implemented aligned/unaligned variants (`_aa`, `_au`, etc.) and used macros to reduce code duplication. Updated these functions to use `dst, src, ..., n` argument order and updated call sites in `act-ops.c`. `hvx_sum_of_squares_f32` remains in `hvx-utils.c` as requested. * hexagon: refactor hvx_sum_of_squares_f32 - Modify `hvx_sum_of_squares_f32` in `ggml/src/ggml-hexagon/htp/hvx-reduce.h` to use `dst, src` signature. - Implement `_a` (aligned) and `_u` (unaligned) variants for `hvx_sum_of_squares_f32`. - Update `hvx_reduce_loop_body` macro to support both returning and storing results via `finalize_op`. - Update existing reduction functions in `hvx-reduce.h` to use the updated macro. - Update `rms_norm_htp_f32` in `ggml/src/ggml-hexagon/htp/unary-ops.c` to match the new signature. * hexagon: use hvx_splat instead of memset * hexagon: consistent use of f32/f16 in all function names to match the rest of GGML * hexagon: fix hvx_copy_f16_f32 on v75 and older * hexagon: update readme to include GGML_HEXAGON_EXPERIMENTAL * scripts: update snapdragon/adb scripts to enable host param |
||
|
|
ed75977717 |
ggml-hexagon: create generalized functions for cpu side op (#17500)
* refactor: replace ggml_hexagon_mul_mat with template-based binary operation for improved flexibility * refactor: replace ggml_hexagon_mul_mat_id with template-based binary operation for improved flexibility * refactor: initialize buffer types and streamline dspqueue_buffers_init calls for clarity * add comment * refactor: remove redundant buffer checks in hexagon supported operations * wip * add missing include to fix weak symbol warning * add ggml_hexagon_op_generic * refactor: simplify tensor operation initialization and buffer management in hexagon implementation * refactor: streamline hexagon operation initialization and buffer management * refactor: update function signatures and streamline request handling in hexagon operations * wip * ggml-hexagon: clean up code formatting and improve unary operation handling * wip * rename * fix: add support for permuted F16 tensors and enhance quantization checks in matrix operations * refactor: replace ggml_hexagon_mul_mat with template-based binary operation for improved flexibility refactor: replace ggml_hexagon_mul_mat_id with template-based binary operation for improved flexibility refactor: initialize buffer types and streamline dspqueue_buffers_init calls for clarity refactor: remove redundant buffer checks in hexagon supported operations add missing include to fix weak symbol warning add ggml_hexagon_op_generic refactor: simplify tensor operation initialization and buffer management in hexagon implementation refactor: streamline hexagon operation initialization and buffer management refactor: update function signatures and streamline request handling in hexagon operations ggml-hexagon: clean up code formatting and improve unary operation handling fix: add support for permuted F16 tensors and enhance quantization checks in matrix operations # Conflicts: # ggml/src/ggml-hexagon/ggml-hexagon.cpp * hexagon: fix merge conflicts * hexagon: minor cleanup for buffer support checks * hexagon: factor out op_desc and the overal op logging * hexagon: further simplify and cleanup op dispatch logic * snapdragon: update adb scripts to use llama-cli and llama-completion * fix pipeline failure --------- Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com> |
||
|
|
63d2fc46e1 |
Add experimental ggml-hexagon backend for the Hexagon NPU (#16547)
* model: add support for extra bufs for all devices * hexagon: add experimental ggml-hexagon backend for the Hexagon NPU This commit introduces a new experimental backend `ggml-hexagon` with support for the Hexagon NPU. Highlights: - Supports Hexagon versions: v73, v75, v79, and v81 - Targets Android devices based on Snapdragon SoCs: Gen3, 8-Elite, and 8-Elite Gen5 - Supports Q4_0, Q8_0, MXFP4, and FP32 data types - Implements core LLM ops: MUL_MAT/MUL_MAT_ID, ADD/SUB/MUL/ADD_ID, RMS_NORM, ROPE, GLU/SWIGLU, SOFTMAX **Note:** This backend is experimental and may exhibit instability or limited performance across supported devices. It is intended for early testing and feedback from llama.cpp/ggml developer and user community. Co-Authored-By: Rajdeep Ganguly <rganguly@qti.qualcomm.com> Co-Authored-By: Todor Boinovski <todorb@qti.qualcomm.com> * hexagon: fix format checker errors * hexagon: update readme and cmake presets * ci: add android-ndk-build jobs that build plain ARM64 and Snapdragon versions * hexagon: add simple graph optimizer for stacking MUL_MAT ops with the same input * hexagon: move ADB helper scripts into scripts/snapdragon/adb * hexagon: replace all f/printfs with GGML_LOG_... * readme: add hexagon to the list supported backends * hexagon: stack malmuts with quantized inputs only * hexagon: add TODO for fixing issues in hexagon_graph_optimize * hexagon: update to hex-sdk 6.4.0 and add scripts for running on QDC * scripts: fix lint errors * scripts: update qdc pytest script to make linter happy * hexagon: add reduce sum in fp32 * hexagon: reduce number of vector stores in matmul output * hexagon: remove the need for vdelta in reduce-multiply-x8 * hexagon: consistent use of reduce_sum_fp32 for row_sums * hexagon: some more matmul optimizations and comments Optimize cases where tensor dims are not multiple of 1024 (e.g in Qwen models). We've handled those cases already but at a higher overhead. * hexagon: update cmake presets * hexagon: add OPMASK support for run-bench.sh wrapper * hexagon: update to use GGML_BACKEND_API * hexagon: remove unused logic for setting tensor flags for the views * hexagon: add asserts to set/get_tensor to make sure we handle complete tensors Same asserts as the CPU backend. * hexagon: use cpy_tensor slow path for non-host buffers * hexagon: error checks in the buffer allocator * cmake: move include(extProj) under ggml-hexagon * hexagon: don't forget to delete the backend on free * hexagon: set/get_tensor size assert apply only to quantized tensors * hexagon: reintroduce HEX_VERBOSE wrapper for GGML_LOG_DEBUG for now GGML_LOG_DEBUG is always enabled for test-backend-ops and the output gets in the way. Ideally we need a bit more finer log levels. * docs: typos in hexagon developer docs (libggm-...) * hexagon: overhaul error handling in the session/device allocation this should handle all failure paths in the session allocation. * hexagon: update cmake presets to enable fp16 vectors * hexagon: remove unused time_usec function * hexagon: don't forget to release buffer contexts * hexagon: fixed indents in hvx-utils (missed clang-format auto-format failure) * hexagon: remove custom can_repeat function and use ggml_can_repeat --------- Co-authored-by: Rajdeep Ganguly <rganguly@qti.qualcomm.com> Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com> |