model : add tokenizer from LFM2.5-Audio-1.5B (#19687)

* model : Add tokenizer from LFM2.5-Audio-1.5B [LFM2.5-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B) introduced lightweight audio tokenizer. Tokenizer based on LFM2 architecture and acts as "embedding" model with different input `n_embd` and output `n_embd_out`. To be used in https://github.com/ggml-org/llama.cpp/pull/18641. To convert use ```shell python3 convert_hf_to_gguf.py /path/to/LFM2.5-Audio-1.5B/audio_detokenizer ``` * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Formatting * Rework check for attention layers * Add LFM2 SWA model support * Address PR feedback * Set vocab to none * Move helper function definitions to cpp file --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-19 09:54:48 +01:00
parent eacb4b67a2
commit 8004f3a8d1
7 changed files with 185 additions and 137 deletions
@@ -2417,8 +2417,9 @@ llm_graph_input_mem_hybrid_iswa * llm_graph_context::build_inp_mem_hybrid_iswa()

 void llm_graph_context::build_dense_out(
    ggml_tensor * dense_2,
+    ggml_tensor * dense_2_b,
    ggml_tensor * dense_3) const {
-    if (!cparams.embeddings || !(dense_2 || dense_3)) {
+    if (!cparams.embeddings || !(dense_2 || dense_2_b || dense_3)) {
        return;
    }
    ggml_tensor * cur = res->t_embd_pooled != nullptr ? res->t_embd_pooled : res->t_embd;
@@ -2427,6 +2428,9 @@ void llm_graph_context::build_dense_out(
    if (dense_2) {
        cur = ggml_mul_mat(ctx0, dense_2, cur);
    }
+    if (dense_2_b) {
+        cur = ggml_add(ctx0, cur, dense_2_b);
+    }
    if (dense_3) {
        cur = ggml_mul_mat(ctx0, dense_3, cur);
    }