examples : add debug utility/example (#18464)

* examples : add debug utility/example

This commit introduces a new example named llama-debug, a utility intended to assist with developing and debugging a converted model.

The motivation for this utility is to assist in model conversion work by verifying that the model produces the expected outputs. It is intended to replace logits.cpp in examples/model-conversion.

Example usage:
```console
./build/bin/llama-debug \
    -m models/Qwen2.5-0.5B-Instruct.gguf \
    --prompt "Hello, my name is" \
    --save-logits
...
Model add_bos: false
Input prompt: "Hello, my name is"
Token ids (5): Hello(9707) ,(11) my(847) name(829) is(374)
Data saved to data/llamacpp-Qwen2.5-0.5B-Instruct.bin
Data saved to data/llamacpp-Qwen2.5-0.5B-Instruct.txt
Prompt saved to data/llamacpp-Qwen2.5-0.5B-Instruct-prompt.txt
Tokens saved to data/llamacpp-Qwen2.5-0.5B-Instruct-tokens.bin
```

For more details about the options available for this example, please refer to examples/debug/README.md.

* throw runtime error instead of logging error

* remove params.warmup and enable the warmup/nowarmup option

* model-conversion : remove logits.cpp

This commit removes logits.cpp in favor of using llama-debug for generating logits and embeddings.

* examples : remove model-conversion directory

This was missed in the previous commit.

* model-conversion : add support for saving prompt and token ids

This commit adds support for storing the prompt and the token ids for the prompt when running the original models.

The motivation for this is that it allows us to compare the prompt and the tokens generated for the prompt when verifying the converted model. Currently, even if the same prompt is used, the generated tokens can differ when the tokenization differs between the original and converted model, and this would go unnoticed (the verification will most likely fail, but it might not be obvious why).

* squash! model-conversion : add support for saving prompt and token ids

fix pyright errors.

* model-conversion : add compare_tokens utility

This commit adds a script to compare token outputs between original and converted models.

Example usage:
```console
(venv) $ ./scripts/utils/compare_tokens.py pytorch-gemma-3-270m-it llamacpp-gemma-3-270m-it-bf16

Comparing tokens between:
  Original : pytorch-gemma-3-270m-it (6 tokens)
  Converted: llamacpp-gemma-3-270m-it-bf16 (6 tokens)

✅ All 6 tokens match!
```

There is also a verbose flag that additionally prints the prompts:
```console
(venv) $ ./scripts/utils/compare_tokens.py pytorch-gemma-3-270m-it llamacpp-gemma-3-270m-it-bf16 -v

Original model prompt (pytorch-gemma-3-270m-it):
  prompt: Hello, my name is
  n_tokens: 6
  token ids: 2, 9259, 236764, 1041, 1463, 563

Converted model prompt (llamacpp-gemma-3-270m-it-bf16):
  prompt: Hello, my name is
  n_tokens: 6
  token ids: 2, 9259, 236764, 1041, 1463, 563

Comparing tokens between:
  Original : pytorch-gemma-3-270m-it (6 tokens)
  Converted: llamacpp-gemma-3-270m-it-bf16 (6 tokens)

✅ All 6 tokens match!
```

* model-conversion : add token comparison to verification scripts

This commit adds a call to the compare_tokens function in compare-logits.py and semantic_check.py to ensure that the token ids produced by the tokenizers are the same before proceeding with verifying the logits/embeddings.

Placing the call in the existing scripts, instead of invoking it separately, ensures that the token comparison is always done prior to the logit/embedding verifications.

A follow-up commit/PR could refactor the causal logits verification into a single script instead of the two that exist now. This would reduce the code and make it consistent with the embeddings verification, which only has a single script.

* debug : use llama_model_n_embd_out

This commit updates the debug example to use the new function llama_model_n_embd_out instead of llama_model_n_embd.

The motivation for this change is to support late interaction retriever models, like LFM2-ColBert-350M, where the output embeddings are down-projected to a lower dimension.

* debug : add print_usage function

This commit adds a print_usage function that is passed to common_params_parse. The motivation for this is that it enables a specific usage message which will be printed after all the options, for example:
```console
example usage:

  Print tensors:

    ./build/bin/llama-debug -m model.gguf -p "Hello my name is" --verbose

    The tensors to be printed can be filtered with --tensor-filter option.

  Save logits/embeddings:

    ./build/bin/llama-debug -m model.gguf -p "Hello my name is" --save-logits

    Add --embedding to save embeddings
```
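
Reviewer note: the token check described in the message can be illustrated with a minimal sketch. This is not the code of compare_tokens itself. The helper names `load_token_ids` and `first_token_mismatch` are hypothetical, and the on-disk layout (raw int32 values in native byte order) is an assumption, since the format of the `-tokens.bin` files is not shown on this page.

```python
import numpy as np


def load_token_ids(path):
    """Read token ids from a binary file.

    Assumption: ids are stored as raw int32 values in native byte order.
    """
    return np.fromfile(path, dtype=np.int32).tolist()


def first_token_mismatch(original, converted):
    """Return the index of the first differing token id, or None if equal."""
    for i, (a, b) in enumerate(zip(original, converted)):
        if a != b:
            return i
    # Identical common prefix: a length difference is still a mismatch.
    if len(original) != len(converted):
        return min(len(original), len(converted))
    return None
```

Reporting the first differing index, rather than only a boolean, is one way to make a tokenizer difference easy to spot before any logits are compared.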
2026-01-07 10:42:19 +01:00
parent 3333951d86
commit ffba4f29e6
17 changed files with 725 additions and 319 deletions
@@ -6,7 +6,7 @@ from pathlib import Path

 # Add utils directory to path for direct script execution
 sys.path.insert(0, str(Path(__file__).parent.parent / "utils"))
-from common import get_model_name_from_env_path  # type: ignore[import-not-found]
+from common import get_model_name_from_env_path, compare_tokens  # type: ignore[import-not-found]

 def quick_logits_check(pytorch_file, llamacpp_file):
    """Lightweight sanity check before NMSE"""
@@ -58,6 +58,13 @@ def main():

    print("Checked all required files were found. Proceeding...\n")

+    # Verify tokens as they are a prerequisite for logits comparison.
+    print("🔍 Token Comparison Check")
+    print("=" * 40)
+    if not compare_tokens(f"pytorch-{model_name}", f"llamacpp-{llamacpp_model_name}"):
+        print("\n❌ Token mismatch detected")
+        sys.exit(1)
+    print()

    print("🔍 GGML Model Validation for model ", model_name)
    print("=" * 40)
@@ -67,7 +67,7 @@ with torch.no_grad():
    last_hidden_states = outputs.hidden_states[-1]

    # Get embeddings for all tokens
-    token_embeddings = last_hidden_states[0].cpu().numpy()  # Remove batch dimension
+    token_embeddings = last_hidden_states[0].float().cpu().numpy()  # Remove batch dimension

    print(f"Hidden states shape: {last_hidden_states.shape}")
    print(f"Token embeddings shape: {token_embeddings.shape}")
@@ -13,6 +13,6 @@ if [ -z "$CONVERTED_MODEL" ]; then
    exit 1
 fi

-cmake --build ../../build --target llama-logits -j8
+cmake --build ../../build --target llama-debug -j8

-../../build/bin/llama-logits -m $CONVERTED_MODEL -embd-mode "Hello world today"
+../../build/bin/llama-debug -m $CONVERTED_MODEL --embedding -p "Hello world today" --save-logits
@@ -21,6 +21,6 @@ fi
 echo $CONVERTED_MODEL
 echo $MODEL_TESTING_PROMPT

-cmake --build ../../build --target llama-logits -j8
+cmake --build ../../build --target llama-debug -j8

-../../build/bin/llama-logits -m "$CONVERTED_MODEL" "$MODEL_TESTING_PROMPT"
+../../build/bin/llama-debug -m "$CONVERTED_MODEL" -p "$MODEL_TESTING_PROMPT" --save-logits
@@ -7,12 +7,11 @@ import importlib
 import torch
 import numpy as np

-from pathlib import Path
 from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForImageTextToText, AutoConfig

 # Add parent directory to path for imports
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))
-from utils.common import debug_hook
+from utils.common import debug_hook, save_output_data

 def parse_arguments():
    parser = argparse.ArgumentParser(description="Process model with specified path")
@@ -126,6 +125,7 @@ def main():
    device = next(model.parameters()).device
    prompt = get_prompt(args)
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
+    token_ids = input_ids[0].cpu().tolist()

    print(f"Input tokens: {input_ids}")
    print(f"Input text: {repr(prompt)}")
@@ -151,19 +151,6 @@ def main():
        print(f"Last token logits shape: {last_logits.shape}")
        print(f"Vocab size: {len(last_logits)}")

-        data_dir = Path("data")
-        data_dir.mkdir(exist_ok=True)
-        bin_filename = data_dir / f"pytorch-{model_name}.bin"
-        txt_filename = data_dir / f"pytorch-{model_name}.txt"
-
-        # Save to file for comparison
-        last_logits.astype(np.float32).tofile(bin_filename)
-
-        # Also save as text file for easy inspection
-        with open(txt_filename, "w") as f:
-            for i, logit in enumerate(last_logits):
-                f.write(f"{i}: {logit:.6f}\n")
-
        # Print some sample logits for quick verification
        print(f"First 10 logits: {last_logits[:10]}")
        print(f"Last 10 logits: {last_logits[-10:]}")
@@ -175,8 +162,7 @@ def main():
            token = tokenizer.decode([idx])
            print(f"  Token {idx} ({repr(token)}): {last_logits[idx]:.6f}")

-        print(f"Saved bin logits to: {bin_filename}")
-        print(f"Saved txt logist to: {txt_filename}")
+        save_output_data(last_logits, token_ids, prompt, model_name)

 if __name__ == "__main__":
    main()
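
Reviewer note: the last hunk replaces the inline file-writing code with a call to `save_output_data(last_logits, token_ids, prompt, model_name)`, whose body is not part of the hunks shown here. A rough sketch of what such a helper might do follows, assuming it keeps the removed `.bin`/`.txt` logic and adds prompt and token files named like the llama-debug output in the commit message. The function name `save_output_data_sketch`, the `data_dir` parameter, and the int32 token format are assumptions for illustration.

```python
from pathlib import Path

import numpy as np


def save_output_data_sketch(data, token_ids, prompt, model_name, data_dir="data"):
    """Write logits (or embeddings), the prompt, and token ids under data_dir."""
    out = Path(data_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Build names by string concatenation: model names such as
    # "Qwen2.5-0.5B-Instruct" contain dots, so with_suffix() would be wrong.
    base = out / f"pytorch-{model_name}"

    # Binary file for numeric comparison (same float32 format as the removed code).
    values = np.asarray(data, dtype=np.float32).ravel()
    values.tofile(f"{base}.bin")

    # Text file for easy inspection, one "index: value" pair per line.
    with open(f"{base}.txt", "w") as f:
        for i, value in enumerate(values):
            f.write(f"{i}: {value:.6f}\n")

    # Prompt and token ids, so tokenization can be compared later.
    # Assumption: token ids are stored as raw int32 values.
    Path(f"{base}-prompt.txt").write_text(prompt)
    np.asarray(token_ids, dtype=np.int32).tofile(f"{base}-tokens.bin")
    return base
```

Saving the prompt and token ids next to the logits keeps everything a later verification step needs in one directory.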