llama.cpp/tools/mtmd/tests/test-1-extracted.md
Saba Fallah a970515bdb mtmd: Add DeepSeekOCR Support (#17400)
* mtmd: llama.cpp DeepSeekOCR support
init commit

* loading sam tensors

* mtmd: fix vision model processing

* deepseek-ocr clip-vit model impl

* mtmd: add DeepSeek-OCR LM support with standard attention

* mtmd: successfully runs DeepSeek-OCR LM in llama-cli

* mtmd: Fix RoPE type for DeepSeek-OCR LM.

* loading LM
testing Vision model loading

* sam warmup working

* sam erroneous return corrected

* clip-vit:  corrected cls_embd concat

* clip-vit: model convert  qkv_proj split

* corrected combining of image encoders' results

* fix: update callback for ffn_moe_weighted and add callback for attn_out in deepseek2 model

* concat image_newline and image_seperator tokens

* visual_model warmup (technically) works

* window partitioning using standard ggml ops

* sam implementation without using CPU only ops

* clip: fixed warnings

* Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr

* mtmd: fix get_rel_pos

* mtmd: fixed the wrong scaler for get_rel_pos

* image encoding technically works but the output can't be checked since image decoding fails

* mtmd: minor changes

* mtmd: add native resolution support

* - image encoding debugged
- issues fixed, mainly related to wrong config like n_patches etc.
- configs need to be corrected in the converter

* mtmd: correct token order

* - dynamic resizing
- changes are concerning PR https://github.com/sfallah/llama.cpp/pull/4

* mtmd: quick fix token order

* mtmd: fix dangling pointer

* mtmd: SAM numerically works

* mtmd: debug CLIP-L (vit_pre_ln)

* mtmd: debug CLIP-L & first working DeepSeek-OCR model

* mtmd : add --dsocr-mode CLI argument for DeepSeek-OCR resolution control & all native resolution modes work

* mtmd: simplify SAM patch embedding

* mtmd: adapt Pillow image resizing function

* mtmd:  simplify DeepSeek-OCR dynamic resolution preprocessing

* mtmd: remove --dsocr-mode argument

* mtmd: refactor code & remove unused helper functions

* mtmd: fix tensor names for image newlines and view separator

* clean up

* reverting automatically removed spaces

* reverting automatically removed spaces

* mtmd: fixed bad ocr check in Deepseek2 (LM)

* mtmd: support combined QKV projection in build_vit

* using common build_attn in sam

* corrected code-branch when flash-attn disabled
enabling usage of --flash-attn option

* mtmd: minor fix

* minor formatting and style

* fixed flake8 lint issues

* minor editorconfig-check fixes

* minor editorconfig-check fixes

* mtmd: simplify get_rel_pos

* mtmd: make sam hparams configurable

* mtmd: add detailed comments for resize_bicubic_pillow

* mtmd: fixed wrong input setting

* mtmd: convert model in FP16

* mtmd: minor fix

* mtmd: remove tweak to llama-mtmd-cli & deepseek-ocr template

* fix: test-1.jpg OCR issue with small (640) resolution
setting min-resolution base (1024) max large (1280) for dynamic-resolution

* minor: editorconfig-check fix

* merge with changes from https://github.com/ggml-org/llama.cpp/pull/17909
added new opt to tests.sh to disable flash-attn

* minor: editorconfig-check fix

* testing deepseek-ocr
quick and dirty test script comparing results of Qwen2.5-VL vs DeepSeek-OCR

* quick and (potentially) dirty merge with https://github.com/ggml-org/llama.cpp/pull/17909

* refactoring, one single builder function and static helpers

* added deepseek-ocr test to tests.sh

* minor formatting fixes

* check with fixed expected results

* minor formatting

* editorconfig-check fix

* merge with changes from https://github.com/ggml-org/llama.cpp/pull/18042

* minor
- added GLM-4.6V to big tests
- added missing deps for python test

* convert: minor fix

* mtmd: format code

* convert: quick fix

* convert: quick fix

* minor python formatting

* fixed merge build issue

* merge resolved
- fixed issues in convert
- tested several deepseek models

* minor fix

* minor

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* - removed clip_is_deepseekocr
- removed redundant RESIZE_ALGO_BICUBIC_PILLOW resize-algo
- simplified image-preprocessing
- removed/simplified debug functions

* - cleaning commented out code

* fixing instability issues by reintroducing resize_bicubic_pillow

* - use f16 model for deepseek-ocr test
- ignore llama-arch test for deepseek-ocr

* rename fc_w --> mm_fc_w

* add links to OCR discussion

* cleaner loading code

* add missing .weight to some tensors

* add default jinja template (to be used by server)

* move test model to ggml-org

* rolling back upscale change

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: bluebread <hotbread70127@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-03-25 19:57:40 +01:00


<|ref|>title<|/ref|><|det|>61, 255, 907, 533<|/det|>

MEN WALK ON MOON

ASTRONAUTS LAND ON PLAIN; COLLECT ROCKS, PLANT FLAG

<|ref|>text<|/ref|><|det|>56, 559, 268, 629<|/det|> Voice From Moon: Eagle Has Landed'

<|ref|>text<|/ref|><|det|>74, 645, 262, 675<|/det|> EAGLE (the lunar surface, Houston, Truesquily) Base here, The Eagle has landed.

<|ref|>text<|/ref|><|det|>74, 675, 262, 720<|/det|> BOOTHROOM: Lounge, Truesquily, we enjoy you on the ground. You've got a bunch of guys about to toss bikes. We're breaking again. Thanks a lot.

<|ref|>text<|/ref|><|det|>74, 720, 262, 750<|/det|> TRAVELLING MADE: Time you. BOOTHROOM: You're looking good here.

<|ref|>text<|/ref|><|det|>74, 750, 262, 780<|/det|> TRAVELLING MADE: A very smooth touchdown. BEDROOM: Eagle, you are very far. I'll. (The first sign in the lunar appearance) (Over.)

<|ref|>text<|/ref|><|det|>74, 780, 262, 810<|/det|> TRAVELLING MADE: Eagle, stay for I'll. BOOTHROOM: Bumper and we are you waiting the cue.

<|ref|>text<|/ref|><|det|>74, 810, 262, 830<|/det|> TRAVELLING MADE: Eagle, and service mobility.

<|ref|>text<|/ref|><|det|>74, 830, 262, 850<|/det|> How do you read me?

<|ref|>text<|/ref|><|det|>74, 850, 262, 880<|/det|> TRAVELLING COLUMBIA, he has landed Truesquily. Base, Eagle is at Truesquily. I read you first by. Over.

<|ref|>text<|/ref|><|det|>74, 880, 262, 900<|/det|> COLUMBIA: Yes, I heard the whole thing.

<|ref|>text<|/ref|><|det|>74, 900, 262, 920<|/det|> BOOTHROOM: Well, it's a good show.

<|ref|>text<|/ref|><|det|>74, 920, 262, 940<|/det|> COLUMBIA: Fantastic.

<|ref|>text<|/ref|><|det|>74, 940, 262, 960<|/det|> TRAVELLING MADE: I'll read that.

<|ref|>text<|/ref|><|det|>74, 960, 262, 980<|/det|> APOLLO CONTROL: The most major sky to sky will be for the 23 event, that is at 21 minutes 26 sec-

<|ref|>text<|/ref|><|det|>74, 980, 262, 990<|/det|> tion of lunar descent.

<|ref|>image<|/ref|><|det|>270, 545, 697, 990<|/det|>

<|ref|>text<|/ref|><|det|>715, 559, 911, 629<|/det|> A Powdery Surface Is Closely Explored

<|ref|>text<|/ref|><|det|>733, 645, 851, 665<|/det|> BY JOHN NOBLE WILFORD

<|ref|>text<|/ref|><|det|>715, 669, 911, 700<|/det|> HOUSTON, Monday, July 21—New hires landed and walked on the moon.

<|ref|>text<|/ref|><|det|>715, 700, 911, 750<|/det|> Two Americans, astronauts of Apollo 11, steered their Eagle-shaped lunar module safely and smoothly to the lunar landing yesterday at 4:17:40 P.M., Eastern day-light time.

<|ref|>text<|/ref|><|det|>715, 750, 911, 780<|/det|> Neil A. Armstrong, the 38-year-old civilian commander, radioed to earth and the landing team here.

<|ref|>text<|/ref|><|det|>715, 780, 911, 830<|/det|> "Boom, Truesquily! Base here. The Eagle has landed," the first man to reach the moon—Neil Armstrong and his engineer, Capt. Charles E. Alder, of the Jet Propulsion Laboratory, the space agency's rocket and space program manager.

<|ref|>text<|/ref|><|det|>715, 830, 911, 880<|/det|> About six and a half hours later, Mr. Armstrong opened the landing craft's hatch, stepped slowly down the ladder and descended as he pointed his first landing footguard on the lunar crater.

<|ref|>text<|/ref|><|det|>715, 880, 911, 920<|/det|> "That's one small step for man, one giant leap for mankind."

<|ref|>text<|/ref|><|det|>715, 920, 911, 960<|/det|> His first step on the moon came on 10:56:29 P.M., as a television camera recorded the craft's transmitted his every word to an aerial and excited audiences of hundreds of millions of people on earth.

<|ref|>text<|/ref|><|det|>749, 960, 861, 974<|/det|> Testable Slope Test Soil