# mtmd

find_package(Threads REQUIRED)

add_library(mtmd
            mtmd.cpp
            mtmd-audio.cpp
            mtmd.h
            mtmd-helper.cpp
            mtmd-helper.h
            clip.cpp
            clip.h
            clip-impl.h
            clip-model.h
            clip-graph.h
            models/models.h
            models/cogvlm.cpp
            models/conformer.cpp
            models/glm4v.cpp
            models/internvl.cpp
            models/kimivl.cpp
            models/kimik25.cpp
            models/nemotron-v2-vl.cpp
            models/llama4.cpp
            models/llava.cpp
            models/minicpmv.cpp
            models/paddleocr.cpp
            models/pixtral.cpp
            models/qwen2vl.cpp
            models/qwen3vl.cpp
            models/siglip.cpp
            models/whisper-enc.cpp
            models/deepseekocr.cpp
            models/mobilenetv5.cpp
            models/youtuvl.cpp
            )

set_target_properties(mtmd PROPERTIES
    VERSION ${LLAMA_INSTALL_VERSION}
    SOVERSION 0
    MACHO_CURRENT_VERSION 0 # keep macOS linker from seeing oversized version number
    )

target_link_libraries (mtmd PUBLIC  ggml llama)
target_link_libraries (mtmd PRIVATE Threads::Threads)
target_include_directories(mtmd PUBLIC .)
target_include_directories(mtmd PRIVATE ../..)
target_include_directories(mtmd PRIVATE ../../vendor)
target_compile_features (mtmd PRIVATE cxx_std_17)

if (BUILD_SHARED_LIBS)
    set_target_properties (mtmd PROPERTIES POSITION_INDEPENDENT_CODE ON)
    target_compile_definitions(mtmd PRIVATE LLAMA_BUILD)
    target_compile_definitions(mtmd PUBLIC  LLAMA_SHARED)
endif()

set(MTMD_PUBLIC_HEADERS
    ${CMAKE_CURRENT_SOURCE_DIR}/mtmd.h
    ${CMAKE_CURRENT_SOURCE_DIR}/mtmd-helper.h
    )

set_target_properties(mtmd
    PROPERTIES
    PUBLIC_HEADER "${MTMD_PUBLIC_HEADERS}")

set_target_properties(mtmd
    PROPERTIES
    PRIVATE_HEADER debug/mtmd-debug.h)

install(TARGETS mtmd LIBRARY PUBLIC_HEADER)

if (NOT MSVC)
    # for stb_image.h and miniaudio.h
    target_compile_options(mtmd PRIVATE -Wno-cast-qual)
endif()

if (TARGET BUILD_INFO)
    # note: mtmd-helper is compiled into the mtmd target above, so only mtmd
    # needs the BUILD_INFO dependency; a dependency on a separate mtmd-helper
    # target would fail since no such target exists in this file
    add_dependencies(mtmd BUILD_INFO)
endif()

# if mtmd is linked against common, we throw an error
if (TARGET mtmd)
    get_target_property(libs mtmd LINK_LIBRARIES)
    if (libs AND "common" IN_LIST libs)
        message(FATAL_ERROR "mtmd is designed to be a public library.\n"
                            "It must not link against common")
    endif()
endif()

add_executable(llama-llava-cli    deprecation-warning.cpp)
add_executable(llama-gemma3-cli   deprecation-warning.cpp)
add_executable(llama-minicpmv-cli deprecation-warning.cpp)
add_executable(llama-qwen2vl-cli  deprecation-warning.cpp)

set(TARGET llama-mtmd-cli)
add_executable         (${TARGET} mtmd-cli.cpp)
set_target_properties  (${TARGET} PROPERTIES OUTPUT_NAME llama-mtmd-cli)
if (LLAMA_TOOLS_INSTALL)
    install(TARGETS ${TARGET} RUNTIME)
endif()
target_link_libraries  (${TARGET} PRIVATE common mtmd Threads::Threads)
target_compile_features(${TARGET} PRIVATE cxx_std_17)

# mtmd-debug tool
add_executable(llama-mtmd-debug debug/mtmd-debug.cpp)
set_target_properties(llama-mtmd-debug PROPERTIES OUTPUT_NAME llama-mtmd-debug)
target_link_libraries(llama-mtmd-debug PRIVATE common mtmd Threads::Threads)
target_compile_features(llama-mtmd-debug PRIVATE cxx_std_17)
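
# A sketch of how a downstream project could consume the installed mtmd library,
# kept as a comment so this file stays a valid build script. The project and
# source names (my_mtmd_app, main.cpp) are hypothetical, not part of this build:
#
#   find_package(Threads REQUIRED)
#   add_executable(my_mtmd_app main.cpp)
#   # mtmd.h and mtmd-helper.h are the installed public headers (see
#   # MTMD_PUBLIC_HEADERS above); ggml and llama are linked PUBLIC, so they
#   # propagate automatically to the consumer
#   target_link_libraries(my_mtmd_app PRIVATE mtmd)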