llama.cpp/src
Tom Hillbrunner 212f4521b0 context : use n_embd_out for pooled embedding extraction (#20840)
The MEAN/CLS/LAST pooling paths in encode() and decode() used
n_embd_inp() (16384 for qwen3vl with deepstack) to read from the
pooled embedding tensor, which only has n_embd_out() (4096) floats
per sequence. This triggered an out-of-bounds tensor read assertion.

Fixes embedding mode for Qwen3-VL-Embedding models.
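
The mismatch described above can be illustrated with a minimal, self-contained sketch. It is not the llama.cpp code: `pooled_tensor` and `extract_pooled` are hypothetical names invented for illustration, and the sequence count is an assumption. Only the two widths (16384 and 4096) come from the commit message. The sketch shows why reading a row with the input width overruns a buffer sized for the output width.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a pooled embedding tensor:
// n_seq rows of n_embd floats each, stored contiguously.
struct pooled_tensor {
    std::size_t n_embd;
    std::size_t n_seq;
    std::vector<float> data;

    pooled_tensor(std::size_t n_embd_, std::size_t n_seq_)
        : n_embd(n_embd_), n_seq(n_seq_), data(n_embd_ * n_seq_, 0.0f) {}

    // Bounds-checked read of `count` floats starting at float offset `offset`.
    std::vector<float> read(std::size_t offset, std::size_t count) const {
        if (offset + count > data.size()) {
            throw std::out_of_range("tensor read out of bounds");
        }
        return std::vector<float>(data.begin() + offset,
                                  data.begin() + offset + count);
    }
};

// Copy the pooled embedding of one sequence, assuming each row is `width` floats.
// The caller must pass the width the tensor was actually allocated with.
std::vector<float> extract_pooled(const pooled_tensor & t,
                                  std::size_t seq_idx,
                                  std::size_t width) {
    return t.read(seq_idx * width, width);
}
```

With a tensor built as `pooled_tensor t(4096, 2)`, calling `extract_pooled(t, 0, 4096)` succeeds, while `extract_pooled(t, 0, 16384)` throws because the request exceeds the 8192 floats that were allocated. Passing the output width instead of the input width is the shape of the fix the commit describes.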
2026-03-21 19:35:00 +02:00