sampling : remove sampling branching in output_reserve (#18811)

* sampling : remove sampling branching in output_reserve

This commit updates output_reserve in llama-context.cpp to always
allocate sampling buffers regardless of whether sampling is needed for
the current batch.

The motivation for this is to avoid reallocations and branching based on
the sampling requirements of the batch.
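
A minimal sketch of the "always reserve" idea, not the actual llama-context.cpp code. The struct toy_context and its members (n_vocab, logits, sampled, n_reserved, n_reallocs) are hypothetical names chosen for illustration. The only detail taken from this commit is that output_reserve receives just n_outputs and no longer inspects the batch:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a context that owns its output buffers.
// Nothing here is copied from llama.cpp; it only illustrates the pattern.
struct toy_context {
    uint32_t n_vocab = 0;

    std::vector<float>   logits;   // n_reserved * n_vocab entries
    std::vector<int32_t> sampled;  // one slot per output, always allocated

    size_t n_reserved = 0;         // number of outputs currently reserved
    size_t n_reallocs = 0;         // how many times the buffers were grown

    // Reserve space for n_outputs outputs. The sampling buffer is sized
    // unconditionally, so the result does not depend on whether the
    // current batch needs sampling.
    uint32_t output_reserve(int32_t n_outputs) {
        const size_t n = n_outputs > 0 ? static_cast<size_t>(n_outputs) : 0;

        // Grow only when the request exceeds what is already reserved.
        if (n > n_reserved) {
            logits.resize(n * n_vocab);
            sampled.resize(n);
            n_reserved = n;
            ++n_reallocs;
        }

        return static_cast<uint32_t>(n_reserved);
    }
};
```

In this sketch, a batch that needs sampling arriving after one that does not triggers no additional allocation, because both buffers were already sized on the first call.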
Daniel Bevenius
2026-01-28 05:59:30 +01:00
committed by GitHub
parent 06961e2876
commit eef375ce16
2 changed files with 33 additions and 57 deletions
+1 -1
@@ -212,7 +212,7 @@ private:
// Make sure enough space is available for outputs.
// Returns max number of outputs for which space was reserved.
-uint32_t output_reserve(int32_t n_outputs, const llama_batch & batch);
+uint32_t output_reserve(int32_t n_outputs);
void output_reorder();