sampling : remove sampling branching in output_reserve (#18811)
* sampling : remove sampling branching in output_reserve This commit updates output_reserve in llama-context.cpp to always allocate sampling buffers regardless of whether sampling is needed for the current batch. The motivation for this is to avoid reallocations and branching based on the sampling requirements of the batch.
This commit is contained in:
+1
-1
@@ -212,7 +212,7 @@ private:
|
||||
|
||||
// Make sure enough space is available for outputs.
|
||||
// Returns max number of outputs for which space was reserved.
|
||||
uint32_t output_reserve(int32_t n_outputs, const llama_batch & batch);
|
||||
uint32_t output_reserve(int32_t n_outputs);
|
||||
|
||||
void output_reorder();
|
||||
|
||||
|
||||
Reference in New Issue
Block a user