metal: SSM kernel improvements (#17876)
* feat: Add a batched version of ssm_conv This was done using Claude Code. It found a number of optimizations around how the threads were organized, resulting in a huge performance boost! Branch: Mamba2SSD Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Optimized SSM_SCAN kernel for metal This used Claude Code and resulted in a modest performance improvement while maintaining correctness. Branch: Mamba2SSD Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * test: Add test-backend-ops perf tests for SSM_CONV Branch: SSMKernelImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * test: Real representitive tests for SSM_CONV Branch: SSMKernelImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * refactor: Use function constant for ssm_conv batch size Branch: SSMKernelImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * test: backend op tests for ssm_scan from granite4 1b-h Branch: SSMKernelImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * style: remove commented out templates Branch: SSMKernelImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: float4 version of ssm_conv_batched Branch: SSMKernelImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Add missing ggml_metal_cv_free Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
This commit is contained in:
@@ -8193,6 +8193,13 @@ static std::vector<std::unique_ptr<test_case>> make_test_cases_perf() {
|
||||
}
|
||||
}
|
||||
|
||||
// Examples from granite-4.0-h-1b/ggml-model-Q8_0.gguf
|
||||
test_cases.emplace_back(new test_ssm_conv(GGML_TYPE_F32, {515, 3328, 1, 1}, {4, 3328, 1, 1})); // prefill
|
||||
test_cases.emplace_back(new test_ssm_conv(GGML_TYPE_F32, {4, 3328, 1, 1}, {4, 3328, 1, 1})); // generate
|
||||
test_cases.emplace_back(new test_ssm_scan(GGML_TYPE_F32, 128, 64, 48, 1, 512, 1)); // prefill
|
||||
test_cases.emplace_back(new test_ssm_scan(GGML_TYPE_F32, 128, 64, 48, 1, 1, 1)); // generate
|
||||
|
||||
|
||||
return test_cases;
|
||||
}
|
||||
|
||||
|
||||
Reference in New Issue
Block a user