[cleanup] Gate profiling prints behind debug flag (#44) #54
Summary
Gates decode_profile and argmax_compare prints (and their macOS-specific mach_absolute_time / mach_timebase_info calls) behind @import("builtin").mode == .Debug.
Affected file: engine.zig
Test Results
zig build compiles cleanly.
Benchmarks
No performance impact expected (timing calls removed from release builds).
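For readers unfamiliar with the pattern, here is a minimal sketch of gating profiling output on the build mode. It is not the code from engine.zig: decodeStep and profiledDecode are hypothetical names, and std.time.nanoTimestamp is used as a portable stand-in for mach_absolute_time (assuming a Zig version where it is available).

```zig
const std = @import("std");
const builtin = @import("builtin");

// Comptime-known, so the branches below are resolved at compile time and the
// timing/print code is not emitted in release builds.
const profiling_enabled = builtin.mode == .Debug;

// Hypothetical stand-in for the work being profiled.
fn decodeStep(x: u32) u32 {
    return x *% 2 +% 1;
}

// Runs decodeStep once; timing and printing happen only in Debug builds.
fn profiledDecode(x: u32) u32 {
    const start: i128 = if (profiling_enabled) std.time.nanoTimestamp() else 0;
    const out = decodeStep(x);
    if (profiling_enabled) {
        const elapsed = std.time.nanoTimestamp() - start;
        std.debug.print("decode_profile: {d} ns\n", .{elapsed});
    }
    return out;
}
```

The profiled work itself stays on a single path; only the timestamp reads and the print are conditional.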
CHANGES_REQUESTED
Dead GPU work in release builds: dispatch.set_argmax_bf16() is dispatched unconditionally, but the GPU argmax result is never read in non-debug builds. Move the dispatch inside the if (Debug) block to avoid wasting GPU cycles per token in release mode.
Unnecessary code duplication: The copy_bf16_buffer_to_f32 + CPU argmax logic is duplicated in both branches. Restructure so the shared readback and CPU argmax happen once after the conditional debug timing, rather than duplicating ~20 lines of identical logic.
Suggested: gate only the mach_absolute_time calls and std.debug.print lines behind Debug, keep the shared readback/argmax single-path.
103f6d5c2a to e11677ecdc
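A rough sketch of the shape the review asks for, under stated assumptions: the real engine.zig code is not shown in this PR page, so gpuArgmaxStub is a hypothetical CPU stand-in for dispatch.set_argmax_bf16() plus its readback, and cpuArgmax stands in for the copy_bf16_buffer_to_f32 + CPU argmax step.

```zig
const std = @import("std");
const builtin = @import("builtin");

const debug_build = builtin.mode == .Debug;

// CPU argmax over already read-back logits. Assumes a non-empty slice;
// ties resolve to the lowest index.
fn cpuArgmax(logits: []const f32) usize {
    var best: usize = 0;
    for (logits, 0..) |v, i| {
        if (v > logits[best]) best = i;
    }
    return best;
}

// Hypothetical stand-in for the GPU argmax dispatch and result readback.
fn gpuArgmaxStub(logits: []const f32) usize {
    return cpuArgmax(logits);
}

fn selectToken(logits: []const f32) usize {
    // Debug only: the GPU argmax is dispatched solely for the comparison print,
    // so release builds skip the dispatch entirely.
    const gpu_idx: usize = if (debug_build) gpuArgmaxStub(logits) else 0;

    // Shared path: readback and CPU argmax happen exactly once in every build mode.
    const cpu_idx = cpuArgmax(logits);

    if (debug_build) {
        std.debug.print("argmax_compare: gpu={d} cpu={d}\n", .{ gpu_idx, cpu_idx });
    }
    return cpu_idx;
}
```

With this layout the release path contains no GPU argmax dispatch and no duplicated readback block; the Debug checks wrap only the extra dispatch and the print.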