fadde67135
* Single load for half2
* Store scales in local mem
* Vec load quantized values