3d9ab225e7
* OpenCL: add CUMSUM op support * remove unused argument * opencl: refactor cumsum * opencl: refactor * opencl: refactor tmp buffer * opencl: adjust max number of subgroups * opencl: fix whitespace * opencl: fix global size when cumsum the tmp buffer --------- Co-authored-by: Li He <lih@qti.qualcomm.com>