* Thread safety per request only
* Fix RoPE YaRN case
* Fix sticky stateful config
* Use i4/i8 directly for symmetric quant
* Use weightless caching
* Add WeightlessCacheAttribute to reduce NPU memory usage
* GELU tanh support (#125)
* Imrope support (#126)
* fix(openvino): explicit ov::Tensor frees in ggml_backend_openvino_free
* add GPU, NPU support in OV Dockerfile
* add build-openvino.yml CI
* add concurrency to ov-gpu CI runs. Move OV CI to build-openvino.yml
* fix thread-safety of shared runtime context
* RoPE type abstraction for frontend translations
* fix editorconfig
---------
Co-authored-by: Mustafa Cavus <mustafa.cavus@intel.com>
Co-authored-by: Dan Hoffman <dhoff749@gmail.com>
Co-authored-by: Ravi Panchumarthy <ravi.panchumarthy@intel.com>