README.md Add README.md 2026-05-23 21:34:17 +02:00

Kaloyan Nikolov

Software engineer working on LLM inference and training. M.Sc. Computer Science @ RWTH Aachen.


Current focus

  • Multi-Token Prediction (MTP) and speculative decoding for local inference
  • KV cache quantization and Metal GPU kernel optimization
  • Research on diffusion-based training for hybrid attention+linear architectures
  • Ternary weight quantization research

Active projects

omlx — personal fork with MTP decoding and Q4 KV cache with Hadamard rotation

sleepy-llm — Zig-native LLM inference engine with hand-tuned Metal kernels

sleepy-agent — fully local Android AI assistant, on-device Gemma 4 inference

qwen_orthrus — exploring Orthrus diffusion for ternary-weight and hybrid LLM architectures


Background

  • Full port of a C++ speech recognition toolkit to Android using the NDK
  • Development of a custom multi-head attention (MHA) module in PyTorch
  • Training of multilingual attention-based and CTC ASR models, with fine-tuning for code-switching

Stack

Python · Zig · C/C++ · TypeScript · Kotlin