[refactor] Pivot from MLX to LiteRT-LM backend #5

Closed

opened 2026-05-22 14:37:45 +02:00 by sleepy · 0 comments

sleepy commented

2026-05-22 14:37:45 +02:00

Owner

Pivot the inference backend from MLX Swift (gemma-4-swift-mlx) to Google LiteRT-LM.

## Rationale

- 28% smaller model files (2.58 GB vs 3.6 GB for E2B)
- Roughly 5x lower CPU RAM usage (607 MB vs 3.2 GB for E2B)
- About 41% faster GPU inference (56.5 tok/s vs 40 tok/s)
- Built-in multimodal input (vision + audio) with on-demand loading
- Official Google Swift API with streaming, multi-turn conversations, and tool use
- Native function calling with constrained decoding
- Broader device support (A16+ iPads, not just M-series)
- Speculative decoding support (91.7 tok/s on GPU)
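
As a quick sanity check on the size, RAM, and throughput figures above, here is a minimal Python sketch that recomputes the ratios. The numbers are copied from the list; the variable and function names are illustrative only, and the conversion of 3.2 GB to 3200 MB assumes decimal units (with 1024 MB per GB the RAM factor comes out slightly higher).

```python
# Figures copied from the rationale list (E2B model).
# Assumption: 1 GB = 1000 MB when converting the MLX RAM figure.
MLX_MODEL_GB = 3.6
LITERT_MODEL_GB = 2.58
MLX_CPU_RAM_MB = 3.2 * 1000
LITERT_CPU_RAM_MB = 607.0
MLX_GPU_TOK_S = 40.0
LITERT_GPU_TOK_S = 56.5


def percent_smaller(old: float, new: float) -> float:
    """Relative reduction from old to new, in percent."""
    return (old - new) / old * 100.0


def factor(larger: float, smaller: float) -> float:
    """How many times larger the first value is than the second."""
    return larger / smaller


size_saving_pct = percent_smaller(MLX_MODEL_GB, LITERT_MODEL_GB)
ram_factor = factor(MLX_CPU_RAM_MB, LITERT_CPU_RAM_MB)
gpu_speedup = factor(LITERT_GPU_TOK_S, MLX_GPU_TOK_S)

print(f"model size: {size_saving_pct:.1f}% smaller")
print(f"CPU RAM:    {ram_factor:.1f}x lower")
print(f"GPU speed:  {gpu_speedup:.2f}x ({(gpu_speedup - 1) * 100:.0f}% faster)")
```

Under these assumptions the headline numbers hold: the size saving rounds to 28%, the RAM factor is a little over 5x, and the GPU speedup is about 1.4x.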

## Changes

- Replace the gemma-4-swift-mlx dependency with the LiteRT-LM Swift Package
- Replace InferenceService with the LiteRT Engine/Conversation API
- Replace ModelManager with .litertlm file management
- Update models to litert-community/gemma-4-E2B-it-litert-lm and litert-community/gemma-4-E4B-it-litert-lm
- Wire up vision input through the LiteRT multimodal API
- Wire up audio input through the LiteRT multimodal API
- Update project.yml for the new dependency
- Remove gemma-4-swift-mlx-specific code

sleepy referenced this issue from a commit

2026-05-22 14:45:35 +02:00

refactor: pivot from MLX to LiteRT-LM backend

sleepy referenced this issue from a pull request that will close it

2026-05-22 14:45:46 +02:00

[refactor] Pivot from MLX to LiteRT-LM backend (#5) #6

sleepy referenced this issue from a commit

2026-05-22 14:45:57 +02:00

[refactor] Pivot from MLX to LiteRT-LM backend (#5) (#6)

sleepy closed this issue

2026-05-22 14:45:58 +02:00