[refactor] Pivot from MLX to LiteRT-LM backend #5

Closed
opened 2026-05-22 14:37:45 +02:00 by sleepy · 0 comments
Owner

Pivot the inference backend from MLX Swift (gemma-4-swift-mlx) to Google LiteRT-LM.

Rationale

  • 28% smaller model files (2.58 GB vs 3.6 GB for E2B)
  • About 5x lower CPU RAM usage (607 MB vs 3.2 GB for E2B)
  • Faster GPU inference (56.5 tok/s vs 40 tok/s)
  • Built-in multimodal (vision + audio) with on-demand loading
  • Official Google Swift API with streaming, multi-turn, tool use
  • Native function calling with constrained decoding
  • Broader device support (A16+ iPads, not just M-series)
  • Speculative decoding support (91.7 tok/s on GPU)
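
The headline ratios in the list above can be sanity-checked with a few lines of arithmetic. The sketch below is in Python purely for convenience and uses only the figures quoted in this issue. The helper names are illustrative, and the RAM comparison assumes 1 GB = 1000 MB.

```python
# Figures quoted in the Rationale list (E2B model). Nothing here is newly measured.
MLX_MODEL_GB, LITERT_MODEL_GB = 3.6, 2.58
MLX_RAM_GB, LITERT_RAM_MB = 3.2, 607
MLX_TOK_S, LITERT_TOK_S, LITERT_SPEC_TOK_S = 40.0, 56.5, 91.7


def percent_smaller(old: float, new: float) -> float:
    """Relative reduction from old to new, in percent."""
    return (old - new) / old * 100


def ratio(numerator: float, denominator: float) -> float:
    """Plain ratio, used for the 'N times' style comparisons."""
    return numerator / denominator


size_saving = percent_smaller(MLX_MODEL_GB, LITERT_MODEL_GB)
ram_ratio = ratio(MLX_RAM_GB * 1000, LITERT_RAM_MB)  # assumption: 1 GB = 1000 MB
gpu_speedup = ratio(LITERT_TOK_S, MLX_TOK_S)
# Speculative decoding compared against plain LiteRT-LM GPU decoding.
spec_gain = ratio(LITERT_SPEC_TOK_S, LITERT_TOK_S)

print(f"model size saving: {size_saving:.1f}%")
print(f"RAM ratio (MLX / LiteRT-LM): {ram_ratio:.2f}x")
print(f"GPU speedup over MLX: {gpu_speedup:.2f}x")
print(f"speculative decoding gain over plain LiteRT-LM: {spec_gain:.2f}x")
```

With the quoted figures this gives roughly a 28% size saving, a RAM ratio a little above 5x, and a GPU speedup of about 1.4x, which matches the rounded claims in the list.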

Changes

  • Replace gemma-4-swift-mlx dependency with LiteRT-LM Swift Package
  • Replace InferenceService with LiteRT Engine/Conversation API
  • Replace ModelManager with .litertlm file management
  • Update models: litert-community/gemma-4-E2B-it-litert-lm, litert-community/gemma-4-E4B-it-litert-lm
  • Wire up vision input through LiteRT multimodal API
  • Wire up audio input through LiteRT multimodal API
  • Update project.yml for new dependency
  • Remove gemma-4-swift-mlx-specific code
Reference
sleepy/Gemma4Pad#5