[refactor] Pivot from MLX to LiteRT-LM backend (#5) #6

Merged

sleepy merged 1 commit from refactor/5-pivot-litert-lm into main

2026-05-22 14:45:57 +02:00

sleepy commented

2026-05-22 14:45:46 +02:00

Owner

Pivot inference backend from MLX Swift to Google LiteRT-LM.

## Key improvements

- E2B: 2.58 GB file (was 3.6 GB, 28% smaller), 607 MB RAM (was 3.2 GB, about 5x less), 56.5 tok/s on GPU
- E4B: 3.65 GB file (was 5.0 GB, 27% smaller), 961 MB RAM (was 5.0 GB, about 5x less)
- Built-in multimodal vision + audio with on-demand encoder loading
- Native function calling for web search
- Official Google Swift API
- GPU-first with CPU fallback
- Works on A16+ iPads (not just M-series)
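
The percentages and ratios above can be re-derived from the before/after figures in the commit message (3.6 GB and 5.0 GB files, 3.2 GB and 5.0 GB RAM). The sketch below does that arithmetic; the `Footprint` type is purely illustrative and is not part of the app, and 1 GB is assumed to be 1000 MB.

```swift
import Foundation

/// Before/after footprint for one model variant (illustrative type, not app code).
struct Footprint {
    let name: String
    let oldFileGB: Double
    let newFileGB: Double
    let oldRAMGB: Double
    let newRAMMB: Double

    /// Percentage by which the model file shrank, rounded to the nearest integer.
    var fileReductionPercent: Int {
        Int(((oldFileGB - newFileGB) / oldFileGB * 100).rounded())
    }

    /// How many times smaller the RAM footprint is (assumes 1 GB = 1000 MB).
    var ramFactor: Double {
        oldRAMGB * 1000 / newRAMMB
    }
}

let variants = [
    Footprint(name: "E2B", oldFileGB: 3.6, newFileGB: 2.58, oldRAMGB: 3.2, newRAMMB: 607),
    Footprint(name: "E4B", oldFileGB: 5.0, newFileGB: 3.65, oldRAMGB: 5.0, newRAMMB: 961),
]

for v in variants {
    let factor = String(format: "%.1f", v.ramFactor)
    print("\(v.name): file \(v.fileReductionPercent)% smaller, RAM \(factor)x less")
}
// E2B: file 28% smaller, RAM 5.3x less
// E4B: file 27% smaller, RAM 5.2x less
```

Both variants land slightly above 5x on RAM, which is consistent with the rounded "5x less" in the list.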

## Changes

- Replace gemma-4-swift-mlx with the LiteRT-LM Swift Package
- Rewrite InferenceService with the Engine/Conversation API
- Rewrite ModelManager for single .litertlm files with real download progress
- Remove Q4/Q8 selection (LiteRT-LM models are pre-quantized)
- Add extended-virtual-addressing entitlement

Closes #5
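
For readers unfamiliar with the "GPU-first with CPU fallback" pattern mentioned above, here is a minimal sketch of the control flow. It deliberately does not use the LiteRT-LM API: `Backend`, `BackendUnavailable`, and `loadWithFallback` are hypothetical names, and the `load` closure stands in for whatever call actually creates the engine.

```swift
/// Hypothetical backend choices; these names are illustrative, not LiteRT-LM API.
enum Backend: String {
    case gpu, cpu
}

/// Error used by the stub loader below to simulate a backend that cannot start.
struct BackendUnavailable: Error {
    let backend: Backend
}

/// Tries each backend in preference order and returns the first engine that loads.
/// If every backend fails, the last error is rethrown.
func loadWithFallback<Engine>(
    preferring order: [Backend] = [.gpu, .cpu],
    load: (Backend) throws -> Engine
) throws -> (engine: Engine, backend: Backend) {
    var lastError: Error?
    for backend in order {
        do {
            let engine = try load(backend)
            return (engine, backend)
        } catch {
            lastError = error  // remember why this backend failed, then try the next
        }
    }
    throw lastError ?? BackendUnavailable(backend: .cpu)
}

// Usage with a stub loader that simulates a device without a usable GPU.
let result = try loadWithFallback { backend -> String in
    if backend == .gpu { throw BackendUnavailable(backend: .gpu) }
    return "engine-on-\(backend.rawValue)"
}
print(result.backend)  // cpu
```

Returning the backend alongside the engine lets the UI report which path was taken, which may be useful when comparing throughput numbers such as the GPU figure quoted above.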


sleepy added 1 commit

2026-05-22 14:45:46 +02:00

refactor: pivot from MLX to LiteRT-LM backend e3fa04887d

- Replace gemma-4-swift-mlx with Google LiteRT-LM Swift Package
- E2B: 2.58 GB (was 3.6 GB), 607 MB RAM (was 3.2 GB), 56.5 tok/s GPU
- E4B: 3.65 GB (was 5.0 GB), 961 MB RAM (was 5.0 GB)
- Built-in multimodal vision + audio with on-demand encoder loading
- Native function calling with constrained decoding for web search
- Official Swift API with Engine/Conversation/streaming
- GPU-first with CPU fallback
- Real download progress with cancel support
- Single .litertlm model file per variant
- Added extended-virtual-addressing entitlement for large models
- Removed Q4/Q8 selection (LiteRT models pre-quantized)

Closes #5
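
As an illustration of "real download progress with cancel support", the sketch below shows one common way to get byte-accurate progress from Foundation's `URLSessionDownloadDelegate`. It is not the PR's ModelManager: `ModelDownloader`, `onProgress`, and `onFinish` are hypothetical names, and only the `URLSession` delegate methods are standard Foundation API.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession lives here on non-Apple platforms
#endif

/// Hypothetical downloader; the name and callback shape are illustrative only.
final class ModelDownloader: NSObject, URLSessionDownloadDelegate {
    var onProgress: ((Double) -> Void)?
    var onFinish: ((URL) -> Void)?
    private var task: URLSessionDownloadTask?
    private lazy var session: URLSession =
        URLSession(configuration: .default, delegate: self, delegateQueue: nil)

    /// Fraction in 0...1, or nil when the server did not report a total size.
    static func fraction(written: Int64, expected: Int64) -> Double? {
        guard expected > 0 else { return nil }
        return min(1.0, Double(written) / Double(expected))
    }

    /// Starts downloading the model file from `url`.
    func start(from url: URL) {
        let newTask = session.downloadTask(with: url)
        task = newTask
        newTask.resume()
    }

    /// Cancels the in-flight download, if any.
    func cancel() {
        task?.cancel()
        task = nil
    }

    // Called repeatedly as bytes arrive; reports progress only when the total is known.
    func urlSession(_ session: URLSession, downloadTask: URLSessionDownloadTask,
                    didWriteData bytesWritten: Int64, totalBytesWritten: Int64,
                    totalBytesExpectedToWrite: Int64) {
        if let value = Self.fraction(written: totalBytesWritten,
                                     expected: totalBytesExpectedToWrite) {
            onProgress?(value)
        }
    }

    // The temporary file is removed once this method returns, so a real
    // implementation would move it into place before notifying the caller.
    func urlSession(_ session: URLSession, downloadTask: URLSessionDownloadTask,
                    didFinishDownloadingTo location: URL) {
        onFinish?(location)
    }
}
```

Note that `URLSession` keeps a strong reference to its delegate until the session is invalidated, so a production version would also need to invalidate the session when the download finishes or is cancelled.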

sleepy merged commit 8b2293e664 into main

2026-05-22 14:45:57 +02:00

sleepy referenced this pull request from a commit

2026-05-22 14:45:57 +02:00

[refactor] Pivot from MLX to LiteRT-LM backend (#5) (#6)


Reference

sleepy/Gemma4Pad!6
