
By the Mirai team
The fastest inference engine for Apple devices
We are adding support for the latest and smallest language model from our partners at Liquid AI: LFM2.5-350M. It is a tiny model that punches significantly above its weight class, outperforming the much larger Qwen3.5-0.8B on reasoning and agentic tool-use benchmarks.
The eval numbers are impressive:

| Benchmark | Score |
|---|---|
| GPQA Diamond | 30.71% |
| IFBench | 40.67% |
| BFCL v3 | 43.98% |
| BFCL v4 | 21.98% |
| Tau2-Telecom | 18.42% |
On-device performance
We are rolling out bfloat16 support first, with our own 4-bit and 8-bit quantized checkpoints coming soon. Even in full precision, the model achieves a throughput of over 70 tokens/s on an iPhone, which meets or surpasses the interactivity needs of most applications.
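To see why bfloat16 is already practical on-device, a back-of-the-envelope sketch of the weight footprint helps (assuming ~350M parameters, taken from the model name):

```python
# Rough weight-memory estimate for LFM2.5-350M.
# Assumption: ~350M parameters, inferred from the model name.
params = 350e6
bf16_gb = params * 2 / 1e9    # bfloat16 stores 2 bytes per parameter
int4_gb = params * 0.5 / 1e9  # a 4-bit checkpoint stores 0.5 bytes per parameter
print(f"bf16 weights: ~{bf16_gb:.2f} GB, 4-bit weights: ~{int4_gb:.2f} GB")
```

The measured ~0.94–1.0 GB of used memory in the table below is consistent with ~0.70 GB of bf16 weights once the KV cache and runtime overhead are added on top.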
| Device | Prompt Tokens | Used Memory, GB | Time to 1st Token, s | Prompt, t/s | Generate, t/s |
|---|---|---|---|---|---|
| Apple A18 Pro | 854 | 0.945 ± 0.000 | 0.439 ± 0.023 | 1952.753 ± 110.379 | 72.927 ± 0.202 |
| Apple M1 | 854 | 0.940 ± 0.000 | 0.318 ± 0.009 | 2687.680 ± 71.626 | 81.988 ± 0.196 |
| Apple M1 Max | 854 | 0.985 ± 0.001 | 0.084 ± 0.015 | 10391.442 ± 1167.325 | 328.543 ± 0.766 |
| Apple M1 Ultra | 854 | 1.004 ± 0.001 | 0.055 ± 0.032 | 17401.436 ± 3429.928 | 412.449 ± 2.484 |
| Apple M2 | 854 | 0.943 ± 0.001 | 0.227 ± 0.007 | 3774.996 ± 101.814 | 117.432 ± 0.143 |
| Apple M2 Pro | 854 | 0.955 ± 0.001 | 0.122 ± 0.007 | 7015.664 ± 329.433 | 209.566 ± 0.897 |
| Apple M4 | 854 | 0.946 ± 0.001 | 0.155 ± 0.004 | 5502.026 ± 132.498 | 135.929 ± 0.226 |
| Apple M4 Pro | 854 | 0.969 ± 0.001 | 0.084 ± 0.007 | 10178.273 ± 682.804 | 291.064 ± 1.677 |
| Apple M5 | 854 | 0.950 ± 0.001 | 0.060 ± 0.003 | 14390.608 ± 634.761 | 152.178 ± 0.888 |
| Apple M5 Max | 854 | 1.009 ± 0.001 | 0.020 ± 0.005 | 44882.626 ± 6073.676 | 564.260 ± 1.638 |
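As a sanity check on the numbers above, time to first token should roughly equal prompt length divided by prompt throughput; a minimal sketch using the A18 Pro row:

```python
# Estimate time to first token from the Apple A18 Pro row above.
prompt_tokens = 854
prompt_tps = 1952.753  # measured prompt throughput, tokens/s
ttft = prompt_tokens / prompt_tps
print(f"estimated TTFT: {ttft:.3f} s")  # close to the measured 0.439 ± 0.023 s
```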
The model is available as LiquidAI/LFM2.5-350M.
See all models optimized for on-device inference at trymirai.com/local-models