LFM2.5-350M on Apple Silicon

By Mirai team

The fastest inference engine for Apple

We are adding support for the latest and smallest language model from our partners at Liquid AI: LFM2.5-350M. It is a tiny model that punches significantly above its weight class, outperforming the much larger Qwen3.5-0.8B on reasoning and agentic tool-use benchmarks.

The eval numbers are impressive:

GPQA Diamond: 30.71%
IFBench: 40.67%
BFCL v3: 43.98%
BFCL v4: 21.98%
Tau2-Telecom: 18.42%

The benchmarks

We are rolling out bfloat16 support first, with our own 4-bit and 8-bit quantized checkpoints coming soon. Even at full bfloat16 precision, the model sustains over 70 t/s on an iPhone, which meets or exceeds the interactivity needs of most applications.
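
For a sense of what this looks like in an app, here is a minimal sketch of streaming generation on-device. It is illustrative only: `InferenceEngine`, `load(model:)`, and `generate(prompt:onToken:)` are hypothetical stand-in names, not the actual Mirai SDK surface.

```swift
import Foundation

// Illustrative sketch only: InferenceEngine, load(model:), and
// generate(prompt:onToken:) are hypothetical stand-ins for an
// on-device SDK, not the real Mirai API.
let engine = try InferenceEngine.load(model: "LiquidAI/LFM2.5-350M")

// Stream tokens as they decode; at 70+ t/s on an iPhone the text
// renders faster than most people read.
try engine.generate(prompt: "Summarize this note in one sentence.") { token in
    print(token, terminator: "")
}
```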

| Device | Prompt tokens | Memory used (GB) | Time to first token (s) | Prompt (t/s) | Generation (t/s) |
| --- | --- | --- | --- | --- | --- |
| Apple A18 Pro | 854 | 0.945 ± 0.000 | 0.439 ± 0.023 | 1952.753 ± 110.379 | 72.927 ± 0.202 |
| Apple M1 | 854 | 0.940 ± 0.000 | 0.318 ± 0.009 | 2687.680 ± 71.626 | 81.988 ± 0.196 |
| Apple M1 Max | 854 | 0.985 ± 0.001 | 0.084 ± 0.015 | 10391.442 ± 1167.325 | 328.543 ± 0.766 |
| Apple M1 Ultra | 854 | 1.004 ± 0.001 | 0.055 ± 0.032 | 17401.436 ± 3429.928 | 412.449 ± 2.484 |
| Apple M2 | 854 | 0.943 ± 0.001 | 0.227 ± 0.007 | 3774.996 ± 101.814 | 117.432 ± 0.143 |
| Apple M2 Pro | 854 | 0.955 ± 0.001 | 0.122 ± 0.007 | 7015.664 ± 329.433 | 209.566 ± 0.897 |
| Apple M4 | 854 | 0.946 ± 0.001 | 0.155 ± 0.004 | 5502.026 ± 132.498 | 135.929 ± 0.226 |
| Apple M4 Pro | 854 | 0.969 ± 0.001 | 0.084 ± 0.007 | 10178.273 ± 682.804 | 291.064 ± 1.677 |
| Apple M5 | 854 | 0.950 ± 0.001 | 0.060 ± 0.003 | 14390.608 ± 634.761 | 152.178 ± 0.888 |
| Apple M5 Max | 854 | 1.009 ± 0.001 | 0.020 ± 0.005 | 44882.626 ± 6073.676 | 564.260 ± 1.638 |
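
To read the table: prompt throughput is the 854 prefill tokens divided by the time to first token, and generation throughput is decoded tokens over decode time. Here is a small sketch of those relationships; the generated-token count and decode time below are assumed values for illustration, and only the A18 Pro prompt figures come from the table.

```swift
import Foundation

// How the throughput columns relate to the raw timings.
struct RunTiming {
    let promptTokens: Int        // prefill length (854 in the table)
    let generatedTokens: Int     // tokens produced during decode
    let timeToFirstToken: Double // seconds; includes the full prefill
    let decodeTime: Double       // seconds spent generating
}

// Prompt t/s: prefill tokens over time to first token (approximate,
// since time to first token also covers the very first decode step).
func promptThroughput(_ t: RunTiming) -> Double {
    Double(t.promptTokens) / t.timeToFirstToken
}

// Generation t/s: decoded tokens over decode time.
func generationThroughput(_ t: RunTiming) -> Double {
    Double(t.generatedTokens) / t.decodeTime
}

// Apple A18 Pro row: 854 prompt tokens, 0.439 s to first token. The
// generated-token count and decode time are assumed for illustration.
let a18Pro = RunTiming(promptTokens: 854, generatedTokens: 128,
                       timeToFirstToken: 0.439, decodeTime: 1.755)
print(promptThroughput(a18Pro))     // ≈ 1945.3, close to the measured 1952.753
print(generationThroughput(a18Pro)) // ≈ 72.9, matching the measured 72.927
```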


The model is available as LiquidAI/LFM2.5-350M.

See all models optimized for on-device inference at trymirai.com/local-models.

Deploy and run models of any architecture directly on Apple devices.

On-device layer for AI model makers & products.