
By the Mirai team
The fastest inference engine for Apple devices
We are adding support for the latest and smallest language model from our partners at Liquid AI: LFM2.5-350M. It is a tiny model that punches significantly above its weight class, outperforming the much larger Qwen3.5-0.8B on reasoning and agentic tool-use benchmarks.
The eval numbers are impressive:

| Benchmark | Score |
|---|---|
| GPQA Diamond | 30.71% |
| IFBench | 40.67% |
| BFCL v3 | 43.98% |
| BFCL v4 | 21.98% |
| Tau2-Telecom | 18.42% |
On-device performance
We are rolling out bfloat16 support first, with our own 4-bit and 8-bit quantized checkpoints coming soon. Even in full precision, the model achieves a throughput of over 70 tokens/s on an iPhone, which meets or surpasses the interactivity needs of most applications.
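To see why bfloat16 is already practical on-device, a back-of-the-envelope sketch of the weight footprint helps (assuming ~350M parameters, taken from the model name):

```python
# Rough weight-memory estimate for LFM2.5-350M.
# Assumption: ~350M parameters, inferred from the model name.
params = 350e6
bf16_gb = params * 2 / 1e9    # bfloat16 stores 2 bytes per parameter
int4_gb = params * 0.5 / 1e9  # a 4-bit checkpoint stores 0.5 bytes per parameter
print(f"bf16 weights: ~{bf16_gb:.2f} GB, 4-bit weights: ~{int4_gb:.2f} GB")
```

The measured ~0.94–1.0 GB of used memory in the table below is consistent with ~0.70 GB of bf16 weights once the KV cache and runtime overhead are added on top.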
| Device | Prompt Tokens | Used Memory, GB | Time to 1st Token, s | Prompt, t/s | Generate, t/s |
|---|---|---|---|---|---|
| Apple A18 Pro | 854 | 0.945 ± 0.000 | 0.439 ± 0.023 | 1952.753 ± 110.379 | 72.927 ± 0.202 |
| Apple M1 | 854 | 0.940 ± 0.000 | 0.318 ± 0.009 | 2687.680 ± 71.626 | 81.988 ± 0.196 |
| Apple M1 Max | 854 | 0.985 ± 0.001 | 0.084 ± 0.015 | 10391.442 ± 1167.325 | 328.543 ± 0.766 |
| Apple M1 Ultra | 854 | 1.004 ± 0.001 | 0.055 ± 0.032 | 17401.436 ± 3429.928 | 412.449 ± 2.484 |
| Apple M2 | 854 | 0.943 ± 0.001 | 0.227 ± 0.007 | 3774.996 ± 101.814 | 117.432 ± 0.143 |
| Apple M2 Pro | 854 | 0.955 ± 0.001 | 0.122 ± 0.007 | 7015.664 ± 329.433 | 209.566 ± 0.897 |
| Apple M4 | 854 | 0.946 ± 0.001 | 0.155 ± 0.004 | 5502.026 ± 132.498 | 135.929 ± 0.226 |
| Apple M4 Pro | 854 | 0.969 ± 0.001 | 0.084 ± 0.007 | 10178.273 ± 682.804 | 291.064 ± 1.677 |
| Apple M5 | 854 | 0.950 ± 0.001 | 0.060 ± 0.003 | 14390.608 ± 634.761 | 152.178 ± 0.888 |
| Apple M5 Max | 854 | 1.009 ± 0.001 | 0.020 ± 0.005 | 44882.626 ± 6073.676 | 564.260 ± 1.638 |
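As a sanity check on the numbers above, time to first token should roughly equal prompt length divided by prompt throughput; a minimal sketch using the A18 Pro row:

```python
# Estimate time to first token from the Apple A18 Pro row above.
prompt_tokens = 854
prompt_tps = 1952.753  # measured prompt throughput, tokens/s
ttft = prompt_tokens / prompt_tps
print(f"estimated TTFT: {ttft:.3f} s")  # close to the measured 0.439 ± 0.023 s
```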
The model is available as LiquidAI/LFM2.5-350M.
See all models optimized for on-device inference at trymirai.com/local-models