
On-device AI research & products that make the assistant the only interface you need.

We build the models, runtime, and infrastructure to make on-device AI fast, capable, and accessible. On the hardware billions of people already own.

Read our research

Talk to us

[Demo · rental booking, on-device, airplane mode] Voice request processed locally: searched 4 providers — Hertz $78 · RAV4, Enterprise $84 · Escape, Avis $89 · Equinox — and booked Hertz at $78/day. 0 apps · 0 network calls · airplane mode.

[Demo · email draft, on-device, airplane mode] Message to Sarah composed and ready in 240 ms. 0 apps · 0 network calls · airplane mode.

The last decade shipped one app per service.

Every task earned its own icon. Rideshare. Bank. Calendar. Files. Messaging. Photos. We trained ourselves to context-switch between dozens of interfaces just to get one thing done.

Est. 8.9M mobile apps  ·  Apple App Store + Google Play, 2026

Chase · Calendar · Files · Uber: "Pay Anna, send the contract, schedule lunch Friday, ride to LAX at six."

The next decade collapses the app grid into one interface.

One instruction. Five services. Zero app launches. The assistant understands intent, not commands, and resolves everything in a single interaction, privately, on your device.

Chase · Calendar · Files · Uber · resolved in one interaction


AI on a device needs to be fast enough that it feels less like software and more like thinking.

mirai · predicting next token · 127 ms · on-device
Prompt: "Book a rental from ___"
LAX 0.55 · JFK 0.18 · Hertz 0.13 · the 0.08 · any 0.06
top-1 prob · 0.55 · generated on-device
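The probabilities in the demo come from a softmax over the model's output logits for each candidate next token. A minimal sketch — the logit values below are made up for illustration, not Mirai's actual numbers:

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution that sums to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(x - m) for tok, x in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical logits for the prompt "Book a rental from ___"
logits = {"LAX": 4.0, "JFK": 2.9, "Hertz": 2.6, "the": 2.1, "any": 1.8}
probs = softmax(logits)
top1 = max(probs, key=probs.get)  # greedy decoding picks the top-1 token
```

Greedy decoding simply emits the top-1 token; sampling strategies instead draw from the full distribution.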

You say what you need.
It's done. No app to open, no form to fill, no network required.

This is what the next decade looks like. Running today, on real hardware, offline. The AI capable of collapsing the app grid into one interface already exists. What Mirai is doing is making sure it runs on the device in your pocket, not just in a lab.

Technical details

mirai — airplane mode · M4 Pro

1,000 t/s

Above this threshold, the assistant stops returning text and starts rendering interfaces. Speed is what turns tokens into a UI.

Below 1,000 t/s: slow chat, still thinking. At 1,240 t/s: instant UI, renders now.

What we are building to achieve 1,000 t/s on-device

1

Local models

Architecture designed from device memory constraints — not compressed cloud models. Maximizes arithmetic intensity at decoding, minimizes resident set.
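One way to see why architecture must start from memory constraints: single-stream decoding is typically memory-bandwidth-bound, so tokens/sec is capped by how fast the resident weights can be streamed per generated token. A back-of-envelope sketch — the parameter count, weight width, and bandwidth below are illustrative assumptions, not Mirai's measurements:

```python
def decode_tps_upper_bound(param_count, bytes_per_weight, mem_bandwidth_gbs):
    """Rough upper bound on decode tokens/sec for a memory-bandwidth-bound
    model: each generated token must stream all resident weights once."""
    resident_gb = param_count * bytes_per_weight / 1e9
    return mem_bandwidth_gbs / resident_gb

# Illustrative: a 1.2B-parameter model at 8-bit weights (1 byte each)
# on a chip with ~400 GB/s of memory bandwidth.
bound = decode_tps_upper_bound(1.2e9, 1.0, 400)  # ~333 t/s ceiling
```

Halving the resident bytes — via quantization or architecture — doubles this ceiling, which is why the model, quantization, and kernels are designed together rather than compressed after the fact.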

2

Inference engine

Built from scratch for device silicon — Metal kernels, hardware-native, not portable. The model and runtime co-evolve. No generic runtime can do this.

3

Quantization

W8A8+VQ co-designed with architecture and kernels. Hardware-accelerated weight loading. Without it, the model doesn't fit. With it, nothing is sacrificed.
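For intuition only, here is a generic symmetric int8 weight quantization sketch. This is a textbook illustration of the "W8" idea, not Mirai's actual W8A8+VQ scheme, which is co-designed with the architecture and kernels:

```python
def quantize_w8(weights):
    """Symmetric per-tensor int8 quantization: map the largest weight
    magnitude to 127, then round every weight to the nearest step."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [x * scale for x in q]

weights = [0.02, -0.5, 0.31, -0.12]
q, scale = quantize_w8(weights)
restored = dequantize(q, scale)  # close to the originals, 4x smaller storage
```

Real schemes add per-group scales, activation quantization, and (as here) vector quantization on top, but the core trade — fewer resident bytes per weight at bounded error — is the same.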

4

The experience

Tokens become interfaces, forms, and resolved workflows. This is the product. Everything above is what makes it possible.

Silicon

Apple M-series. Already shipped — 1.4B devices worldwide. The silicon is done. The software is the open problem.

Apple Silicon — now
Android — coming soon
[Chart] Performance vs MLX · LFM2-1.2B · M1 Ultra — Mirai runtime vs MLX across three metrics: prompt speed (tokens/sec, higher is better), generate speed (tokens/sec, higher is better), and first token (latency, lower is better).


How we build it