Models, runtime & infrastructure to make on-device AI fast, capable, & accessible.

1.4B Apple Silicon devices in the world. The hardware is ready. The software isn't. Mirai builds the full stack to change that.

New Chat
Chats
Models
Benchmarks
Docs
Schedule team sync
Settings
On-device AI · Powered by Mirai • Offline
Book a meeting with my team for tomorrow at 3pm and notify them on Slack.
Thought for 1.4s — planning tool sequence
calendar.create(date="tomorrow", time="15:00", title="Team sync")
calendar.invite(event_id="…", recipients="team")
slack.send(channel="#team", message="Meeting confirmed for tomorrow 3pm")

Mirai local model · 3 tool calls · 1,238 T/S · TTFT 71ms
Eject model
rides
bank
calendar
files
messages
photos
mail
maps
docs

The last decade shipped one app per service.

Every task earned its own icon. Bank. Calendar. Files. Messaging. We trained ourselves to context-switch between dozens of interfaces just to get one thing done.

Est. 8.9M mobile apps  ·  Apple App Store + Google Play, 2026

The next decade collapses the app grid into one interface.

One instruction. Multiple services. Zero app launches. The assistant resolves everything in a single interaction, privately, on your device.

Chase · Calendar · Files · Uber · resolved in one interaction

Chase
Calendar
Files
Uber
Pay Anna, send the contract, schedule lunch Friday, ride to LAX at six.
Execution speed we are aiming for ...

1,000 t/s is where models become interfaces.

1,000 t/s
Slow chat: thinking...
AI-native interfaces: renders almost instantly

1

User interfaces break when the model waits.

Modern AI interfaces constantly regenerate layouts, repair outputs, and rerender state. Users tolerate waiting for text. They do not tolerate waiting for an interface.

2

The user sees 1 response. The model generates 100+ internally.

Parsing. Validation. Ranking. Schema repair. State updates. Most model execution happens before the user sees anything.

3

Faster models do not make interfaces realtime. Faster execution does.

The bottleneck is sustaining realtime interaction under strict latency constraints. That is an execution problem. Not just a model problem.
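
A back-of-envelope sketch of why hidden generations dominate latency. The workload here (100 internal generations of 20 tokens each, 50 visible tokens, one sequential stream, a single TTFT) is a hypothetical assumption for illustration, not a Mirai measurement. The 71 ms TTFT is the figure shown in the demo above. `interaction_latency_ms` is an illustrative name.

```python
def interaction_latency_ms(hidden_tokens, visible_tokens, tokens_per_second, ttft_ms):
    """Estimate time until the full response is on screen, in milliseconds.

    Assumes one sequential decode stream and a single time-to-first-token cost.
    """
    total_tokens = hidden_tokens + visible_tokens
    return ttft_ms + 1000.0 * total_tokens / tokens_per_second


# Hypothetical workload: 100 internal generations of 20 tokens each, 50 visible tokens.
HIDDEN_TOKENS = 100 * 20
VISIBLE_TOKENS = 50

for tps in (50, 200, 1000):
    ms = interaction_latency_ms(HIDDEN_TOKENS, VISIBLE_TOKENS, tps, ttft_ms=71)
    print(f"{tps:>5} t/s -> {ms / 1000:.2f} s per interaction")
```

Under these assumptions the same interaction takes about 41 s at 50 t/s, about 10 s at 200 t/s, and about 2 s at 1,000 t/s.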

mirai — airplane mode · M4 Pro
$

We are building every layer from the device constraint up to cross 1,000 t/s.

We are building every layer from scratch.

Not adapted from cloud, not ported from existing runtimes.

The model knows the runtime it runs on.

The runtime knows the model it serves.

On-device is not a smaller cloud. It’s a different system entirely.

Cloud

Large batches

Throughput-first

Memory-rich

Compute-bound

On-device

Batch size = 1

Latency-first

Memory-constrained

Bandwidth-bound
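
What "bandwidth-bound" means at batch size 1 can be sketched with a rough ceiling: if every weight is read from memory once per decoded token, decode speed cannot exceed memory bandwidth divided by model size. The function name, the 3B-parameter model, and the 273 GB/s bandwidth (the figure Apple publishes for M4 Pro) are assumptions for illustration. The estimate ignores KV cache and activation traffic, so real ceilings are lower.

```python
def decode_ceiling_tps(params_billion, bits_per_weight, bandwidth_gb_s):
    """Upper bound on batch-1 decode speed when every weight is read once per token."""
    # 1e9 parameters times (bits / 8) bytes each gives the model size in GB.
    model_gb = params_billion * bits_per_weight / 8
    return bandwidth_gb_s / model_gb


# Hypothetical 3B-parameter model on an assumed 273 GB/s memory system.
for bits in (16, 8, 4):
    tps = decode_ceiling_tps(params_billion=3, bits_per_weight=bits, bandwidth_gb_s=273)
    print(f"{bits:>2}-bit weights: at most {tps:.1f} t/s")
```

With these numbers the ceiling is 45.5 t/s at 16 bits and 182 t/s at 4 bits. Going past it takes fewer bytes per token or more than one token per pass over the weights.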

What are we doing differently?

1. Models

Architectures designed for memory movement, not just model quality.

We optimize for:

Higher arithmetic intensity during decoding

Smaller resident sets

Block generation, not token-by-token decoding

2. Runtime

Built around Apple Silicon, not portability abstractions.

We optimize for:

Metal-native execution

Custom tensor scheduling

Hardware-aware memory layouts

3. Execution

Compression is part of our architecture, not a post-processing step.

Co-designed together:

Quantization

Speculative decoding

Kernels & weight layouts
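
For readers new to speculative decoding, here is a minimal greedy sketch of the general technique, not Mirai's implementation. A cheap draft model proposes `k` tokens, the target model checks them, and the longest matching prefix is kept, so the output is identical to decoding with the target alone. The toy models and all function names are hypothetical. On real hardware the verification step is one batched pass over the weights rather than a loop.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy next-token function over a token context


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. The draft model cheaply proposes k tokens.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target model verifies each proposed position.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: keep the target's token, stop
                break
            accepted.append(tok)
        else:
            accepted.append(target(out + accepted))  # all k accepted: take a bonus token
        out.extend(accepted)
    return out[:goal]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


# Toy stand-ins: the draft agrees with the target except at every fourth context length.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] * 3 + len(ctx)) % 11


def toy_draft(ctx: List[int]) -> int:
    t = toy_target(ctx)
    return t if len(ctx) % 4 else (t + 1) % 11


print(speculative_decode(toy_target, toy_draft, [1], max_new=20)
      == greedy_decode(toy_target, [1], max_new=20))
```

Each round emits between 1 and `k + 1` tokens per target pass, which is why draft accuracy, quantization, and kernel design interact and benefit from being designed together.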

We are building ...

The full on-device stack to achieve realtime local intelligence.

1

Local models

Architectures designed for memory-bound execution and realtime decoding.

2

Inference engine

Hardware-native orchestration of tensors, memory, and accelerators.

3

Quantization

Compress models without collapsing interaction quality or latency.

4

The experience

Sustain continuous UI generation, validation, retries, and rerenders locally.
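
As a rough illustration of the quantization layer (item 3 above), here is a minimal sketch of symmetric group-wise quantization, a common baseline scheme. It is not a description of Mirai's method. The function names, the 4-bit width, and the sample weights are assumptions.

```python
def quantize_group(weights, bits=4):
    """Map a group of floats to signed integers that share one scale factor."""
    qmax = 2 ** (bits - 1) - 1                            # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0    # fall back to 1.0 for an all-zero group
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [v * scale for v in q]


# Hypothetical weight group; real groups are typically tens to hundreds of values.
group = [0.12, -0.5, 0.33, 0.07]
q, scale = quantize_group(group)
restored = dequantize_group(q, scale)
worst = max(abs(a - b) for a, b in zip(group, restored))
print(q, f"scale={scale:.4f}", f"worst error={worst:.4f}")
```

The rounding error per weight is at most half the scale, so smaller groups give tighter scales at the cost of storing more of them.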

Silicon

Apple devices already have the compute. The missing layer is software.

On-device AI needs a frontier lab. Not another wrapper.

The silicon is shipped. 1.4 billion devices are waiting. The software is the open problem. And it's ours to solve.
