Models, runtime & infrastructure to make
on-device AI interactive, ambient & continuous.

We believe the next generation of AI products will be built around continuous interaction, not isolated prompts and chat templates. That only becomes possible when latency disappears.

New Chat
Chats
Models
Benchmarks
Docs
Schedule team sync
Settings
On-device AI · Powered by Mirai · Offline
Thought for 1.4s — planning tool sequence
calendar.create(date="tomorrow", time="15:00", title="Team sync")
calendar.invite(event_id="…", recipients="team")
slack.send(channel="#team", message="Meeting confirmed for tomorrow 3pm")

Mirai local model · 3 tool calls · 1,238 t/s · TTFT 71 ms
Eject model
rides
bank
calendar
files
messages
photos
mail
maps
docs

The last decade shipped one app per service.

Every task earned its own icon. Bank. Calendar. Files. Messaging. We trained ourselves to context-switch between dozens of interfaces just to get one thing done.

Est. 8.9M mobile apps  ·  Apple App Store + Google Play, 2026

The next decade collapses the app grid into one interface.

One instruction. Multiple services. Zero app launches. The assistant resolves everything in a single interaction, privately, on your device.

Chase · Calendar · Files · Uber · resolved in one interaction

Chase
Calendar
Files
Uber
Pay Anna, send the contract, schedule lunch Friday, ride to LAX at six.
The execution speed we are aiming for:

1,000 t/s is where models become interfaces.

Slow chat: thinking… · AI-native interfaces at 1,000 t/s: the UI renders almost instantly.
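The difference at these decode speeds is plain arithmetic. A minimal sketch; the token count for a UI payload is an illustrative assumption, not a measurement:

```python
# Back-of-the-envelope: wall-clock time to stream a UI payload
# at different decode speeds. Token counts are assumptions.

def render_time_s(tokens: int, tokens_per_s: float, ttft_s: float = 0.0) -> float:
    """Time to first token plus steady-state decode time."""
    return ttft_s + tokens / tokens_per_s

UI_TOKENS = 300  # assume a structured UI layout is ~300 tokens

print(f"{render_time_s(UI_TOKENS, 50):.2f} s")    # 6.00 s at 50 t/s: feels like waiting
print(f"{render_time_s(UI_TOKENS, 1000):.2f} s")  # 0.30 s at 1,000 t/s: feels instant
```

Six seconds is a loading screen; a third of a second is an interaction.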

1

User interfaces break when the model waits.

AI interfaces regenerate layouts, repair outputs, and rerender state. Users tolerate waiting for text. Not for interaction.

2

The user sees 1 response. The model generates 100+ internally.

Parsing. Validation. Ranking. Schema repair. State updates. Most computation happens before the user sees anything.
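The generate-validate-repair cycle behind one visible response can be sketched as a loop. The validator and the simulated drafts below are hypothetical stand-ins; a real system would call a model at each step:

```python
# Sketch: one visible response can cost several internal generations.
# Draft, validate against a schema, repair, re-validate.
import json

def validate(payload: str) -> bool:
    """Stand-in validator: output must parse as JSON with a 'title' key."""
    try:
        return "title" in json.loads(payload)
    except json.JSONDecodeError:
        return False

def generate_with_repair(draft_fn, repair_fn, max_attempts: int = 5):
    """Keep regenerating until the output validates or attempts run out."""
    output = draft_fn()
    attempts = 1
    while not validate(output) and attempts < max_attempts:
        output = repair_fn(output)  # each repair is another full generation
        attempts += 1
    return output, attempts

# Simulate a model that emits malformed JSON twice before succeeding.
drafts = iter(['{"titl', '{"title": ', '{"title": "Team sync"}'])
out, n = generate_with_repair(lambda: next(drafts), lambda _: next(drafts))
print(n, out)  # 3 generations for 1 visible response
```

Every extra attempt multiplies the latency the user never sees but always feels.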

3

Faster models do not make interfaces realtime. Faster execution does.

Faster models don’t remove latency. The bottleneck is sustaining interaction under strict constraints.

mirai — airplane mode · M4 Pro
$

1,000 t/s requires a different execution stack.

Interactive AI isn't one response. It's a continuous loop: parsing, validating, ranking, rendering. Each step has a latency budget. Miss one, the experience breaks.
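That loop can be sketched as a budget check per step. The step names and millisecond budgets below are illustrative assumptions, not Mirai's actual pipeline:

```python
# Sketch of an interactive loop with per-step latency budgets.
# Budgets are illustrative; 16 ms is roughly one 60 fps frame.
import time

BUDGETS_MS = {"parse": 5, "validate": 5, "rank": 10, "render": 16}

def run_step(name: str, fn) -> bool:
    """Run one step and report whether it stayed inside its budget."""
    start = time.perf_counter()
    fn()
    elapsed_ms = (time.perf_counter() - start) * 1000
    ok = elapsed_ms <= BUDGETS_MS[name]
    if not ok:
        print(f"{name} blew its {BUDGETS_MS[name]} ms budget ({elapsed_ms:.1f} ms)")
    return ok

def frame(steps) -> bool:
    """One iteration of the loop; any missed budget breaks the experience."""
    return all(run_step(name, fn) for name, fn in steps)

# Fast no-op steps stay inside budget; a slow step would return False.
ok = frame([(n, lambda: None) for n in ("parse", "validate", "rank", "render")])
print("frame ok:", ok)
```

The point of the sketch: the loop fails on its slowest step, not its average one.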

To solve that, we are building every layer from scratch:

From the device constraint up.

A runtime that executes at batch size = 1.

A quantization scheme co-designed with the architecture.

Not adapted from cloud or existing runtimes.

Why can't we just run a cloud model on a device?

On-device is not a smaller cloud. It’s a different system entirely.

Cloud: large batches · throughput-first · memory-rich · compute-saturating.

On-device: batch size = 1 · latency-first · memory-constrained · bandwidth-sensitive.

Performance emerges from co-design across the stack.

On a device, every request is processed alone. No batching, no parallelism. Mirai co-designs models and runtime specifically for this constraint, on Apple Silicon.

Minimizing memory footprint.

Maximizing arithmetic intensity.

Maximizing neural accelerator utilization.

Hardware-native quantization.
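Why batch size = 1 is the hard case can be made concrete with a rough arithmetic-intensity calculation for a single weight-matrix multiply. The simplifying assumption is that weight traffic dominates memory movement; the sizes are illustrative:

```python
# Arithmetic intensity (FLOPs per byte moved) of a weight-matrix multiply,
# assuming weight reads dominate memory traffic. Sizes are illustrative.

def arithmetic_intensity(rows: int, cols: int, batch: int, bytes_per_weight: float) -> float:
    flops = 2 * rows * cols * batch               # one multiply-accumulate per weight per batch item
    bytes_moved = rows * cols * bytes_per_weight  # weights are read once, amortized over the batch
    return flops / bytes_moved

d = 4096
print(arithmetic_intensity(d, d, batch=1,  bytes_per_weight=2))   # fp16 decode: 1.0 FLOP/byte
print(arithmetic_intensity(d, d, batch=64, bytes_per_weight=2))   # cloud batching: 64.0 FLOP/byte
print(arithmetic_intensity(d, d, batch=1,  bytes_per_weight=0.5)) # 4-bit weights: 4.0 FLOP/byte
```

At batch 1 the multiply is memory-bound: the only levers left are moving fewer bytes (quantization) and keeping the accelerator fed, which is why these goals are co-designed rather than tuned separately.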

We are building ...

The full on-device stack to achieve realtime local intelligence.

1

Local models

Architectures designed for memory-bound execution and realtime decoding.

2

Inference engine

Hardware-aware optimization of tensor multiplications & other operations.

3

Quantization

Compress models without collapsing interaction quality or latency.
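The core trade is illustrated by the textbook symmetric int8 scheme below; this is a generic sketch, not Mirai's hardware-native design:

```python
# Minimal symmetric int8 quantization sketch (the generic textbook scheme).
# Rounding error is bounded by scale / 2 per weight.

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 codes with a single symmetric scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard all-zero input
    return [round(w / scale) for w in weights], scale

def dequantize(codes: list[int], scale: float) -> list[float]:
    return [c * scale for c in codes]

w = [0.12, -0.5, 0.33, 0.01]
codes, scale = quantize_int8(w)
print(codes)                     # int8 codes
print(dequantize(codes, scale))  # close to the originals
```

The interaction-quality question is what happens below 8 bits, where a naive per-tensor scale like this one starts to collapse accuracy and a scheme co-designed with the architecture has to take over.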

4

Application layer

We start with automating your Apple device.

On-device AI needs a frontier lab. Not another wrapper.

The silicon is shipped. 1.4 billion devices are waiting. The software is the open problem. And it's ours to solve.