Models, runtime & infrastructure to make on-device AI real-time & interactive.
1.4B Apple Silicon devices in the world. The hardware is ready. The software isn't. Mirai builds the full stack to change that.
calendar.create(date="tomorrow", time="15:00", title="Team sync")
calendar.invite(event_id="…", recipients="team")
slack.send(channel="#team", message="Meeting confirmed for tomorrow 3pm")
The last decade shipped one app per service.
Every task earned its own icon. Bank. Calendar. Files. Messaging. We trained ourselves to context-switch between dozens of interfaces just to get one thing done.
Est. 8.9M mobile apps · Apple App Store + Google Play, 2026
The next decade collapses the app grid into one interface.
One instruction. Multiple services. Zero app launches. The assistant resolves everything in a single interaction, privately, on your device.
Chase · Calendar · Files · Uber · resolved in one interaction
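As a rough illustration of what "one instruction, multiple services" could look like in code, the sketch below routes a list of structured tool calls (like the calendar and Slack calls shown earlier) to local handler functions. The registry, the handler bodies, and the returned fields are hypothetical placeholders for illustration, not Mirai's API.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical tool registry: maps a dotted tool name to a local handler.
ToolCall = Tuple[str, Dict[str, Any]]
REGISTRY: Dict[str, Callable[..., Dict[str, Any]]] = {}


def tool(name: str):
    """Register a handler under a tool name such as 'calendar.create'."""
    def decorator(fn):
        REGISTRY[name] = fn
        return fn
    return decorator


@tool("calendar.create")
def calendar_create(date: str, time: str, title: str) -> Dict[str, Any]:
    # Placeholder: a real handler would talk to the device calendar.
    # The event_id format here is invented for the example.
    return {"event_id": f"{date}-{time}", "title": title}


@tool("calendar.invite")
def calendar_invite(event_id: str, recipients: str) -> Dict[str, Any]:
    return {"event_id": event_id, "invited": recipients}


@tool("slack.send")
def slack_send(channel: str, message: str) -> Dict[str, Any]:
    return {"channel": channel, "sent": True}


def run_plan(plan: List[ToolCall]) -> List[Dict[str, Any]]:
    """Execute tool calls in order; an unknown tool name raises KeyError."""
    return [REGISTRY[name](**args) for name, args in plan]


plan = [
    ("calendar.create", {"date": "tomorrow", "time": "15:00", "title": "Team sync"}),
    ("calendar.invite", {"event_id": "tomorrow-15:00", "recipients": "team"}),
    ("slack.send", {"channel": "#team", "message": "Meeting confirmed for tomorrow 3pm"}),
]
results = run_plan(plan)
```

In this sketch the model's only job is to emit the plan; everything after that is ordinary local function dispatch, which is why no app ever needs to be opened.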
1,000 t/s is where models become interfaces.
1
User interfaces break when the model waits.
AI interfaces regenerate layouts, repair outputs, and rerender state. Users tolerate waiting for text. Not for interaction.
2
The user sees 1 response. The model generates 100+ internally.
Parsing. Validation. Ranking. Schema repair. State updates. Most computation happens before the user sees anything.
3
Faster models do not make interfaces realtime. Faster execution does.
Faster models don’t remove latency. The bottleneck is sustaining interaction under strict constraints.
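A back-of-the-envelope calculation makes the points above concrete. The sketch assumes every internal generation runs sequentially and produces the same number of tokens. The figure of 100 internal generations comes from the text; the 10 tokens per generation and the decode rates other than 1,000 t/s are illustrative assumptions.

```python
def interaction_latency_s(internal_generations: int,
                          tokens_per_generation: int,
                          tokens_per_second: float) -> float:
    """Seconds spent decoding before the user sees a result.

    Assumes generations run one after another and ignores prompt
    processing, tool execution, and rendering time.
    """
    total_tokens = internal_generations * tokens_per_generation
    return total_tokens / tokens_per_second


# 100 internal generations (from the text) x 10 tokens each (assumed).
for rate in (50, 200, 1000):
    latency = interaction_latency_s(100, 10, rate)
    print(f"{rate:>5} t/s -> {latency:.1f} s")
```

Under these assumptions the same interaction takes 20 s at 50 t/s, 5 s at 200 t/s, and 1 s at 1,000 t/s, which is the gap between waiting for text and interacting with an interface.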
We are building every layer from the device constraint up to cross 1,000 t/s.
We are building every layer from scratch.
Not adapted from cloud, not ported from existing runtimes.
On-device is not a smaller cloud. It’s a different system entirely.
Cloud: Large batches · Throughput-first · Memory-rich · Compute-saturating
On-device: Batch size = 1 · Latency-first · Memory-constrained · Bandwidth-sensitive
What are we doing differently?
1. Models
Architectures designed for memory movement, not just model quality.
We optimize for:
Higher arithmetic intensity during decoding
Smaller resident sets
Block generation, not token-by-token decoding
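To see why memory movement matters, note that in batch-size-1 decoding each forward pass typically has to read the full set of weights from memory, so memory bandwidth caps the decode rate. The sketch below computes that upper bound. The parameter count, bit widths, and bandwidth figure are illustrative assumptions, not measurements of any specific device or Mirai model.

```python
def decode_rate_bound(params: float, bits_per_weight: float,
                      bandwidth_gb_s: float,
                      tokens_per_pass: float = 1.0) -> float:
    """Upper bound on tokens/s when decoding is memory-bandwidth-bound.

    Each forward pass streams the full weight set once; emitting several
    tokens per pass (block generation) amortizes that read.
    Ignores KV-cache traffic and compute time.
    """
    weight_bytes = params * bits_per_weight / 8
    passes_per_second = bandwidth_gb_s * 1e9 / weight_bytes
    return passes_per_second * tokens_per_pass


# Assumed: 1B-parameter model, 100 GB/s of usable memory bandwidth.
fp16_bound = decode_rate_bound(1e9, 16, 100)                     # 50 t/s
int4_bound = decode_rate_bound(1e9, 4, 100)                      # 200 t/s
block_bound = decode_rate_bound(1e9, 4, 100, tokens_per_pass=8)  # 1600 t/s
```

The three levers in the list map onto the three arguments: a smaller resident set shrinks `weight_bytes`, and block generation raises `tokens_per_pass`, which is the same thing as raising arithmetic intensity per byte read.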
2. Runtime
Built around Apple Silicon, not portability abstractions.
We optimize for:
Metal-native execution
Custom tensor scheduling
Hardware-aware memory layouts
3. Execution
Compression is part of our architecture, not a post-processing step.
Co-designed together:
Quantization
Speculative decoding
Kernels & weight layouts
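For readers unfamiliar with speculative decoding, here is a minimal sketch of its greedy variant: a cheap draft model proposes several tokens and the target model keeps the longest prefix it agrees with. The two toy "models" are hypothetical stand-in functions, and this is the textbook greedy scheme rather than a description of Mirai's implementation.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy next-token function


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[int], k: int) -> List[int]:
    """One round of greedy speculative decoding.

    The draft proposes k tokens; the target keeps the longest prefix it
    agrees with, then contributes one token of its own. The result equals
    what greedy decoding with the target alone would produce.
    """
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(context + proposed))

    accepted: List[int] = []
    for tok in proposed:
        # In a real engine these k checks are one batched target pass.
        expected = target(context + accepted)
        if tok != expected:
            accepted.append(expected)  # fix the first mismatch and stop
            return accepted
        accepted.append(tok)
    accepted.append(target(context + accepted))  # bonus token on full accept
    return accepted


def generate(target: NextToken, draft: NextToken,
             prompt: List[int], n_tokens: int, k: int = 4) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        out += speculative_step(target, draft, out, k)
    return out[:len(prompt) + n_tokens]


# Toy stand-ins: the draft matches the target except at every 4th position.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] * 3 + 1) % 11


def toy_draft(ctx: List[int]) -> int:
    return toy_target(ctx) if len(ctx) % 4 else 0


speculative_output = generate(toy_target, toy_draft, [1], n_tokens=12)
```

Each round yields between 1 and k + 1 tokens for a single target verification, which is why the technique interacts with kernel and weight-layout choices: the verification pass has to be cheap for the speedup to materialize.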
The full on-device stack to achieve realtime local intelligence.
1
Local models
Architectures designed for memory-bound execution and realtime decoding.
2
Inference engine
Hardware-aware optimization of tensor multiplications and other operations.
3
Quantization
Compress models without collapsing interaction quality or latency.
4
Application layer
We start with automation on your Apple devices.
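The quantization layer listed above can be illustrated with the simplest possible scheme: symmetric round-to-nearest quantization of a small group of weights. This is a generic textbook baseline with made-up weight values, shown only to make the size-versus-error trade-off tangible; it is not the co-designed approach this page describes.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one weight group.

    Returns the integer codes and the shared scale for the group.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


# Made-up example weights for one group.
group = [0.62, -0.31, 0.05, -0.9]
codes, scale = quantize_group(group, bits=4)
restored = dequantize_group(codes, scale)
max_error = max(abs(w - r) for w, r in zip(group, restored))
```

With round-to-nearest, the per-weight error is bounded by half the scale, so a group with one large outlier loses precision on all its small weights. That sensitivity is one reason quantization choices affect interaction quality and not just model size.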
On-device AI needs a frontier lab. Not another wrapper.
The silicon is shipped. 1.4 billion devices are waiting. The software is the open problem. And it's ours to solve.