Inference Engine
The fastest inference runtime for iPhone, iPad and Mac.
Your models. Every Apple device. The fastest inference engine for Apple Silicon.
Optimize and run your model on every Apple device. Up to 38% faster prompt processing and 18% faster generation than MLX.


What Apple Silicon delivers today with Mirai.
Convert. Integrate. Run.


You built the model. We get it running on 2 billion Apple devices.
Model companies.
You train and ship models. Mirai optimizes them for Apple Silicon, benchmarks on real hardware, and distributes across Apple devices.
AI researchers & labs.
You publish on Hugging Face. Mirai converts your model and puts it in front of real users on real devices, not just leaderboards.
Independent model makers.
You're fine-tuning or training from scratch. Mirai gives your model the same Apple device reach as LiquidAI, OpenAI, and DeepSeek.
Your model gets these out of the box.
Speculative decoding.
A smaller draft model predicts multiple tokens ahead. Your main model verifies them in one pass. Fewer forward passes, same output quality.
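The verify-in-one-pass idea can be sketched in a few lines. This is a toy greedy variant with made-up model interfaces (callables returning per-position logits), not Mirai's actual API:

```python
# Toy sketch of greedy speculative decoding. The model interface here
# (a callable mapping a token list to one logits vector per position,
# where logits[j] predicts token j+1) is an illustrative assumption.

def greedy_next(logits):
    """Argmax over a logits vector."""
    return max(range(len(logits)), key=lambda i: logits[i])

def speculative_step(target_model, draft_model, context, k=4):
    """Draft k tokens cheaply, then verify them with ONE target pass.

    Returns the tokens accepted this step: the longest drafted prefix the
    target agrees with, plus the target's correction where they diverge.
    """
    # 1. Draft model proposes k tokens autoregressively (cheap passes).
    drafted = []
    ctx = list(context)
    for _ in range(k):
        tok = greedy_next(draft_model(ctx)[-1])
        drafted.append(tok)
        ctx.append(tok)

    # 2. Target model scores context + all drafted tokens in one pass.
    logits = target_model(context + drafted)

    # 3. Accept the longest prefix where the target agrees with the draft.
    accepted = []
    for i, tok in enumerate(drafted):
        expect = greedy_next(logits[len(context) + i - 1])
        if expect != tok:
            accepted.append(expect)  # target's correction, then stop
            break
        accepted.append(tok)
    return accepted
```

In the best case all k drafted tokens are accepted, so k tokens cost roughly one target forward pass instead of k; output matches what the target alone would have produced.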
Structured output.
JSON mode, constrained generation, grammar-guided decoding. Your model outputs structured data reliably.
Task-specific sessions.
Optimized execution modes for classification, summarization, and generation. Each mode is tuned for the workload.
Built-in performance metrics.
Tokens per second, time to first token, generation duration. Every run returns detailed metrics automatically.
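These metrics are conventionally derived from a few timestamps collected around the run; the field and function names below are illustrative, not Mirai's actual API:

```python
# Sketch of how standard inference metrics are derived from timestamps.
# Names are made up for illustration; only the arithmetic is the point.

from dataclasses import dataclass

@dataclass
class RunMetrics:
    time_to_first_token_s: float
    generation_duration_s: float
    tokens_per_second: float

def compute_metrics(start_s, first_token_s, end_s, n_generated):
    """Derive metrics from three timestamps and a generated-token count."""
    ttft = first_token_s - start_s            # prompt processing latency
    duration = end_s - first_token_s          # generation phase only
    tps = n_generated / duration if duration > 0 else 0.0
    return RunMetrics(ttft, duration, tps)
```

Note that tokens per second is measured over the generation phase only; folding prompt processing into the denominator would understate decode throughput.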



Works with any stack.
Simple, high-level API.
Hybrid architecture.
Unified model configurations.
Traceable computations.
Utilizes unified memory on Apple devices.
Language     Package            Distribution
Rust         Mirai UZU          Cargo
Swift        Mirai UZU Swift    Swift Package Manager
TypeScript   Mirai UZU TS       NPM (Node.js)
Kotlin       Coming Soon
Python       Coming Soon
Supported models:
Route models between device and cloud.
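A device/cloud routing decision typically comes down to whether the model fits comfortably in the device's unified memory. This is a hypothetical policy sketch with made-up names and thresholds, not Mirai's routing logic:

```python
# Hypothetical device/cloud routing policy: run on-device when the model
# fits in unified memory with headroom, otherwise fall back to a cloud
# endpoint. The headroom factor is an illustrative assumption.

def route(model_size_bytes, free_unified_memory_bytes, headroom=1.3):
    """Return "device" if the model fits with headroom, else "cloud"."""
    if model_size_bytes * headroom <= free_unified_memory_bytes:
        return "device"
    return "cloud"
```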
Common questions:
How does model support work?
What architectures are supported?
How does Mirai compare to other inference engines?
What is the maximum supported model size?
How can I run benchmarks myself?
How can we discuss a specific use case?