Inference Engine
The fastest inference runtime for iPhone, iPad and Mac.
Optimize and run your model on every Apple device.
Up to 38% faster prompt processing vs MLX.


Run your model on 2 billion Apple devices. Perfect for:
Model companies.
You train and ship models. Mirai optimizes them for Apple Silicon, benchmarks on real hardware, and distributes.
AI researchers & labs.
Mirai converts your model and puts it in front of real users on Apple devices, not just leaderboards.
Independent makers.
You're fine-tuning or training from scratch. Mirai gives your model the same device reach as OpenAI and DeepSeek.
What Apple Silicon delivers today with Mirai.
Convert. Integrate. Run.

    import Uzu

    public func runChat() async throws {
        // Create the engine with its default configuration.
        let engineConfig = EngineConfig.create()
        let engine = try await Engine.create(config: engineConfig)

        // Look up the model by identifier; stop if it is not available.
        guard let model = try await engine.model(identifier: "cartesia-ai/Llamba-1B") else {
            return
        }
        // Download the model, printing progress updates as they arrive.
        for try await update in try await engine.download(model: model).iterator() {
            print("Download progress: \(update.progress())")
        }

        let messages = [
            ChatMessage.system().withText(text: "You are a helpful assistant"),
            ChatMessage.user().withText(text: "Tell me a short, funny story about a robot")
        ]
        // Open a chat session and stream the reply, keeping the latest message.
        let session = try await engine.chat(model: model, config: .create())
        let stream = await session.replyWithStream(input: messages, config: .create())
        var message: ChatMessage? = nil
        for try await update in stream.iterator() {
            switch update {
            case .replies(let replies):
                message = replies.last?.message
            case .error(let error):
                print("Error: \(error)")
            }
        }
        print("Text: \(message?.text() ?? "empty")")
    }

One inference engine. Integrate from any language.
Language: distribution / snippet
Rust: cargo add uzu --git https://github.com/trymirai/uzu
Swift: Swift Package Manager, https://github.com/trymirai/uzu.git
TypeScript: pnpm add @trymirai/uzu
Python: uv add uzu
Kotlin: Coming soon
Same high-level API across all languages.
Full performance of the Rust core from every language.
Convert once, integrate anywhere.
Built-in features every model gets automatically:
Speculative decoding.
A draft model predicts several tokens ahead, and your model verifies them in one pass. Up to 2x faster generation.
Structured output.
Task-specific sessions.
Built-in performance metrics.
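To make the speculative decoding idea above concrete, here is a minimal sketch in plain Swift. It does not use the Uzu API: `NextToken`, `speculativeStep`, `generate`, and the two toy models are hypothetical names invented for illustration, and the sketch assumes greedy decoding. A real engine would score all drafted positions in a single batched forward pass; here that pass is simulated with one call per position.

```swift
// Hypothetical toy "model": maps a token prefix to its greedy next token.
typealias NextToken = ([Int]) -> Int

/// One round of greedy speculative decoding.
/// Returns the tokens to append to `prefix` for this round.
func speculativeStep(prefix: [Int], draft: NextToken, target: NextToken, lookahead: Int) -> [Int] {
    // 1. The draft model proposes `lookahead` tokens autoregressively.
    var proposed: [Int] = []
    for _ in 0..<lookahead {
        proposed.append(draft(prefix + proposed))
    }

    // 2. The target model checks each proposal in order.
    //    (Simulated per position; a real engine would batch this.)
    var accepted: [Int] = []
    for token in proposed {
        let expected = target(prefix + accepted)
        if expected == token {
            accepted.append(token)
        } else {
            // First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        }
    }

    // 3. Every proposal matched, so the target contributes one extra token.
    accepted.append(target(prefix + accepted))
    return accepted
}

/// Generates `count` tokens after `prompt` using repeated speculative rounds.
func generate(prompt: [Int], count: Int, draft: NextToken, target: NextToken, lookahead: Int) -> [Int] {
    var tokens = prompt
    while tokens.count - prompt.count < count {
        tokens += speculativeStep(prefix: tokens, draft: draft, target: target, lookahead: lookahead)
    }
    return Array(tokens.prefix(prompt.count + count))
}

// Toy models (assumptions): the target counts upward modulo 10;
// the draft agrees except that it guesses wrong right after a 4.
let target: NextToken = { tokens in ((tokens.last ?? 0) + 1) % 10 }
let draft: NextToken = { tokens in
    let last = tokens.last ?? 0
    return last == 4 ? 0 : (last + 1) % 10
}

let output = generate(prompt: [0], count: 8, draft: draft, target: target, lookahead: 3)
print(output) // [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Because every accepted token is either confirmed or supplied by the target model, the output matches what the target alone would produce with greedy decoding. The speedup comes from how many draft tokens are accepted per verification round.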


Supported models
Common questions:
How does model support work?
What architectures are supported?
How does Mirai compare to other inference engines?
What is the maximum supported model size?
How can I run benchmarks myself?
How can we discuss a specific use case?
