Deploy and run models of any architecture

On-device layer for AI model makers and products.

Trusted + backed by leading AI funds and individuals

The fastest on-device inference engine built from scratch

Outperforming

Apple MLX

llama.cpp

Built for model makers

Extend your model beyond the cloud

Keep your inference backend. Add Mirai to process part of your user requests directly on user devices.

Key benefits

Instant, private inference.
Near-zero latency and full data privacy.

Route requests between device & cloud.
Based on your custom rules (see the sketch after this list).

Add and run any custom model architecture.
Hardware-aware execution across memory, scheduling, & kernels.

Granular access control.
Choose which developers can access models.

Mirror your existing pricing.
Tokens, licenses, revshare.
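
For illustration, a custom routing rule could look like the sketch below. Every name and threshold here is a hypothetical stand-in, not the Mirai SDK's actual configuration API; the point is that you decide which requests stay on device and which go to your backend.

```swift
import Foundation

// Sketch of a custom device/cloud routing rule. Names and thresholds are
// illustrative assumptions, not the Mirai SDK's actual configuration API.
enum InferenceTarget {
    case onDevice   // near-zero latency, data never leaves the device
    case cloud      // your existing inference backend
}

struct RequestContext {
    let promptTokens: Int
    let isOnline: Bool
    let batteryLevel: Double   // 0.0 ... 1.0
}

// Example rule: stay on device when offline or for short prompts;
// send long prompts to the cloud when the battery is low.
func route(_ context: RequestContext) -> InferenceTarget {
    guard context.isOnline else { return .onDevice }
    if context.promptTokens <= 512 { return .onDevice }
    return context.batteryLevel < 0.2 ? .cloud : .onDevice
}
```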

Metric                     16 Pro Max (A18 Pro)   M1 Ultra   M2       M4 Max
Time to first token, s     0.303                  0.066      0.188    0.041
Tokens per sec, t/s        20.598                 197.093    35.572   172.276

* Llama-3.2-1B-Instruct, float16 precision, 37 input tokens

Built for developers

Easily integrate modern AI pipelines into your app

Free 10K Devices

Try Mirai SDK for free

Drop-in SDK for local + cloud inference.

Model conversion + quantization handled.

Local-first workflows for text, audio, vision.

One developer can get it all running in minutes.
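
As a rough sketch of what a local-first integration can look like in an app: the identifiers below are stand-ins written for illustration, not the published Mirai SDK API; it only shows the shape of streaming tokens from a model that runs on the device.

```swift
import Foundation

// Minimal integration sketch. Every identifier below is a stand-in written
// for illustration; it is not the published Mirai SDK API.
struct LocalTextModel {
    // Stand-in for a converted, quantized model running on the device.
    func generate(prompt: String) -> AsyncStream<String> {
        AsyncStream<String> { continuation in
            for token in ["On", "-device ", "reply."] {
                continuation.yield(token)
            }
            continuation.finish()
        }
    }
}

func runLocalChat() async {
    let model = LocalTextModel()
    // Tokens stream straight from on-device inference; no network round trip.
    for await token in model.generate(prompt: "Summarize my notes in one line.") {
        print(token, terminator: "")
    }
    print()
}
```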

All major SOTA models supported

  • Gemma

  • Polaris

  • HuggingFace

  • DeepSeek

  • Llama

  • Qwen

Build real-time AI experiences with on-device inference

Users don’t care where your model runs. They care how it feels.

Fast responses for text and audio.

Offline continuity. No network, no break.

Consistent latency. Even under load.

Run models on-device or in the cloud, using the same API

We’ve partnered with Baseten to give you full control over where inference runs, without changing your code.
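
One way to picture "same API, different execution target": application code talks to a single interface, and the backend behind it can be on-device or cloud-hosted. The types below are illustrative assumptions, not Mirai's or Baseten's actual client libraries.

```swift
import Foundation

// Sketch of "same API, different execution target". The types below are
// illustrative assumptions, not Mirai's or Baseten's actual client libraries.
protocol InferenceBackend {
    func complete(prompt: String) async throws -> String
}

struct OnDeviceBackend: InferenceBackend {
    func complete(prompt: String) async throws -> String {
        "(completed on the user's device)"   // placeholder for local inference
    }
}

struct CloudBackend: InferenceBackend {
    func complete(prompt: String) async throws -> String {
        "(completed on a cloud deployment)"  // placeholder for a backend request
    }
}

// Application code is written once against the protocol; changing where
// inference runs does not change the call site.
func answer(prompt: String, using backend: InferenceBackend) async throws -> String {
    try await backend.complete(prompt: prompt)
}
```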

Free your cloud.
Run your models locally

Deploy and run models of any architecture directly on user devices