
Apple Silicon SDK & Inference on iOS & Mac

Run high-performance inference locally

Blazing-Fast AI Fully On-Device

Mirai unlocks the full power of Apple’s ANE and GPU, delivering best-in-class inference speed for small models.

Try Mirai

Contact us

Optimized for Apple Silicon

First 10,000 devices for free

CoreML compatible

Instant setup, no guesswork

Pick a use case, choose the model, and drop it into your code. Mirai handles quantization, fallback logic, and structured output.

Try Mirai

Purpose-built for Apple hardware & optimized to the limit

Built from the ground up to outperform existing solutions like CoreML and MLX

Metric              MacBook Pro (M1, 2020)  MacBook Air (M2, 2022)  MacBook Pro (M3 Pro, 2023)  iPhone 15 Pro (A17 Pro)  iPhone 16 Pro (A18 Pro)
Tokens / sec        16.7                    18.9                    22.4                        19.6                     22.1
Time to 1st token   480ms                   440ms                   390ms                       410ms                    365ms
Peak memory usage   1.3GB                   1.2GB                   1.1GB                       950MB                    900MB


Fastest Inference on iOS

1.5×–10× faster than existing solutions. Built specifically for the Apple Neural Engine with…

Speculation & prefix caching

Accelerated generation with speculative decoding and reusable KV caches. Built-in, not bolted on.
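The draft-and-verify idea behind speculative decoding can be sketched in a few lines of toy Swift. This is an illustration of the technique only, not Mirai's API: `draft` and `verify` are stand-in closures for a cheap proposal model and an expensive target model.

```swift
// Toy sketch of speculative decoding: a cheap draft model proposes a run of
// tokens, and the target model verifies them in one pass, keeping the
// longest agreeing prefix. All names here are illustrative, not SDK calls.
func speculate(prompt: [Int],
               draft: ([Int]) -> [Int],          // proposes candidate tokens
               verify: ([Int], [Int]) -> [Int]   // returns the accepted prefix
) -> [Int] {
    let candidates = draft(prompt)               // several cheap guesses
    let accepted = verify(prompt, candidates)    // one target-model pass
    return prompt + accepted
}

// Toy models: the draft guesses [1, 2, 3, 4]; the "target" accepts tokens
// while they match its own greedy choice (here: next = last + 1).
let out = speculate(prompt: [0],
                    draft: { _ in [1, 2, 3, 4] },
                    verify: { ctx, cand in
                        var last = ctx.last ?? 0
                        var kept: [Int] = []
                        for t in cand {
                            if t == last + 1 { kept.append(t); last = t } else { break }
                        }
                        return kept
                    })
// out == [0, 1, 2, 3, 4]: four tokens accepted from a single verify pass
```

When the draft agrees with the target, several tokens land per expensive forward pass, which is where the speed-up comes from.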

Deterministic structured output

Schema-aligned JSON results out of the box — perfect for workflows that demand reliability.
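On the consuming side, schema-aligned output means the model's JSON can be decoded straight into a typed value using standard Swift `Codable`. The `Sentiment` schema below is a made-up example for a tagging use case, not a Mirai type:

```swift
import Foundation

// Hypothetical schema for a sentiment-tagging use case. With schema-aligned
// output, the model is constrained to emit JSON matching this shape, so
// decoding is a plain Codable round-trip rather than fragile string parsing.
struct Sentiment: Codable {
    let label: String      // e.g. "positive" | "negative" | "neutral"
    let confidence: Double // 0.0 ... 1.0
}

// Stand-in for a model response; a constrained decoder guarantees this shape.
let raw = #"{"label": "positive", "confidence": 0.93}"#
let result = try JSONDecoder().decode(Sentiment.self, from: Data(raw.utf8))
// result.label == "positive", result.confidence == 0.93
```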

Low TTFT, energy-aware runtime

Blazing-fast time-to-1st-token with smart use of IOSurface, static shapes, and chip-specific…

Benchmarked. Tuned. No unnecessary complexity

Simply choose a use case to start with

General Chat

Conversational AI, running on-device

Classification

Tag text by topic, intent, or sentiment

Summarization

Condense long text into concise summaries

Custom

Build your own use case

Camera

Process images with local models

COMING SOON

Voice

Turn voice into actions or text

COMING SOON

First 10,000 devices for free

Try Mirai

Contact us

Setup in less than 10 minutes

Full control and performance

Run the most popular small LLMs

Reduce cloud costs by up to n%

Up to 50% of queries can be handled locally

Perfect fit for

Messaging & assistants

Productivity & writing tools

Finance, health & compliance apps

On-device intelligence layers

Set up your AI project in 10 minutes

Blazing-Fast AI Fully On-Device

With the Mirai SDK, creating and deploying on-device AI is as easy as picking your framework (Swift or Rust), generating an API key, and choosing a use case
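In Swift, that three-step flow might look roughly like the sketch below. Every name here (`Mirai`, `apiKey`, `use(_:)`, `.generalChat`) is an illustrative placeholder for the idea, not the actual SDK surface:

```swift
// Hypothetical sketch of the setup flow; names are placeholders, not the
// real Mirai API. Step 1: add the SDK for your framework (Swift shown).
import Mirai

// Step 2: initialize with the API key generated in the dashboard.
let client = try Mirai(apiKey: "YOUR_API_KEY")

// Step 3: choose a use case and run fully on-device.
let chat = client.use(.generalChat)
let reply = try await chat.send("Draft a reply to this message")
print(reply.text)   // generated locally, no cloud round-trip
```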

Try Mirai

Contact us