
Routing & Speculation for On-Device AI

Decide what runs locally and what runs in the cloud. Automatically.

Blazing-Fast AI Fully On-Device

Mirai’s routing engine gives you full control over performance, privacy, and price, with speculative decoding built in.

Try Mirai

Contact us

Dynamic Runtime Routing

Optimized for battery & memory

Speculative decoding built-in

On-device or cloud. In real time.

On-device when it’s fast and private. Cloud when it’s heavy and contextual. All handled through our routing engine — with no overhead on your side.

Dynamic runtime routing

Automatically route inference to device or cloud based on prompt type, latency constraints, or user context — no manual ops required.
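
To make the idea concrete, here is a minimal sketch of such a routing decision. The function and parameter names are invented for illustration and are not part of the Mirai SDK; they simply show the kind of signals (latency budget, prompt size, device state) the router can weigh.

```python
# Illustrative sketch only: names are invented, not drawn from the Mirai SDK.
def choose_route(prompt: str, latency_budget_ms: int, battery_saver: bool) -> str:
    """Return "device" or "cloud" for one inference request."""
    if latency_budget_ms < 500:
        return "device"          # a network round trip would blow the budget
    if len(prompt) > 2000 or battery_saver:
        return "cloud"           # heavy context, or spare the battery
    return "device"

print(choose_route("Write a short welcome message", 300, False))  # device
print(choose_route("Refactor: " + "x" * 5000, 5000, False))       # cloud
```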

Schema-aware prompt parsing

Understand what the task is before you run it. Structured prompts enable intelligent, context-based routing and fallback logic.
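
A toy sketch of what "understand the task before you run it" can look like. Here a structured prompt is modeled as JSON with a declared task type; the field names and task list are invented for illustration, not Mirai's schema.

```python
import json

# Tasks that a small local model handles well (hypothetical list).
LOCAL_TASKS = {"classification", "moderation", "greeting"}

def parse_and_route(raw: str) -> str:
    """Parse a structured prompt and route on its declared task type."""
    try:
        spec = json.loads(raw)
        task = spec.get("task", "unknown")
    except json.JSONDecodeError:
        return "cloud"  # unstructured prompt: fall back to the general model
    return "device" if task in LOCAL_TASKS else "cloud"

prompt = json.dumps({"task": "classification", "text": "Is this spam?"})
print(parse_and_route(prompt))                 # device
print(parse_and_route("free text, no schema")) # cloud
```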

Speculative decoding built-in

Accelerate responses with Medusa-style prediction, n-gram trees, and prefix caching — without changing your model.
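
The core of speculative decoding is a draft-and-verify loop, sketched below with toy stand-ins. A cheap draft proposes several tokens; the target model checks them and keeps the longest agreeing prefix. Real systems (Medusa heads, n-gram trees) differ in how drafts are produced, not in this loop — everything here is illustrative, not Mirai code.

```python
def draft(context):
    """Stand-in for an n-gram lookup or Medusa head proposing tokens."""
    table = {("the",): ["quick", "brown", "fox"]}
    return table.get(tuple(context[-1:]), [])

def target_next(context):
    """Stand-in for the full model's next-token prediction."""
    sentence = ["the", "quick", "brown", "fox", "jumps"]
    return sentence[len(context)] if len(context) < len(sentence) else None

def speculative_step(context):
    """Accept the longest draft prefix the target model agrees with."""
    accepted = []
    for tok in draft(context):
        if target_next(context + accepted) == tok:
            accepted.append(tok)
        else:
            break
    if not accepted:                      # no usable draft: one normal step
        nxt = target_next(context)
        return [nxt] if nxt else []
    return accepted

print(speculative_step(["the"]))  # ['quick', 'brown', 'fox'] in one "step"
```

Three tokens are emitted for the cost of one verification pass, which is where the latency win comes from.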

Fully programmable policies

Route by prompt length, device capabilities, confidence thresholds, or user segments. You define the logic — Mirai handles execution.
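
One way "you define the logic" can look: a declarative rule table evaluated in order. The rule names and thresholds below are invented to illustrate the idea, not drawn from Mirai's API.

```python
# Hypothetical policy table: first matching rule wins.
POLICIES = [
    {"if": lambda ctx: ctx["prompt_tokens"] > 4000, "route": "cloud"},  # long context
    {"if": lambda ctx: ctx["ram_gb"] < 4,           "route": "cloud"},  # weak device
    {"if": lambda ctx: ctx["confidence"] < 0.7,     "route": "cloud"},  # low confidence
]

def route(ctx: dict) -> str:
    for rule in POLICIES:
        if rule["if"](ctx):
            return rule["route"]
    return "device"  # default: keep it local

print(route({"prompt_tokens": 120, "ram_gb": 8, "confidence": 0.9}))   # device
print(route({"prompt_tokens": 9000, "ram_gb": 8, "confidence": 0.9}))  # cloud
```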

Optimized for

Speed, when low latency matters

Privacy, for sensitive data

Accuracy, when longer context is needed

Cost-efficiency, to run less in the cloud

Task                       | Route                        | Why Mirai Wins
Welcome message generation | 📱 On-device (Llama 3.2 1B)  | Sub-400ms response time
Content moderation         | 📱 On-device or Hybrid       | Custom safety tuning
Code refactor request      | ☁️ Cloud (GPT-4 or Claude)   | Requires long context
Inline classification      | 📱 Local + speculative       | 3x faster than baseline run

Speculative decoding built-in

Start generating tokens before your model even finishes thinking.

Reduce perceived latency by up to 50%

Prefill model predictions

Draft KV-caches

Structured output hints

Configure routing without changing your app logic

Specify hard rules (always use cloud, never use cloud), prompt templates with routing hints, model families per task type, and more…
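
A hypothetical configuration sketch of those ideas. The key names are invented to illustrate hard rules, routing hints, and per-task model families; they are not the Mirai SDK's actual configuration format.

```python
# Invented config shape, for illustration only.
routing_config = {
    "hard_rules": {"pii_detected": "never_cloud", "admin_tools": "always_cloud"},
    "templates": {
        "welcome_message": {"route_hint": "device", "model_family": "llama-3.2"},
        "code_refactor":   {"route_hint": "cloud",  "model_family": "gpt-4"},
    },
}

def resolve(template_name: str, flags: dict) -> str:
    """Hard rules win; otherwise use the template's routing hint."""
    for flag, rule in routing_config["hard_rules"].items():
        if flags.get(flag):
            return "device" if rule == "never_cloud" else "cloud"
    return routing_config["templates"][template_name]["route_hint"]

print(resolve("welcome_message", {}))                    # device
print(resolve("code_refactor", {"pii_detected": True}))  # device (hard rule wins)
```

The point of keeping this declarative is that routing changes without touching app logic, as the heading above describes.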

Routing & Speculation is currently available in our SDK

Set up your AI project in 10 minutes for free

Try Mirai

Contact us

Full control and performance

First 10K devices for free

Run the most popular small LLMs