Worldwide

Full Time

Inference Engineer

Join a small, senior team building the fastest on-device AI inference engine, powering real products, not demos.

Supported by angels and founders from

About us

Mirai is building the on-device inference layer for AI.

We enable model makers and product developers to run AI models directly on edge devices, starting with Apple Silicon and expanding to Android and beyond.

Our stack spans from low-level GPU kernels to high-level model conversion tools.

We're a small team obsessed with performance, working at the intersection of systems programming and machine learning research.

The role

We're looking for engineers who can bridge the gap between ML research and high-performance inference.

You'll work across our inference engine and model conversion toolkit, implementing new model architectures, supporting new modalities, writing optimized kernels, and building a wide range of features such as function calling and batch decoding.

This role is ideal for someone who reads papers for fun, enjoys writing high-performance code, and gets excited about constant learning.

Nobody knows everything, and we'd rather you know one area deeply than everything superficially. If you're strong in at least a couple of these areas, you're a great fit:

  • JAX / Equinox / Pallas stack

  • Rust systems programming with a focus on developer experience

  • Writing Metal / Vulkan kernels

  • Neural codecs and voice model architectures

  • Trellis-based quantization approaches

  • Advanced speculative decoding methods, such as EAGLE

  • Deep understanding of Transformer / SSM / Diffusion / Vision language models

  • Benchmarking inference performance and model quality

  • Strong linear algebra, optimization methods, and probability theory

And of course, solid engineering fundamentals: we'll be shipping a lot of code 🙃

We welcome applications from students and early-career engineers. If you've participated in projects that demonstrate systems thinking and ML understanding, we want to hear from you!

Why us?

Mirai was founded by proven entrepreneurs who built and scaled consumer AI leaders like Reface (200M+ users) and Prisma (100M+ users).

Our team is small (12 people), senior, and deeply technical. We ship fast and own problems end-to-end.

We’re advised by a former Apple Distinguished Engineer who worked on MLX, and backed by leading AI-focused funds and individuals.

Why join us?

Impactful Work

You’ll work on core infrastructure that directly shapes how AI runs on billions of devices. Not demos, not prototypes, but production systems.

Career Growth

You’ll take ownership of complex, low-level systems early, and grow alongside a team that has already shipped and scaled AI products.

Collaborative Team

We’re a small, highly collaborative team. No silos, no layers. Just smart people solving hard problems together.

Technology

You’ll work on model optimization, inference runtimes, deployment tooling, and performance-critical systems, setting new standards for on-device AI.

Interested?

Join a small, senior team building the fastest on-device AI inference engine, powering real products, not demos.
