Part 2: How to Understand On-Device AI

By Mirai team

Mar 23, 2025

What is the first thing that comes to your mind when you hear about AI? The future of technology, new economics, AGI, chatbots, agent pipelines, research, infrastructure costs, and so on? For the next five minutes, we want you to try thinking about it as just a universal feature.

All of us had our own "wow" moment the first time we interacted with LLMs. Never before had it been so effortless to get the result you needed using nothing more than natural language - no prerequisites, no special knowledge, just simple text instructions. But this is the most basic use case you can imagine, as it requires direct user intent to produce a result.

Open any app on your phone. You don’t think about how your Facebook feed or Spotify playlist is personalized. You don’t think about how your camera doesn’t just hand you raw sensor data but processes it computationally to produce a refined photo. You don’t ask for any of this - it just happens, giving you better results than before, exactly when you need them. And this is where the true power of AI lies. There are countless use cases, across almost every domain and almost every app, that AI can handle more effectively.

So why aren’t we there yet?
Why isn’t every feature AI-backed?

The answer is straightforward: until recently, it was simply too difficult and expensive. You had to build complex pipelines, design and train specialized models, and hire experts with rare skill sets. Developing a specific AI solution could take years, and only large companies could afford it. This is exactly what LLMs have changed. We now have more "general" models capable of effectively solving a wide range of tasks, with a much more intuitive way to "program" their behavior.

But you don’t have to build the core of your product around it. To benefit from LLMs, you don’t need to craft complex prompts that perfectly solve the main task your product is designed for. You can simply make small features work better - and ship them much faster.

Let's create a simple app together

Start with something straightforward. For example, you decide to create yet another email app. You finish the first version and start thinking about how AI can make it better. Here’s a pretty obvious list of features:

  • Show thread summaries

  • Extract travel details

  • Extract calendar events

  • Check grammar

  • Detect sensitive content

  • Content moderation

  • Prioritize attachments for preloading

None of these features are the core of your product. None of them will dramatically change your app's metrics. But together, each one contributes a small improvement at different stages of the user experience.

The result? A compound effect - even if users can’t pinpoint exactly why your app feels better, they'll sense that, in some invisible way, it’s simply superior to the alternatives. In this context, AI is a universal feature. At its core, each of these improvements comes down to calling a single AI function to get a result. But it’s not that simple.

Let's start with the "check grammar" feature from a developer’s perspective. If you paste your text into ChatGPT and ask it to check for grammar mistakes, the response will likely be something like: "It looks like you meant to say, ...". If you use an API, you'll get a similarly unstructured response.

This isn’t the type of response your app can directly interpret. You need to show exactly what’s wrong in the UI. That means identifying the specific chunk of text, its location, highlighting errors, providing corrections, and more. In other words, you need the model’s output to be structured.
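
To make this concrete, here is a minimal sketch of the kind of structured result the app actually needs. The type name and JSON shape are assumptions invented for this example, not a real API - the point is simply that the model has to return data you can map back onto ranges in the user's text.

```swift
import Foundation

// Illustrative shape only - invented for this example, not an SDK type.
struct GrammarIssue: Codable {
    let start: Int          // offset of the problematic span in the source text
    let length: Int         // length of the span
    let message: String     // human-readable explanation of the issue
    let suggestion: String  // proposed replacement text
}

// The model is asked (via prompting or constrained decoding) to emit a JSON
// array of issues instead of free-form prose, so the app can highlight each range.
func parseIssues(from modelOutput: String) -> [GrammarIssue]? {
    guard let data = modelOutput.data(using: .utf8) else { return nil }
    return try? JSONDecoder().decode([GrammarIssue].self, from: data)
}
```

If decoding fails, the run can simply be treated as a no-op instead of surfacing a broken result - which is exactly the "errors treated as failures" case described next.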

At this point, a developer starts modifying prompts to achieve the desired behavior. Depending on the complexity of the structure, they might reach an acceptable accuracy rate - but some mistakes will be inevitable, and in certain cases, errors will simply have to be treated as failures.

You can go further, making the pipeline more complex. You might:

  • Use self-hosted models

  • Leverage structured output features from modern inference engines like llama.cpp (see the sketch after this list)

  • Experiment with different models to find the best one
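
As a sketch of the second option, llama.cpp can constrain generation with a GBNF grammar so the model can only emit output that matches your schema. The snippet below assumes a llama-server instance running locally on port 8080 and uses a deliberately simplified grammar; the exact request fields can vary between llama.cpp versions, so treat this as an outline rather than a drop-in integration.

```swift
import Foundation

// A deliberately simplified GBNF grammar that only allows a JSON array of issue
// objects (no whitespace handling, no string escaping) - illustration only.
let issueGrammar = #"""
root  ::= "[" (issue ("," issue)*)? "]"
issue ::= "{" "\"start\":" num "," "\"length\":" num "," "\"suggestion\":" str "}"
num   ::= [0-9]+
str   ::= "\"" [^"]* "\""
"""#

// Sends a /completion request to a locally running llama-server (assumed setup).
func requestGrammarCheck(for text: String) async throws -> Data {
    var request = URLRequest(url: URL(string: "http://127.0.0.1:8080/completion")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = [
        "prompt": "List the grammar issues in this text as JSON:\n\(text)",
        "grammar": issueGrammar,   // constrains decoding to the schema above
        "n_predict": 256
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)
    let (data, _) = try await URLSession.shared.data(for: request)
    return data
}
```

Even with constrained output, though, you still own whatever infrastructure this request goes to - which is where the next problem shows up.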

So, what’s the problem? Grammar checking is something that needs to run every few seconds as the user types. If you rely on an AI API, costs will skyrocket quickly. If you run your own AI infrastructure, you're still paying - just in a different way (complexity always comes with invisible costs). Sure, this problem has been solved before - just use a specialized grammar-checking service, right? You can, but that adds another third-party dependency to your app. Or you could wait for OS-level support.

And these problems exist for every AI-powered feature. If you integrate all the AI improvements from the list, your app will be calling an AI service every 10-20 seconds. Multiply that by your user base, and try calculating the economics of your app.
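
A rough back-of-envelope calculation shows the scale. Every number below is an assumption picked purely for illustration - plug in your own traffic and pricing.

```swift
// Back-of-envelope only; all numbers are illustrative assumptions.
let callsPerActiveMinute = 4.0       // one AI call roughly every 15 seconds
let activeMinutesPerDay  = 30.0      // assumed daily usage per user
let costPerCall          = 0.0005    // USD, an assumed hosted-API price point
let users                = 100_000.0

let monthlyCost = callsPerActiveMinute * activeMinutesPerDay * 30 * costPerCall * users
// ≈ $180,000 per month - before any of these features moves your core metrics
```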

But cost isn't the only issue:

  • You don’t want to send sensitive user content to third-party providers.

  • You don’t want half of your app’s features to break if your AI provider has an outage.

  • You don’t want to spend hours debugging issues that arise because your once-simple email client has now turned into a complex distributed system with multiple backends.

  • Every subsystem involved in executing a task will, at some point, become a point of failure. Your app becomes less reliable.

Want more? AI providers will eventually update their models. The prompts that worked perfectly with the previous version might perform completely differently on a new one.

So, nothing changed?
What was complex is still complex.

Not really. Let's summarize the key takeaways into the following set of theses:

  • You don’t need the most modern, capable, or intelligent model for every task - some tasks can be successfully handled by small models

  • Small models can run efficiently directly on the user's device (and most big tech companies continue to release them)

  • You don’t need one model for all tasks: instead, you need a way to route each request to the right model (a minimal routing sketch follows this list)

  • You don’t need raw text from the model - you need structured data that your app can process

  • AI usage shouldn’t always require direct user initiation - in most cases, it should be part of the app’s internal logic

  • The main consumer of AI-generated results is your app itself, using them to build business logic

  • Relying on remote models comes with issues related to pricing, privacy, and reliability - which you want to avoid
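
To illustrate the routing thesis above, here is a minimal sketch of what a task-to-model mapping could look like. The task set and model names are invented for the example; the point is that each feature gets the smallest model that handles it well, rather than one large model for everything.

```swift
// Hypothetical routing table - task names and model identifiers are invented.
enum AITask {
    case summarizeThread
    case checkGrammar
    case extractCalendarEvent
    case moderateContent
}

func modelIdentifier(for task: AITask) -> String {
    switch task {
    case .checkGrammar:          return "tiny-grammar-0.5b"   // small, runs constantly
    case .extractCalendarEvent:  return "small-extractor-1b"  // structured extraction
    case .summarizeThread,
         .moderateContent:       return "general-3b"          // broader language tasks
    }
}
```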

On-device AI solutions already exist, but their maturity lags behind cloud-based solutions by several years. Dragging and dropping an SDK isn’t enough - you have to be prepared for:

  • Choosing the right model yourself

  • Selecting an inference engine

  • Building from source using complex build systems

  • Converting models into specific formats with limited documentation

  • Understanding how to interact with the model

  • Diagnosing slow inference speeds

  • Writing boilerplate code for model loading, storage, and memory management

  • Parsing and structuring model outputs to fit your app’s needs

  • Repeating this process multiple times because the engine you initially selected might not support the model you want to use

So what is the ideal scenario?

  • You start building a new app.

  • You integrate a default on-device AI SDK that handles everything you need - just like you do today with Analytics and Crashlytics.

  • You call simple methods for specific tasks and receive structured, reliable results tailored to your app’s needs.
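
As a rough illustration of that last point, an integration could look something like the sketch below. The API is hypothetical and is not the actual mirai SDK surface - it only shows the shape of "one call in, structured result out".

```swift
import Foundation

// Hypothetical SDK surface, invented for illustration.
struct ThreadSummary: Codable {
    let headline: String
    let actionItems: [String]
}

protocol OnDeviceAI {
    // Runs fully on-device and returns a typed result instead of raw text.
    func summarize(thread messages: [String]) async throws -> ThreadSummary
}

// App code stays trivial: call a method, get structured data, feed it to the UI.
func showSummary(of messages: [String], using ai: OnDeviceAI) async {
    guard let summary = try? await ai.summarize(thread: messages) else { return }
    print(summary.headline)
    summary.actionItems.forEach { print("• \($0)") }
}
```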

This is exactly what we're building at mirai. We’re creating an ecosystem of tools that lets teams without AI expertise integrate AI-powered features into their apps - using simple, predefined functions while ensuring 100% local execution and superior performance. This level of efficiency comes from a deep understanding of hardware architecture and from prior knowledge of specific use cases - topics we’ll explore in depth in the next articles.

Use cases are everywhere

If you still have doubts about whether your app can benefit from on-device AI, here are some tasks you can implement:

  • Summarization (catch up on what you missed in a chat, summarize your card transactions)

  • Feature extraction (label content with categories, such as sentiment analysis for tweets)

  • Document data extraction

  • Content moderation

  • Game logic

  • Accessibility features (image captioning)

  • Dynamic UI layout

  • Determining the best moment to show a subscription screen

  • AR annotations

  • Translation

  • Content annotations

  • Content suggestions

TL;DR

  • Every app has the potential to leverage the advantages of AI

  • Don’t think of on-device AI as something expensive or complex - it’s just a universal feature you can use almost anywhere

  • If you have a hassle-free solution that you can simply drop in and try, there’s absolutely no reason not to 😊! You can check out a preview today: https://github.com/getmirai/sdk-ios

Try Mirai – AI that runs directly on your devices, bringing powerful capabilities closer to where decisions are made.

Hassle-free app integration, lightning-fast inference, reliable structured outputs