LFM2.5-1.2B-Thinking

Run locally on Apple devices with Mirai

Type: Local
From: LiquidAI
Quantisation: No
Precision: float16
Size: 1.2B
Source: Hugging Face

LFM2.5-1.2B-Thinking is a compact, 1.2-billion-parameter language model designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training on up to 28 trillion tokens and large-scale reinforcement learning, achieving best-in-class performance for its size and rivaling much larger models. It offers a 32,768-token context length and supports eight languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish. The model excels at fast edge inference with extremely low memory requirements, running in under 1 GB of memory and delivering 239 tokens per second on AMD CPUs and 82 tokens per second on mobile NPUs. It has day-one support for multiple inference frameworks, including llama.cpp, MLX, and vLLM. LFM2.5-1.2B-Thinking uses a hybrid architecture that combines double-gated LIV convolution blocks with GQA blocks, making it particularly effective for agentic tasks, data extraction, and retrieval-augmented generation.

1. Choose a framework (Swift Package Manager is shown here).
2. Install the Mirai SDK via SPM: https://github.com/trymirai/uzu-swift (a Package.swift sketch follows below).
3. Set your Mirai API key (use "Get API Key" on this page to create one).
4. Apply the code; this page generates the exact snippet for your key and model (a hypothetical usage sketch follows below).
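If you add the dependency by hand rather than through Xcode's package UI, the Package.swift entry would look roughly like the sketch below. The version requirement, platform list, target name, and the product name "Uzu" are assumptions; check the uzu-swift repository for the current release and product name.

    // swift-tools-version:5.9
    // Sketch only: version, platforms, and product name are assumptions.
    import PackageDescription

    let package = Package(
        name: "MyApp",
        platforms: [.iOS(.v16), .macOS(.v14)],
        dependencies: [
            // Repository URL from step 2 above.
            .package(url: "https://github.com/trymirai/uzu-swift", from: "1.0.0")
        ],
        targets: [
            .executableTarget(
                name: "MyApp",
                dependencies: [
                    .product(name: "Uzu", package: "uzu-swift")
                ]
            )
        ]
    )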
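Once the package resolves, usage would look roughly like the following. This is a hypothetical sketch: the names UzuEngine, activate(apiKey:), createSession(model:), and run(prompt:), as well as the model identifier string, are assumptions for illustration, not the SDK's confirmed interface. The "Apply code" step above generates the real snippet for your project.

    // main.swift (hypothetical sketch; names below are assumed, not the
    // confirmed uzu-swift API — use the snippet from "Apply code" instead)
    import Uzu

    // Initialize the engine and activate it with the key from step 3.
    let engine = UzuEngine()
    try await engine.activate(apiKey: "YOUR_MIRAI_API_KEY")

    // Model identifier guessed from this page's title.
    let session = try engine.createSession(model: "LFM2.5-1.2B-Thinking")

    // Run a single prompt and print the completion.
    let reply = try await session.run(prompt: "Summarize retrieval-augmented generation in one sentence.")
    print(reply)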
