LFM2.5-1.2B-Thinking is a compact, 1.2-billion-parameter language model designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training on up to 28 trillion tokens and large-scale reinforcement learning, achieving best-in-class performance for its size while rivaling much larger models. The model features a 32,768-token context length and supports eight languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish. It excels at fast edge inference with extremely low memory requirements, running in under 1 GB of memory and delivering 239 tokens per second on AMD CPUs and 82 tokens per second on mobile NPUs. It has day-one support for multiple inference frameworks, including llama.cpp, MLX, and vLLM. LFM2.5-1.2B-Thinking uses a hybrid architecture that combines double-gated LIV convolution blocks with grouped-query attention (GQA) blocks, making it particularly effective for agentic tasks, data extraction, and retrieval-augmented generation.
LiquidAI
available local models on Mirai:
| Name                          | Quantisation | Size |
|-------------------------------|--------------|------|
| LFM2-1.2B                     | No           | 1.2B |
| LFM2-2.6B                     | No           | 2.6B |
| LFM2-350M                     | No           | 350M |
| LFM2-700M                     | No           | 700M |
| LFM2.5-1.2B-Instruct          | No           | 1.2B |
| LFM2.5-1.2B-Instruct-MLX-4bit | 4-bit        | 1.2B |
| LFM2.5-1.2B-Instruct-MLX-8bit | 8-bit        | 1.2B |
| LFM2.5-1.2B-Thinking          | No           | 1.2B |
| LFM2-1.2B-4bit                | 4-bit        | 1.2B |
| LFM2-1.2B-8bit                | 8-bit        | 1.2B |
| LFM2-2.6B-4bit                | 4-bit        | 2.6B |
| LFM2-2.6B-8bit                | 8-bit        | 2.6B |
| LFM2-350M-4bit                | 4-bit        | 350M |
| LFM2-350M-8bit                | 8-bit        | 350M |
| LFM2-700M-4bit                | 4-bit        | 700M |
| LFM2-700M-8bit                | 8-bit        | 700M |
| LFM2.5-1.2B-Thinking-4bit     | 4-bit        | 1.2B |
| LFM2.5-1.2B-Thinking-8bit     | 8-bit        | 1.2B |
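The 4-bit and 8-bit variants in the list trade numeric precision for memory. A rough back-of-the-envelope sketch of the weight footprint at each precision shows why the 4-bit quantization of a 1.2B-parameter model fits in under 1 GB; this is only an estimate (weights alone, ignoring activation buffers, KV cache, and quantization metadata overhead), and the function name is illustrative, not part of any framework API:

```python
# Rough weight-memory estimate: memory ~= parameter_count * bits_per_weight / 8.
# Ignores activations, KV cache, and per-block quantization metadata.

def weight_footprint_gb(params: float, bits: int) -> float:
    """Approximate bytes needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9

PARAMS = 1.2e9  # e.g. LFM2.5-1.2B-Thinking

for bits, label in [(16, "fp16"), (8, "8-bit"), (4, "4-bit")]:
    print(f"{label:>5}: ~{weight_footprint_gb(PARAMS, bits):.2f} GB")
# fp16 comes out around 2.4 GB, 8-bit around 1.2 GB, and 4-bit around 0.6 GB.
```

The 4-bit figure of roughly 0.6 GB for weights is consistent with the stated sub-1 GB memory requirement for on-device inference.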