The Llama 3.2 collection of multilingual large language models is a set of pretrained and instruction-tuned generative models, available in 1B and 3B sizes, designed for text input and output. The instruction-tuned, text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks, and outperform many available open-source and closed chat models on common industry benchmarks. Llama 3.2 is an auto-regressive language model built on an optimized transformer architecture; the tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. The models officially support eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. They were trained on up to 9 trillion tokens from publicly available sources, with a knowledge cutoff of December 2023. All model versions use Grouped-Query Attention (GQA) for improved inference scalability, and quantized variants are available for deployment in constrained environments such as mobile devices.
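To illustrate why Grouped-Query Attention helps inference, here is a minimal NumPy sketch of the head layout: several query heads share one key/value head, which shrinks the KV cache by the group factor. The head counts and dimensions below are hypothetical for illustration, not Llama 3.2's actual configuration.

```python
import numpy as np

# Hypothetical GQA configuration (NOT Llama 3.2's real head counts):
# 8 query heads share 2 KV heads, so 4 query heads per KV head.
n_q_heads, n_kv_heads, head_dim, seq = 8, 2, 16, 4

q = np.random.randn(n_q_heads, seq, head_dim)
kv = np.random.randn(n_kv_heads, seq, head_dim)  # only 2 KV heads are cached

# Broadcast each KV head to its group of query heads before attention.
k = np.repeat(kv, n_q_heads // n_kv_heads, axis=0)  # shape (8, 4, 16)
scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # shape (8, 4, 4)

# The KV cache stores n_kv_heads instead of n_q_heads heads:
# a 4x reduction in this sketch versus standard multi-head attention.
print(scores.shape)
```

Only the small KV tensor needs to live in the per-token cache during generation, which is what makes GQA attractive for on-device inference.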
Meta
available local models on Mirai:
| Name | Quantisation | Size |
|------|--------------|------|
| Llama-3.1-8B-Instruct | No | 8B |
| Llama-3.2-1B-Instruct | No | 1B |
| Llama-3.2-3B-Instruct | No | 3B |
| Llama-3.1-8B-Instruct-4bit | 4-bit | 8B |
| Llama-3.2-1B-Instruct-4bit | 4-bit | 1B |
| Llama-3.2-1B-Instruct-8bit | 8-bit | 1B |
| Llama-3.2-3B-Instruct-4bit | 4-bit | 3B |
| Llama-3.2-3B-Instruct-8bit | 8-bit | 3B |
| Llama-3.2-3B-Instruct-AWQ | AWQ | 3B |
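The quantised variants above trade precision for a smaller memory footprint. As a rough back-of-the-envelope sketch, weight-only memory scales with parameter count times bits per weight; the estimate below ignores activations, the KV cache, and quantisation metadata overhead, so real footprints will be somewhat larger.

```python
# Rough weight-only memory estimate: params * bits_per_weight / 8 bytes.
# Ignores activations, KV cache, and quantisation metadata overhead.

def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for name, params in [("Llama-3.2-1B", 1.0), ("Llama-3.2-3B", 3.0),
                     ("Llama-3.1-8B", 8.0)]:
    fp16 = weight_memory_gb(params, 16)  # unquantised half-precision
    q8 = weight_memory_gb(params, 8)     # 8-bit variant
    q4 = weight_memory_gb(params, 4)     # 4-bit variant
    print(f"{name}: fp16 ~{fp16:.1f} GB, 8-bit ~{q8:.1f} GB, 4-bit ~{q4:.1f} GB")
```

By this estimate a 3B model drops from roughly 6 GB at fp16 to about 1.5 GB at 4-bit, which is what makes the 4-bit and AWQ variants practical on memory-constrained devices.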