Llama 3.2 is a collection of multilingual large language models, developed by Meta as pretrained and instruction-tuned generative models in 1 billion and 3 billion parameter sizes. The instruction-tuned, text-only versions are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks, and outperform many available open-source and closed chat models on common industry benchmarks. The models use an optimized transformer architecture with grouped-query attention (GQA) for improved inference scalability, and were pretrained on up to 9 trillion tokens of publicly available data with a knowledge cutoff of December 2023. Llama 3.2 officially supports eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, though it has been trained on a broader collection of languages. The 1B and 3B models incorporate knowledge distillation from the larger Llama 3.1 models and use supervised fine-tuning, rejection sampling, and direct preference optimization for alignment. Quantized versions, produced with methods such as SpinQuant and QLoRA, are optimized for mobile and edge deployment and achieve significant speedups and memory reductions on constrained devices. The models are designed to be deployed as part of broader AI systems with additional safety guardrails rather than in isolation, and are intended for both commercial and research applications.
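Since the instruction-tuned models expect dialogue input in the Llama 3 chat format, here is a minimal sketch of that template. In practice the tokenizer's built-in chat template should build this string for you; the hypothetical `build_prompt` helper below is shown only to illustrate the structure of the special tokens.

```python
# Sketch of the Llama 3 instruct chat format (illustrative only;
# prefer the tokenizer's chat template in real code).

def build_prompt(messages: list[dict]) -> str:
    """Render a list of {role, content} messages into the Llama 3 prompt format."""
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open an assistant header to cue the model to generate its reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

prompt = build_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize Llama 3.2 in one sentence."},
])
print(prompt)
```

Each turn is delimited by `<|start_header_id|>role<|end_header_id|>` and terminated with `<|eot_id|>`; generation stops when the model emits its own `<|eot_id|>`.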
Meta
available local models on Mirai:

| Name | Size | Quantization |
|------|------|--------------|
| Llama-3.1-8B-Instruct | 8B | None |
| Llama-3.2-1B-Instruct | 1B | None |
| Llama-3.2-3B-Instruct | 3B | None |
| Llama-3.1-8B-Instruct-4bit | 8B | 4-bit |
| Llama-3.2-1B-Instruct-4bit | 1B | 4-bit |
| Llama-3.2-1B-Instruct-8bit | 1B | 8-bit |
| Llama-3.2-3B-Instruct-4bit | 3B | 4-bit |
| Llama-3.2-3B-Instruct-8bit | 3B | 8-bit |
| Llama-3.2-3B-Instruct-AWQ | 3B | AWQ |
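To put the size and quantization options above in perspective, here is a rough back-of-the-envelope estimate of weight storage at each precision. These figures count parameter storage only, assume nominal parameter counts (1B/3B/8B exactly), and ignore KV cache, activations, and runtime overhead, so real memory use will be higher.

```python
# Rough weight-storage estimate per precision; not a measurement of
# actual runtime memory use (KV cache and activations are excluded).

def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB for a given size and precision."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

for name, params in [("1B", 1.0), ("3B", 3.0), ("8B", 8.0)]:
    fp16 = weight_memory_gb(params, 16)
    q8 = weight_memory_gb(params, 8)
    q4 = weight_memory_gb(params, 4)
    print(f"{name}: fp16 ~{fp16:.1f} GB, 8-bit ~{q8:.1f} GB, 4-bit ~{q4:.1f} GB")
```

By this estimate a 3B model drops from roughly 6 GB of weights at fp16 to roughly 1.5 GB at 4-bit, which is why the 4-bit and AWQ variants are the practical choice on memory-constrained devices.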