Type: Local
From: HuggingFace
Quantisation: No
Precision: No
Size: 1.7B
SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are designed to solve a wide range of tasks while remaining lightweight enough to run on-device. The 1.7B variant represents a significant advance over its predecessor, SmolLM1-1.7B, particularly in instruction following, knowledge, reasoning, and mathematics. It was trained on 11 trillion tokens drawn from a diverse combination of datasets, including FineWeb-Edu, DCLM, and The Stack, along with curated mathematics and coding datasets.

The instruct version was developed through supervised fine-tuning on public and curated datasets, followed by Direct Preference Optimization (DPO) using UltraFeedback. It additionally supports text rewriting, summarization, and function calling, enabled by datasets developed by Argilla.

SmolLM2 models primarily understand and generate content in English. They should be used as assistive tools rather than definitive sources of information, as generated content may not always be factually accurate or free from biases present in the training data.
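Since the model is pulled from HuggingFace and run locally, a minimal sketch of loading and prompting the instruct variant with the transformers library could look like the following. The repository id HuggingFaceTB/SmolLM2-1.7B-Instruct, the example prompt, and the sampling parameters are assumptions for illustration, not part of this listing.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed HuggingFace repository id for the 1.7B instruct variant.
checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Build a chat-formatted prompt and generate a reply.
messages = [{"role": "user", "content": "Summarize why on-device language models are useful."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(
    input_ids, max_new_tokens=128, do_sample=True, temperature=0.2, top_p=0.9
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```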