DeepSeek-R1 is a first-generation reasoning model developed through large-scale reinforcement learning (RL). The model family includes DeepSeek-R1-Zero, trained via RL without supervised fine-tuning (SFT) as a preliminary step, and DeepSeek-R1, which adds a cold-start data stage before RL to address problems observed in R1-Zero such as endless repetition, poor readability, and language mixing. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

The key finding, a significant milestone in LLM research, is that reasoning capabilities can be incentivized purely through reinforcement learning, without SFT: powerful reasoning behaviors, including self-verification, reflection, and the generation of long chains of thought, emerged naturally in DeepSeek-R1-Zero.

To support the research community, DeepSeek has open-sourced DeepSeek-R1-Zero and DeepSeek-R1, along with six distilled dense models based on the Llama and Qwen architectures. The distillation results show that reasoning patterns discovered by larger models can be transferred effectively to smaller ones, yielding better performance than applying RL directly to the small models. The distilled models, ranging from 1.5B to 70B parameters, perform strongly on benchmarks, with DeepSeek-R1-Distill-Qwen-32B outperforming OpenAI-o1-mini across various benchmarks and achieving state-of-the-art results for dense models.
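Because the distilled models are released as standard dense checkpoints, they can be run with off-the-shelf tooling. Below is a minimal sketch using Hugging Face transformers; the repo id, chat-template usage, and sampling parameters are illustrative assumptions, not specifics from this document.

```python
# Minimal sketch: running one of the distilled dense models with Hugging Face
# transformers. The repo id and sampling settings are assumptions for
# illustration, not confirmed by this document.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed HF repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "What is 17 * 23? Reason step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling (rather than greedy decoding) is commonly used with reasoning
# models to reduce repetition; temperature 0.6 / top_p 0.95 are assumed values.
outputs = model.generate(
    input_ids, max_new_tokens=1024, do_sample=True, temperature=0.6, top_p=0.95
)
# Print only the newly generated tokens, which include the long chain of
# thought followed by the final answer.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The same pattern would apply to any of the distilled checkpoints; only the repo id and the available GPU memory change with model size.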