DeepSeek-R1: Open-Source Reasoning Model Release
Web Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: GitHub
DeepSeek-R1's release in early 2025 was a landmark event in open-source AI, prompting significant debate in the AI safety community about the implications of freely available frontier reasoning models and the viability of compute-based governance approaches.
Metadata
Summary
DeepSeek-R1 is an open-source large language model from DeepSeek-AI that achieves strong reasoning capabilities through reinforcement learning, reportedly matching or approaching OpenAI's o1 performance on reasoning benchmarks. The release includes model weights, technical details, and distilled smaller variants, representing a significant open-source milestone in frontier reasoning AI. Its release demonstrated that high-capability reasoning models can be developed at lower cost and made openly available.
Key Points
- DeepSeek-R1 uses reinforcement learning to develop chain-of-thought reasoning capabilities without relying solely on supervised fine-tuning
- Released as open source with model weights, enabling independent research, evaluation, and deployment by the AI safety community
- Reportedly achieves performance competitive with OpenAI o1 on math, coding, and reasoning benchmarks at significantly lower training cost
- Includes distilled smaller models (1.5B to 70B parameters), making capable reasoning models accessible for safety research
- Raised discussions about AI proliferation, geopolitical implications, and the pace of open-source frontier capabilities development
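The RL approach noted above reportedly relies on rule-based rewards for verifiable tasks rather than a learned reward model. A minimal illustrative sketch of such a reward function (the tag format and weights here are assumptions for illustration, not DeepSeek's actual implementation):

```python
import re

def reasoning_reward(completion: str, reference_answer: str) -> float:
    """Rule-based reward of the kind described for R1-style training:
    a format term (chain of thought wrapped in <think> tags) plus an
    accuracy term (final answer matches a verifiable reference, as in
    math or coding tasks). Weights are illustrative only."""
    reward = 0.0
    # Format reward: reasoning must appear inside <think>...</think>.
    if re.search(r"<think>.*?</think>", completion, re.DOTALL):
        reward += 0.5
    # Accuracy reward: compare whatever follows the reasoning block
    # against the known-correct answer.
    final = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
    if final == reference_answer.strip():
        reward += 1.0
    return reward

sample = "<think>2 + 2 = 4, since ...</think>4"
print(reasoning_reward(sample, "4"))  # 1.5
```

Because the reward is computed mechanically from the output, it scales to the large sample counts RL training requires without a human or learned judge in the loop.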
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Reasoning and Planning | Capability | 65.0 |
| AI Proliferation Risk Model | Analysis | 65.0 |
Cached Content Preview
GitHub - deepseek-ai/DeepSeek-R1
deepseek-ai / DeepSeek-R1 (Public) · 92k stars · 11.7k forks
Repository files (36 commits): .github/workflows, figures, DeepSeek_R1.pdf, LICENSE, README.md
DeepSeek-R1
Paper Link 👁️
1. Introduction
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
Through RL, DeepSeek-R1-Zero naturally developed numerous powerful and interesting reasoning behaviors.
However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance,
we introduce DeepSeek-R1, which incorporates cold-start data before RL.
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
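The distilled models described above are produced by plain supervised fine-tuning on reasoning traces generated by the large teacher, with no RL applied to the student. A minimal sketch of assembling such a distillation dataset (`teacher_generate` is a hypothetical callable standing in for sampling from the teacher model):

```python
def build_distillation_dataset(teacher_generate, prompts):
    """Collect teacher-generated chain-of-thought completions so a
    smaller dense model (e.g. a Qwen or Llama base) can be fine-tuned
    on them with standard supervised learning. Illustrative sketch."""
    dataset = []
    for prompt in prompts:
        # Each completion includes the teacher's full reasoning trace,
        # not just the final answer -- that is what the student imitates.
        completion = teacher_generate(prompt)
        dataset.append({"prompt": prompt, "completion": completion})
    return dataset

# Toy stand-in teacher, for illustration only:
toy = build_distillation_dataset(
    lambda p: f"<think>reasoning about {p}</think>answer",
    ["Q1", "Q2"],
)
print(len(toy))  # 2
```

Distillation transfers much of the teacher's reasoning behavior at a fraction of the compute, which is why the 1.5B–70B variants can be run and studied on modest hardware.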
NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the Usage Recommendation section.
2. Model Summary
Post-Training: Large-Scale Reinforcement Learning on the Base Model
We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the r
... (truncated, 16 KB total)