DeepSeek-R1: Open-Source Reasoning Model Release
Web Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: GitHub
DeepSeek-R1's release in early 2025 was a landmark event in open-source AI, prompting significant debate in the AI safety community about the implications of freely available frontier reasoning models and the viability of compute-based governance approaches.
Metadata
Summary
DeepSeek-R1 is an open-source large language model from DeepSeek-AI that achieves strong reasoning capabilities through reinforcement learning, reportedly matching or approaching OpenAI's o1 performance on reasoning benchmarks. The release includes model weights, technical details, and distilled smaller variants, representing a significant open-source milestone in frontier reasoning AI. Its release demonstrated that high-capability reasoning models can be developed at lower cost and made openly available.
Key Points
- DeepSeek-R1 uses reinforcement learning to develop chain-of-thought reasoning capabilities without relying solely on supervised fine-tuning
- Released as open source with model weights, enabling independent research, evaluation, and deployment by the AI safety community
- Reportedly achieves performance competitive with OpenAI o1 on math, coding, and reasoning benchmarks at significantly lower training cost
- Includes distilled smaller models (1.5B to 70B parameters), making capable reasoning models accessible for safety research
- Raised discussions about AI proliferation, geopolitical implications, and the pace of open-source frontier capabilities development
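The RL approach noted above reportedly relies on rule-based rewards for verifiable tasks rather than a learned reward model. A minimal illustrative sketch of such a reward function (the tag format and weights here are assumptions for illustration, not DeepSeek's actual implementation):

```python
import re

def reasoning_reward(completion: str, reference_answer: str) -> float:
    """Rule-based reward of the kind described for R1-style training:
    a format term (chain of thought wrapped in <think> tags) plus an
    accuracy term (final answer matches a verifiable reference, as in
    math or coding tasks). Weights are illustrative only."""
    reward = 0.0
    # Format reward: reasoning must appear inside <think>...</think>.
    if re.search(r"<think>.*?</think>", completion, re.DOTALL):
        reward += 0.5
    # Accuracy reward: compare whatever follows the reasoning block
    # against the known-correct answer.
    final = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
    if final == reference_answer.strip():
        reward += 1.0
    return reward

sample = "<think>2 + 2 = 4, since ...</think>4"
print(reasoning_reward(sample, "4"))  # 1.5
```

Because the reward is computed mechanically from the output, it scales to the large sample counts RL training requires without a human or learned judge in the loop.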
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Reasoning and Planning | Capability | 65.0 |
| AI Proliferation Risk Model | Analysis | 65.0 |
Cached Content Preview
GitHub - deepseek-ai/DeepSeek-R1
deepseek-ai / DeepSeek-R1 (Public) · 92k stars · 11.7k forks
Repository files (36 commits): .github/workflows, figures, DeepSeek_R1.pdf, LICENSE, README.md
DeepSeek-R1
Paper Link 👁️
1. Introduction
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
Through RL, DeepSeek-R1-Zero naturally developed numerous powerful and interesting reasoning behaviors.
However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance,
we introduce DeepSeek-R1, which incorporates cold-start data before RL.
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
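The distilled models described above are produced by plain supervised fine-tuning on reasoning traces generated by the large teacher, with no RL applied to the student. A minimal sketch of assembling such a distillation dataset (`teacher_generate` is a hypothetical callable standing in for sampling from the teacher model):

```python
def build_distillation_dataset(teacher_generate, prompts):
    """Collect teacher-generated chain-of-thought completions so a
    smaller dense model (e.g. a Qwen or Llama base) can be fine-tuned
    on them with standard supervised learning. Illustrative sketch."""
    dataset = []
    for prompt in prompts:
        # Each completion includes the teacher's full reasoning trace,
        # not just the final answer -- that is what the student imitates.
        completion = teacher_generate(prompt)
        dataset.append({"prompt": prompt, "completion": completion})
    return dataset

# Toy stand-in teacher, for illustration only:
toy = build_distillation_dataset(
    lambda p: f"<think>reasoning about {p}</think>answer",
    ["Q1", "Q2"],
)
print(len(toy))  # 2
```

Distillation transfers much of the teacher's reasoning behavior at a fraction of the compute, which is why the 1.5B–70B variants can be run and studied on modest hardware.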
NOTE: Before running DeepSeek-R1 series models locally, we kindly recommend reviewing the Usage Recommendation section.
2. Model Summary
Post-Training: Large-Scale Reinforcement Learning on the Base Model
We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the r
... (truncated, 16 KB total)