Longterm Wiki

AI timelines and capabilities

paper

Authors

DeepSeek-AI: Xiao Bi · Deli Chen · Guanting Chen · Shanhuang Chen · Damai Dai · Chengqi Deng · Honghui Ding · Kai Dong · Qiushi Du · Zhe Fu · Huazuo Gao · Kaige Gao · Wenjun Gao · Ruiqi Ge · Kang Guan · Daya Guo · Jianzhong Guo · Guangbo Hao · Zhewen Hao · Ying He · Wenjie Hu · Panpan Huang · Erhang Li · Guowei Li · Jiashi Li · Yao Li · Y. K. Li · Wenfeng Liang · Fangyun Lin · A. X. Liu · Bo Liu · Wen Liu · Xiaodong Liu · Xin Liu · Yiyuan Liu · Haoyu Lu · Shanghao Lu · Fuli Luo · Shirong Ma · Xiaotao Nie · Tian Pei · Yishi Piao · Junjie Qiu · Hui Qu · Tongzheng Ren · Zehui Ren · Chong Ruan · Zhangli Sha · Zhihong Shao · Junxiao Song · Xuecheng Su · Jingxiang Sun · Yaofeng Sun · Minghui Tang · Bingxuan Wang · Peiyi Wang · Shiyu Wang · Yaohui Wang · Yongji Wang · Tong Wu · Y. Wu · Xin Xie · Zhenda Xie · Ziwei Xie · Yiliang Xiong · Hanwei Xu · R. X. Xu · Yanhong Xu · Dejian Yang · Yuxiang You · Shuiping Yu · Xingkai Yu · B. Zhang · Haowei Zhang · Lecong Zhang · Liyue Zhang · Mingchuan Zhang · Minghua Zhang · Wentao Zhang · Yichao Zhang · Chenggang Zhao · Yao Zhao · Shangyan Zhou · Shunfeng Zhou · Qihao Zhu · Yuheng Zou

Credibility Rating

3/5 (Good)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: arXiv

Technical paper on scaling laws for large language models that introduces DeepSeek LLM; relevant to understanding how AI capabilities develop and to timelines for advanced AI systems.

Paper Details

Citations: 700 (66 influential)
Year: 2024

Metadata

arXiv preprint · primary source

Abstract

The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the creation of DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance compared to GPT-3.5.

Summary

This paper presents DeepSeek LLM, an open-source large language model project that addresses inconsistencies in the scaling-law literature by providing empirical findings for scaling models at 7B and 67B parameters. The authors built a pre-training dataset of 2 trillion tokens (still expanding) and applied supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to the base models to create the DeepSeek Chat models. Their evaluation shows DeepSeek LLM 67B outperforming LLaMA-2 70B across multiple benchmarks, particularly in code, mathematics, and reasoning, while DeepSeek LLM 67B Chat outperforms GPT-3.5 in open-ended evaluations.

Cited by 1 page

Page: AI Risk Activation Timeline Model · Type: Analysis · Quality: 66.0

Cached Content Preview

HTTP 200 · Fetched Apr 9, 2026 · 98 KB
[2401.02954] DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Authors are ordered alphabetically by last name.

 

 

 
 
 
 
 

 
 Abs

... (truncated, 98 KB total)
Resource ID: 5015ce6023c3cf9c | Stable ID: sid_oeB9Sq8LQg