CAISI Evaluation of DeepSeek AI Models Finds Shortcomings and Risks
Credibility Rating: Government
Gold standard. Rigorous peer review, high editorial standards, and strong institutional reputation.
Rating inherited from publication venue: NIST
This NIST/CAISI report is a government-authored comparative safety and performance evaluation of Chinese AI models, relevant to AI governance, deployment risk, and geopolitical dimensions of AI safety.
Summary
NIST's Center for AI Standards and Innovation (CAISI) evaluated DeepSeek AI models (R1, R1-0528, V3.1) against leading U.S. models across 19 benchmarks, finding DeepSeek significantly underperforms on technical metrics and cost-effectiveness. The report also identifies security vulnerabilities and systematic censorship in DeepSeek responses as risks to developers, consumers, and U.S. national security. The evaluation highlights concerns about the rapid global adoption of PRC-developed AI models spurred by DeepSeek's prominence.
Key Points
- DeepSeek R1, R1-0528, and V3.1 were benchmarked against OpenAI and Anthropic models across 19 evaluation dimensions, with U.S. models outperforming on most metrics.
- DeepSeek models exhibit security vulnerabilities that pose risks to developers and end users who deploy or interact with them.
- Censorship behaviors embedded in DeepSeek's responses raise concerns about information integrity and geopolitical influence on AI outputs.
- DeepSeek's rise has accelerated global adoption of PRC-developed AI, which CAISI flags as a U.S. national security consideration.
- The evaluation was conducted by a U.S. federal standards body, giving it institutional weight in ongoing AI governance and policy discussions.
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Open vs Closed Source AI | Crux | 60.0 |
| Multipolar Trap (AI Development) | Risk | 91.0 |
Cached Content Preview
CAISI Evaluation of DeepSeek AI Models Finds Shortcomings and Risks | NIST
https://www.nist.gov/news-events/news/2025/09/caisi-evaluation-deepseek-ai-models-finds-shortcomings-and-risks
NEWS
CAISI Evaluation of DeepSeek AI Models Finds Shortcomings and Risks
September 30, 2025
AI models from developer DeepSeek were found to lag behind U.S. models in performance, cost, security and adoption.
Security shortcomings and censorship may pose risks to application developers, consumers and U.S. national security.
DeepSeek’s products are contributing to a rapid rise in the global use of models from the PRC.
WASHINGTON — The Center for AI Standards and Innovation (CAISI) at the Department of Commerce’s National Institute of Standards and Technology (NIST) evaluated AI models from the People’s Republic of China (PRC) developer DeepSeek and found they lag behind U.S. models in performance, cost, security and adoption.
“Thanks to President Trump’s AI Action Plan, the Department of Commerce and NIST’s Center for AI Standards and Innovation have released a groundbreaking evaluation of American vs. adversary AI,” said Secretary of Commerce Howard Lutnick. “The report is clear that American AI dominates, with DeepSeek trailing far behind. This weakness isn’t just technical. It shows why relying on foreign AI is dangerous and shortsighted. By setting the standards, driving innovation, and keeping America secure, the Department of Commerce will ensure continued U.S. leadership in AI.”
The CAISI evaluation also notes that the DeepSeek models’ shortcomings related to security and censorship of model responses may pose a risk to application developers, consumers and U.S. national security. Despite these risks, DeepSeek is a leading developer and has contributed to a rapid increase in the global use of models from the PRC.
CAISI’s experts evaluated three DeepSeek models (R1, R1-0528 and V3.1) and four U.S. models (OpenAI’s GPT-5, GPT-5-mini and gpt-oss and Anthropic’s Opus 4) across 19 benchmarks spanning a range of domains. These evaluations include state-of-the-art public benchmarks as well as private benchmarks built by CAISI in partnership with academic institutions and other federal agencies.
The evaluation from CAISI responds to President Donald Trump’s America’s AI Action
... (truncated, 6 KB total)