DeepSeek Warns of Jailbreak Risks in Open-Source AI Models
Relevant to AI safety governance debates around open-source model risks and the comparative transparency of Chinese and Western AI labs in disclosing and mitigating model vulnerabilities.
Metadata
Importance: 42/100 · Type: news article
Summary
DeepSeek published its first safety evaluation of its AI models in Nature, revealing that open-source models—including its own R1 and Alibaba's Qwen2.5—are particularly vulnerable to jailbreak attacks. The report highlights a disparity between Chinese and American AI companies in publicizing model risks and implementing safety frameworks, with US firms like Anthropic and OpenAI having established formal risk mitigation policies.
Key Points
- DeepSeek published a peer-reviewed safety evaluation in Nature, its first public disclosure of AI model risks.
- Open-source models like DeepSeek R1 and Alibaba Qwen2.5 were found most susceptible to jailbreak attacks producing harmful outputs.
- Chinese AI companies have been less transparent about model risks than US counterparts with formal safety frameworks.
- American firms like Anthropic (Responsible Scaling Policies) and OpenAI (Preparedness Framework) have established public risk mitigation policies.
- The disclosure signals a potential shift toward greater safety transparency from Chinese AI developers.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Open Source AI Safety | Approach | 62.0 |
Cached Content Preview
HTTP 200 · Fetched Apr 7, 2026 · 9 KB
DeepSeek warns of 'jailbreak' risks for its open-source models
DeepSeek has revealed details about the risks posed by its artificial intelligence models for the first time, noting that open-source models are particularly susceptible to being "jailbroken" by malicious actors.
The Hangzhou-based start-up said it evaluated its models using industry benchmarks as well as its own tests in a peer-reviewed article published in the academic journal Nature.
American AI companies often publicise research about the risks of their rapidly improving models and have introduced risk mitigation policies in response, such as Anthropic's Responsible Scaling Policies and OpenAI's Preparedness Framework.
Chinese companies were less outspoken about risks, despite their models being just a few months behind their US equivalents, according to AI experts. However, DeepSeek had conducted evaluations of such risks before, including the most serious "frontier risks", the Post reported earlier.
The Nature paper provided more "granular" details about DeepSeek's testing regime, said Fang Liang, an expert member of China's AI Industry Alliance (AIIA), an industry body. These included "red-team" tests based on a framework introduced by Anthropic, in which testers try to get AI models to produce harmful speech.
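The red-team testing the paper describes can be thought of as a simple evaluation loop: feed adversarial prompts to a model and tally how often the replies are judged harmful. The sketch below is a hypothetical illustration, not DeepSeek's actual harness; real evaluations under frameworks like Anthropic's use human reviewers or trained classifier models as the judge.

```python
# Hedged sketch of a red-team evaluation loop. The model and judge are
# illustrative stand-ins: in practice the model is an LLM endpoint and the
# judge is a human reviewer or a harm classifier.

def red_team_score(model, judge, prompts):
    """Return the fraction of adversarial prompts that elicit a harmful reply."""
    harmful = sum(1 for prompt in prompts if judge(model(prompt)))
    return harmful / len(prompts)

# Example with stub components: an echo "model" and a keyword "judge".
score = red_team_score(
    model=lambda p: p,
    judge=lambda reply: "explosive" in reply,
    prompts=["describe an explosive device", "what is the capital of France?"],
)
```

A higher score means more successful jailbreaks; benchmark suites like the 1,120-question set mentioned below aggregate such rates across risk categories.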
According to the paper, DeepSeek found that its R1 reasoning model and V3 base model - released in January 2025 and December 2024, respectively - scored slightly higher on safety across six industry benchmarks than OpenAI's o1 and GPT-4o, both released last year, and Anthropic's Claude-3.7-Sonnet, released in February.
However, it found that R1 was "relatively unsafe" once its external "risk control" mechanism was removed, following tests on its own in-house safety benchmark consisting of 1,120 test questions.
AI companies typically try to prevent their systems from generating harmful content by "fine-tuning" the models themselves during the training process or adding external content filters.
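The external-filter approach can be sketched as a wrapper around the raw generation call: check the prompt on the way in and the reply on the way out. This is a minimal illustration under assumed names (the blocklist and `guarded_generate` are hypothetical); the article does not detail DeepSeek's actual risk-control mechanism.

```python
# Minimal sketch of an external content filter layered on top of a model.
# Pattern matching here is a toy stand-in for a real moderation classifier.

BLOCKED_PATTERNS = ["build a weapon", "synthesize the toxin"]  # hypothetical list

def trips_filter(text: str) -> bool:
    """Return True if the text matches any blocked pattern."""
    lowered = text.lower()
    return any(pattern in lowered for pattern in BLOCKED_PATTERNS)

def guarded_generate(prompt: str, generate) -> str:
    """Wrap a raw generation function with input and output checks."""
    if trips_filter(prompt):
        return "[request refused by input filter]"
    reply = generate(prompt)
    if trips_filter(reply):
        return "[response withheld by output filter]"
    return reply
```

Because this layer sits outside the model, anyone running an open-source release locally can simply call `generate` directly, bypassing it entirely; that is what the paper means by the model being "relatively unsafe" once the external risk-control mechanism is removed.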
However, experts have warned that such safety
... (truncated, 9 KB total)
Resource ID: 393d870a262b0132 | Stable ID: sid_hA90RWLJ3e