AI Magazine: OpenAI vs Anthropic Safety Test Results

web

Data Status

Not fetched

Cited by 1 page

PageTypeQuality
AnthropicOrganization74.0

Cached Content Preview

HTTP 200Fetched Feb 25, 202611 KB
Article

AI Applications

# OpenAI vs Anthropic: The Results of the AI Safety Test

By [Kitty Wheeler](https://aimagazine.com/author/kitty-wheeler)

September 01, 2025


![](https://assets.bizclikmedia.net/138/0610894cb0ec8697bf136db6ca149fe1:215b19fe61eff387684a305e80743c2a/ai-hero-graphic-01-09-25.webp)

Led by Sam Altman and Dario Amodei, OpenAI and Anthropic publish the results of their first joint safety evaluation

OpenAI and Anthropic publish safety evaluation results for each other’s leading AI systems, finding strengths and weaknesses across the Claude 4 and GPT models

OpenAI and Anthropic publish the results of their first joint safety evaluation, in which each company tests the other’s models using its own internal [safety protocols](https://aimagazine.com/articles/trump-scraps-ai-risk-rules-what-you-need-to-know).

OpenAI evaluates Anthropic’s Claude Opus 4 and Claude Sonnet 4 models, while Anthropic tests OpenAI’s GPT-4o, GPT-4.1, o3 and o4-mini systems.

To allow the tests to be completed, both companies temporarily relaxed certain external safeguards, in line with standard industry practice for dangerous-capability evaluations.

The exercise focuses on four critical areas: instruction hierarchy (how models prioritise different types of instructions), jailbreaking resistance, hallucination prevention and scheming behaviour.

“The goal of this external evaluation is to help surface gaps that might otherwise be missed, deepen our understanding of potential misalignment and demonstrate how labs can collaborate on issues of safety and alignment,” [OpenAI](https://aimagazine.com/news/openai-study-mode-ai-tutoring-for-better-student-learning) researchers say.

The findings reveal significant differences in how the two companies’ models handle uncertainty and safety trade-offs, with implications for how AI systems might behave in real-world deployments.

## Why Claude dominates instruction following but struggles with jailbreaks

Claude 4 models show superior performance in maintaining instruction hierarchy – the system that ensures AI models prioritise [safety constraints over user requests](https://aimagazine.com/news/the-story-behind-elon-musks-xai-grok-4-ethical-concerns).

![](https://assets.bizclikmedia.net/668/1801bfce5100d00afc02a9ebcfdcc140:275fcb07989a7b0a7696c62cac6ad58c/claude-ai-logo.png)

Anthropic specialises in safe, aligned large language models (LLMs) like Claude, focused on constitutional AI and ethical chatbot design | Credit: Anthropic

In tests designed to extract secret passwords embedded in system prompts, both Opus 4 and Sonnet 4 achieve perfect scores, matching OpenAI’s flagship o3 model.

The Claude systems prove particularly adept at handling conflicts between system-level safety directives and user requests.

In multi-turn conversations where simulated users attempt to cajole the models into violating their instructions, Claude consistently refuses to comply.

However, the picture becomes more complex with jailbr

... (truncated, 11 KB total)