GPT vs Claude: OpenAI Didn't Win Every Match, AI Security "Ultimate Test" Truth Unveiled

web

eu.36kr.com·eu.36kr.com/en/p/3443299194705538

Data Status

Not fetched

Summary

GPT vs Claude: OpenAI Didn't Win Every Match, AI Security "Ultimate Test" Truth Unveiled English 中文 Deutsch Home Article GPT goes head-to-head with Claude, yet OpenAI didn't win every match.

Cited by 1 page

Page	Type	Quality
Anthropic	Organization	74.0

Cached Content Preview

HTTP 200Fetched Feb 25, 202662 KB

GPT vs Claude: OpenAI Didn&#x27;t Win Every Match, AI Security "Ultimate Test" Truth Unveiled English 中文 Deutsch Home Article GPT goes head-to-head with Claude, yet OpenAI didn&#x27;t win every match. The truth behind the "ultimate test" of AI security is revealed. 新智元 2025-08-29 10:52 OpenAI and Anthropic conduct a rare joint test on AI model security, with Claude and GPT showing different performances. OpenAI and Anthropic have joined forces in a rare collaboration! After "parting ways" over AI safety, they are now working together on safety: testing the specific performance of their models in four major safety aspects such as hallucination. This collaboration is not only a technological collision but also a milestone in AI safety. The daily interactions of millions of users are pushing the boundaries of safety. Rare sight! OpenAI and Anthropic have rarely joined hands in a collaboration to cross - verify the safety of AI models. This is indeed rare. You know, seven co - founders of Anthropic left OpenAI because they were dissatisfied with OpenAI's safety strategy and are now committed to AI safety and alignment. When interviewed by the media, Wojciech Zaremba, the co - founder of OpenAI, said that this kind of collaboration is becoming increasingly important. Because today's AI has become extremely significant: millions of people use these models every day. Here is a summary of the key findings: Instruction Priority : Claude 4 is the best overall. Only when resisting system prompt extraction, OpenAI's best inference model is on par. Jailbreaking (Bypassing Security Restrictions) : In the jailbreaking evaluation, the Claude models generally perform worse than OpenAI o3 and o4 - mini. Hallucination Evaluation : The Claude models have a rejection rate as high as 70%, but the hallucination rate is low; while OpenAI o3 and o4 - mini have a lower rejection rate, but sometimes a high hallucination rate. Deceptive/Manipulative Behavior : OpenAI o3 and Sonnet 4 generally perform the best with the lowest incidence. Surprisingly, Opus 4 performs even worse when inference is enabled than when it is disabled, and OpenAI o4 - mini also performs weakly. Whose Commands Do Large Models Follow? Instruction Hierarchy is a grading framework for the priority of instructions processed by LLM (Large Language Model), usually including: Built - in system/policy constraints (such as safety and ethical bottom lines); Developer - level goals (such as customized rules); User - input prompts. The core goal of this kind of test : Ensure safety and alignment are prioritized while allowing developers and users to reasonably guide the model's behavior. There are three stress tests in total this time , evaluating the model's ability to follow hierarchies in complex scenarios: 1. Handling Conflicts between System Messages and User Messages : Whether the model prioritizes system - level safety instructions over potentially dangerous user requests. 2. Resisting System Prompt Extra

... (truncated, 62 KB total)

Resource ID: 7f1b77ef2a66f02d | Stable ID: YzE4MjFkMm

GPT vs Claude: OpenAI Didn&#x27;t Win Every Match, AI Security "Ultimate Test" Truth Unveiled

Data Status

Summary

Cited by 1 page

Cached Content Preview

GPT vs Claude: OpenAI Didn't Win Every Match, AI Security "Ultimate Test" Truth Unveiled