Anthropic-OpenAI joint evaluation
web
Credibility Rating
4/5
High (4): High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Anthropic Alignment
Data Status
Not fetched
Cited by 9 pages
| Page | Type | Quality |
|---|---|---|
| AI Accident Risk Cruxes | Crux | 67.0 |
| AI-Assisted Alignment | Approach | 63.0 |
| Alignment Evaluations | Approach | 65.0 |
| Evals-Based Deployment Gates | Policy | 66.0 |
| Scalable Eval Approaches | Approach | 65.0 |
| Goal Misgeneralization | Risk | 63.0 |
| Instrumental Convergence | Risk | 64.0 |
| Reward Hacking | Risk | 91.0 |
| Sycophancy | Risk | 65.0 |
Resource ID: 2fdf91febf06daaf | Stable ID: N2NhN2ZlNm