Longterm Wiki

Red Teaming

Evaluation · active

Manual and automated adversarial testing of AI systems to discover failure modes.

Organizations: 3
Risks Addressed: 1
Cluster: Evaluation
Parent Area: AI Evaluations

Tags

function:assurance
scope:sub-field

Organizations (3)

Organization     Role
Anthropic        active
Google DeepMind  active
OpenAI           active

Sub-Areas (1)

Name: Jailbreak Research
Description: Finding, categorizing, and patching prompt injection and jailbreak attacks.
Status: active
Orgs: 0
Papers: 0