Longterm Wiki

Scalable Eval Approaches

Practical approaches for scaling AI evaluation to keep pace with capability growth, including LLM-as-judge (40% production adoption, though theoretically capped at 2× sample efficiency), automated behavioral evals, AI-assisted red teaming, chain-of-thought (CoT) monitoring, and debate-based evaluation reaching 76-88% accuracy.
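The LLM-as-judge pattern mentioned above can be sketched as follows. This is a minimal illustration, not a reference implementation: `call_judge_model` is a hypothetical stand-in for a real LLM API call, stubbed here with a keyword heuristic so the sketch runs on its own.

```python
# Minimal sketch of an LLM-as-judge evaluation loop (hypothetical names throughout).

JUDGE_PROMPT = (
    "You are an impartial judge. Rate the following answer to the question "
    "on a scale of 1-5 for correctness and helpfulness.\n"
    "Question: {question}\nAnswer: {answer}\nScore (1-5):"
)

def call_judge_model(prompt: str) -> str:
    # Hypothetical stub: a real implementation would call an LLM API here.
    # The heuristic below just rewards answers that give a reason.
    return "4" if "because" in prompt else "2"

def judge(question: str, answer: str) -> int:
    # Format the rubric prompt, query the judge, and parse its numeric score.
    raw = call_judge_model(JUDGE_PROMPT.format(question=question, answer=answer))
    score = int(raw.strip())
    if not 1 <= score <= 5:
        raise ValueError(f"judge returned out-of-range score: {score}")
    return score

def evaluate(pairs: list[tuple[str, str]]) -> float:
    # Average judge score over (question, answer) pairs.
    scores = [judge(q, a) for q, a in pairs]
    return sum(scores) / len(scores)
```

In practice the judge prompt, score parsing, and retry/validation logic carry most of the design weight; the 2× sample-efficiency cap noted above reflects that a judge model can at best halve, not eliminate, the human labeling needed to calibrate it.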

Related Pages

Risks

AI Capability Sandbagging

Approaches

Scheming & Deception Detection, AI-Assisted Alignment

Organizations

US AI Safety Institute, OpenAI, UK AI Safety Institute, Google DeepMind, Redwood Research, Palisade Research

Other

Scalable Oversight, Interpretability, Red Teaming, Connor Leahy

Concepts

Alignment Evaluation Overview, Agentic AI, Reasoning and Planning

Policy

Voluntary AI Safety Commitments

Tags

llm-as-judge, automated-evals, red-teaming, scalable-evaluation, audit-capacity