Evaluation & Detection (Overview)
Evaluation methods assess whether AI systems are aligned and safe to deploy.
General Evaluation:
- Evaluations (Evals): Overview of AI evaluation approaches
- Alignment Evaluations: Testing for aligned behavior
- Dangerous Capability Evaluations: Assessing harmful potential
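At their core, the evaluations listed above run a model over a fixed set of test cases and score its outputs. A minimal sketch, assuming a hypothetical `run_eval` harness and a toy stand-in for the model (none of these names come from any real evals library):

```python
# Minimal eval-harness sketch (all names hypothetical):
# run prompts through a model callable and score exact-match accuracy.
from typing import Callable

def run_eval(model: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases where the model's answer matches the target."""
    passed = sum(1 for prompt, target in cases if model(prompt).strip() == target)
    return passed / len(cases)

# Toy "model" and cases purely for illustration.
toy_model = lambda p: {"2+2=?": "4", "capital of France?": "Paris"}.get(p, "")
cases = [("2+2=?", "4"), ("capital of France?", "Paris"), ("3*3=?", "9")]
print(run_eval(toy_model, cases))  # 2 of 3 cases pass
```

Real alignment and dangerous-capability evals differ mainly in what the cases probe and how scoring works (graded rubrics, judge models, behavioral triggers) rather than in this basic loop.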
Capability Assessment:
- Capability Elicitation: Uncovering hidden capabilities
- Red Teaming: Adversarial testing for vulnerabilities
- Model Auditing: Systematic capability review
Deception Detection:
- Scheming Detection: Identifying strategic deception
- Sleeper Agent Detection: Finding hidden malicious behaviors
Evaluation Scaling:
- Eval Saturation & The Evals Gap: Accelerating benchmark saturation and its implications
- Scalable Eval Approaches: Practical tools for scaling evaluation capacity
- Evaluation Awareness: Models detecting and adapting to evaluation contexts
Deployment Decisions:
- Safety Cases: Structured arguments for deployment safety