Longterm Wiki

Eliciting Latent Knowledge (ELK)

ELK is the open problem of extracting what an AI system actually believes rather than the outputs it predicts a human would approve of. ARC's 2022 prize contest received 197 proposals and awarded $274K, but the $50K and $100K prizes for full solutions remain unclaimed, and the problem remains fundamentally unsolved after more than three years of focused research.
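One of the related approaches listed below, linear probing, can make the problem concrete: train a simple linear classifier on a model's internal activations to predict a latent property the model's outputs may not report. The sketch below is purely illustrative, using synthetic "activations" and a hypothetical binary latent property rather than a real model; all names and data are assumptions for demonstration.

```python
import numpy as np

# Illustrative sketch only: a linear probe trained on synthetic "activations".
# In a real probing setup, activations would come from a model's hidden
# layers and labels from some ground-truth latent property.
rng = np.random.default_rng(0)
n, d = 1000, 32

# Hypothetical latent binary property (the "knowledge" we want to elicit).
latent = rng.integers(0, 2, size=n)

# Synthetic activations: Gaussian noise plus a direction correlated
# with the latent property.
direction = rng.normal(size=d)
acts = rng.normal(size=(n, d)) + np.outer(latent * 2.0 - 1.0, direction)

# Fit a logistic-regression probe by gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(acts @ w + b)))  # predicted probabilities
    w -= 0.5 * (acts.T @ (p - latent) / n)     # gradient step on weights
    b -= 0.5 * np.mean(p - latent)             # gradient step on bias

preds = (acts @ w + b) > 0
accuracy = np.mean(preds == latent)
print(f"probe accuracy: {accuracy:.2f}")
```

The ELK worry is precisely that high probe accuracy on a labeled dataset does not guarantee the probe tracks the model's true beliefs rather than some human-legible proxy; this sketch shows only the mechanics of probing, not a solution.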

Related Pages

Approaches

AI Safety via Debate
Scheming & Deception Detection
Representation Engineering
Sleeper Agent Detection
Probing / Linear Probes

Organizations

Coefficient Giving
Redwood Research

Other

Holden Karnofsky

Concepts

Alignment Theoretical Overview

Tags

alignment-theory, deception-detection, belief-extraction, arc-research, unsolved-problem