Longterm Wiki
Updated 2026-03-13
Summary

This is a pure navigation/index page for interpretability research, listing 6 sub-topics with EntityLinks but containing no substantive content, analysis, or conclusions of its own.

Change History
Clarify overview pages with new entity type (3 weeks ago)

Added `overview` as a proper entity type throughout the system, migrated all 36 overview pages to `entityType: overview`, built overview-specific InfoBox rendering with child page links, created an OverviewBanner component, and added a knowledge-base-overview page template to Crux.

Issues
Structure: No tables or diagrams; consider adding visual content.

Interpretability (Overview)

Interpretability research aims to understand what AI systems are "thinking" and why they behave as they do.

Overview:

  • Interpretability: The field and its importance for safety

Mechanistic Approaches:

  • Mechanistic Interpretability: Reverse-engineering neural networks
  • Sparse Autoencoders: Learning interpretable features
  • Probing: Testing for specific knowledge or concepts
  • Circuit Breakers: Identifying and modifying specific circuits
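
The probing idea above can be illustrated with a minimal sketch. This uses entirely synthetic "activations" (not outputs from any real model): a binary concept is planted along a random direction, and a least-squares linear probe is fit to recover it. All names here (`acts`, `direction`, etc.) are illustrative assumptions, not part of any page's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 200, 16                     # samples, activation dimension
concept = rng.integers(0, 2, n)    # binary concept label per sample
direction = rng.normal(size=d)     # direction along which the concept is encoded

# Fabricated activations: Gaussian noise plus a +/- signal along `direction`
acts = rng.normal(size=(n, d)) + np.outer(2 * concept - 1, direction)

# Fit a linear probe w by least squares so that acts @ w is close to +/-1
w, *_ = np.linalg.lstsq(acts, 2 * concept - 1, rcond=None)

# Read the concept back out of the activations with the probe
preds = (acts @ w > 0).astype(int)
accuracy = (preds == concept).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

High probe accuracy on held-in data like this only shows the concept is linearly decodable; real probing studies also control for probe capacity and test generalization.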

Representation-Based:

  • Representation Engineering: Controlling behavior via internal representations
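
One common representation-engineering recipe, sketched here on synthetic vectors rather than real model activations, is difference-of-means activation steering: average activations under two contrasting behaviors, take the difference as a steering vector, and add it (scaled by an arbitrary strength coefficient) to new activations. The behavior labels and variable names below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

d = 8
behavior_a = rng.normal(loc=1.0, size=(50, d))   # activations under behavior A
behavior_b = rng.normal(loc=-1.0, size=(50, d))  # activations under behavior B

# Steering vector: difference of the two mean activations
steer = behavior_a.mean(axis=0) - behavior_b.mean(axis=0)

x = behavior_b[0]            # an activation we want to shift toward behavior A
x_steered = x + 1.0 * steer  # 1.0 is an arbitrary steering strength

# The steered activation aligns more with the steering direction than before
print(np.dot(x_steered, steer) > np.dot(x, steer))
```

Adding a multiple of `steer` increases the projection onto the steering direction by its squared norm, which is why the final comparison always holds; whether that shift changes model behavior in a desirable way is an empirical question in real systems.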

Related Pages

Approaches:

  • Probing / Linear Probes