
Natural Abstractions

Concept

The hypothesis that diverse learning processes modeling the same world converge on the same "natural" abstractions, which would make AI systems' learned concepts more human-interpretable and aid alignment

Related
Safety Agendas
Interpretability

This page is a stub. Content needed.
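If the hypothesis holds, independently trained models exposed to the same data should end up with similar internal representations. Below is a minimal sketch of how such convergence is sometimes measured, using linear centered kernel alignment (CKA, Kornblith et al., 2019); the synthetic stand-in "models", the latent dimensionality, and the noise scale are illustrative assumptions, not content from this page.

```python
# Illustrative sketch only: two stand-in "models" are random linear readouts
# of a shared latent structure plus independent noise. Linear CKA scores how
# similar two feature spaces are, up to a linear transform.
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two (n_samples, n_features) activation matrices.

    Returns a value in [0, 1]; values near 1 suggest the two models
    encode the same abstractions up to a linear transform.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # CKA(X, Y) = ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(x.T @ y, "fro") ** 2
    return cross / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro"))

rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 8))               # shared "natural" structure
model_a = latent @ rng.normal(size=(8, 64)) + 0.1 * rng.normal(size=(1000, 64))
model_b = latent @ rng.normal(size=(8, 64)) + 0.1 * rng.normal(size=(1000, 64))
control = rng.normal(size=(1000, 64))             # no shared structure

print(f"CKA(model_a, model_b) = {linear_cka(model_a, model_b):.3f}")  # high
print(f"CKA(model_a, control) = {linear_cka(model_a, control):.3f}")  # near 0
```

On real networks the feature matrices would come from layer activations on a shared probe set; high cross-model CKA is consistent with, though not proof of, convergent abstractions.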

Related Pages

Approaches

Representation Engineering · Sleeper Agent Detection · AI-Assisted Alignment · Mechanistic Interpretability

Risks

Deceptive Alignment

Analysis

Model Organisms of Misalignment · Capability-Alignment Race Model

Safety Research

Anthropic Core Views

Organizations

Anthropic · Conjecture

Key Debates

AI Alignment Research Agendas · Technical AI Safety Research · Is Interpretability Sufficient for Safety?

Concepts

Dense Transformers

Historical

Deep Learning Revolution Era · Mainstream Era

Other

Dario Amodei · Yoshua Bengio · Chris Olah · Neel Nanda