Longterm Wiki

AI Welfare Research

Record Metadata

Record Key: ai-welfare
Entity: Anthropic
Collection: Research Areas (6 records total)
Schema: Major research initiatives and focus areas.
YAML File: packages/kb/data/things/mK9pX3rQ7n.yaml

Fields

Name: AI Welfare Research
Description: Investigating moral status and welfare considerations for AI systems
Started: Jan 2024
Notes: Kyle Fish hired as first full-time AI welfare researcher at a major AI lab

Other Records in Research Areas (5)

| Key | Name | Description | Team Size |
| --- | --- | --- | --- |
| mechanistic-interpretability | Mechanistic Interpretability | Understanding neural network internals through reverse-engineering | 50 |
| constitutional-ai | Constitutional AI | Training AI systems to follow principles through self-critique and RLAIF | |
| alignment-science | Alignment Science | Scalable oversight, weak-to-strong generalization, robustness to jailbreaks | |
| responsible-scaling-policy | Responsible Scaling Policy | Framework for evaluating and mitigating risks at each capability level | |
| sleeper-agents | Sleeper Agents Research | Investigating whether AI systems can maintain hidden behaviors through training | |