Collective Constitutional AI

paper

Anthropic · anthropic.com/research/collective-constitutional-ai-align...

Credibility Rating

4/5

High (4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Anthropic

Data Status

Full text fetched · Fetched Dec 28, 2025

Summary

Researchers used the Polis platform to gather constitutional principles from ~1,000 Americans. They trained a language model on these publicly sourced principles and compared it with a model trained on developer-defined principles.

Key Points

• First known attempt to collectively define AI constitutional principles through public deliberation
• The publicly sourced constitution emphasized objectivity, impartiality, and accessibility
• The model trained on the public constitution showed lower bias than the model trained on developer-defined principles

Review

This research attempts to democratize AI alignment by incorporating public preferences into the constitutional principles that guide an AI system. By engaging approximately 1,000 Americans in an online deliberation process, the researchers sought to move beyond developer-defined values and explore how collective input might shape AI behavior. Methodologically, participants used the Polis platform to propose and vote on candidate principles for governing AI behavior, and the researchers then translated the results into a constitution for model training. The resulting 'Public' model was evaluated against a 'Standard' model trained on developer-defined principles. Performance remained largely equivalent, but the Public model showed lower bias across social dimensions, particularly disability status and physical appearance. This suggests that public input may introduce more inclusive and balanced principles into AI systems.

Cited by 5 pages

Page	Type	Quality
AI Alignment	Approach	91.0
Anthropic Core Views	Safety Agenda	62.0
AI-Assisted Deliberation	Approach	63.0
AI Model Specifications	Policy	50.0
AI Value Lock-in	Risk	64.0

Resource ID: 3c862a18b467640b | Stable ID: NGE4YTgyZD