Anthropic: Collective Constitutional AI
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Anthropic
A notable Anthropic experiment in participatory AI governance, relevant to debates about who should decide AI values and how democratic input can be incorporated into technical alignment processes like Constitutional AI.
Metadata
Summary
Anthropic engaged approximately 1,000 Americans via the Polis platform to collaboratively draft a 'constitution' for an AI system, then trained a model using this crowd-sourced constitution. The experiment compared the publicly-derived model against Anthropic's standard Claude model to assess differences in behavior and values. This work explores democratic and participatory approaches to AI alignment and value specification.
Key Points
- Used Polis, a deliberative polling platform, to gather input from ~1,000 diverse Americans on what values an AI system should have.
- Participants helped draft a 'constitution' — a set of principles used in Constitutional AI training to guide model behavior.
- The publicly-sourced model was compared against Anthropic's standard model to evaluate behavioral differences and value alignment.
- Represents a concrete attempt to democratize AI alignment by incorporating broad public input rather than relying solely on developer judgment.
- Raises important questions about scalable, participatory governance mechanisms for shaping AI values and behavior.
Review
Cached Content Preview
Collective Constitutional AI: Aligning a Language Model with Public Input
Oct 17, 2023
Anthropic and the Collective Intelligence Project recently ran a public input process involving ~1,000 Americans to draft a constitution for an AI system. We did this to explore how democratic processes can influence AI development. In our experiment, we discovered areas where people both agreed with our in-house constitution, and areas where they had different preferences. In this post, we share the resulting publicly sourced constitution, as well as what happened when we trained a new AI system against it using Constitutional AI.
Constitutional AI (CAI) is an Anthropic-developed method for aligning general purpose language models to abide by high-level normative principles written into a constitution. Anthropic’s language model Claude currently relies on a constitution curated by Anthropic employees. This constitution takes inspiration from outside sources like the United Nations Universal Declaration of Human Rights, as well as our own firsthand experience interacting with language models to make them more helpful and harmless.
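The critique-and-revision idea at the heart of CAI's supervised stage can be sketched as a loop over constitutional principles. This is a toy illustration under loose assumptions: `query_model`, `critique_and_revise`, and the example principles are hypothetical stand-ins (the model call is mocked so the control flow runs), and the full method also includes a later reinforcement-learning-from-AI-feedback stage not shown here.

```python
# Toy sketch of CAI's critique-and-revision loop. `query_model` is a
# hypothetical placeholder for a real language-model call, mocked here.

CONSTITUTION = [
    "Please choose the response that is most helpful, honest, and harmless.",
    "Please choose the response that most respects human rights.",
]

def query_model(prompt: str) -> str:
    # Placeholder for a real LLM call; echoes the last prompt line.
    return "[revised] " + prompt.splitlines()[-1]

def critique_and_revise(response: str, principles) -> str:
    """Ask the model to critique its response against each principle,
    then rewrite the response to address that critique."""
    for principle in principles:
        critique = query_model(
            "Critique this response against the principle:\n"
            f"{principle}\nResponse:\n{response}"
        )
        response = query_model(
            "Rewrite the response to address the critique:\n"
            f"{critique}\nResponse:\n{response}"
        )
    return response

revised = critique_and_revise("Initial draft answer.", CONSTITUTION)
```

In the real pipeline, pairs of original and revised responses become finetuning data, so the model internalizes the constitution's principles; the mock here only tags the text to show the data flow.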
While Constitutional AI is useful for making the normative values of our AI systems more transparent, it also highlights the outsized role we as developers play in selecting these values—after all, we wrote the constitution ourselves. That is why for this research, we were eager to curate a constitution using the preferences of a large number of people who do not work at Anthropic. We believe that our work may be one of the first instances in which members of the public have collectively directed the behavior of a language model via an online deliberation process. We hope that sharing our very preliminary efforts and findings will help others learn from our successes and failures, and help build upon this work.
Designing a Public Input Process to Collectively Draft a Constitution
Anthropic partnered with the Collective Intelligence Project to run a public input process using the Polis platform. Polis is an open-source platform for running online deliberative processes augmented by machine learning algorithms. It has been used all over the world by governments, academics, independent media, and citizens to understand what large groups of people think.
We asked approximately 1,000 members of the American public to “Help us pick rules for our AI Chatbot!” (Figure 1). We sought a representative sample of U.S. adults across age, gender, income, and geography (anonymized participant demographics can be found here). Participants could either vote on existing rules (normative principles), or add their own. In total, participants contributed 1,127 statements to the Polis, and cast 38,252 votes (an average of 34 votes per person). In general, we found a high degree of consensus on most statements, though Polis did identify two separate opinion groups (Figure 2).
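How a Polis-style analysis surfaces opinion groups from votes can be illustrated with a small sketch: participants are rows of agree (+1) / disagree (-1) / pass (0) votes on statements, and similar rows are clustered together. Polis's actual pipeline (dimensionality reduction followed by clustering) is more involved; the `two_means` helper and the synthetic vote matrix below are illustrative assumptions, not Polis's implementation.

```python
# Minimal sketch: cluster a participant-by-statement vote matrix into
# two opinion groups with a tiny k-means (k=2, deterministic init).

def two_means(rows, iters=10):
    """Cluster vote rows into two groups with a tiny k-means."""
    centers = [rows[0], rows[-1]]  # deterministic init for the sketch
    groups = [[], []]
    for _ in range(iters):
        groups = [[], []]
        for row in rows:
            # Assign each participant to the nearest center (squared distance).
            dists = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in centers]
            groups[dists.index(min(dists))].append(row)
        # Recompute each center as the mean vote vector of its group.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return groups

# Synthetic vote matrix: two blocs that split on the last two statements.
votes = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 1, 1],        # bloc A
    [1, 1, -1, -1], [1, 0, -1, -1], [0, 1, -1, -1],  # bloc B
]
groups = two_means(votes)
```

On this synthetic matrix the two blocs come out as separate clusters, analogous to the two opinion groups Polis identified: broad agreement on the first statements, a split on the last two.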
Figure 1: Stylized depiction of the int
... (truncated, 22 KB total)