Alignment Research Center - Wikipedia
Reference Credibility Rating
Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: Wikipedia
ARC is a key organization in the AI safety landscape; its ARC Evals team conducts pre-deployment capability evaluations for frontier AI labs including Anthropic and OpenAI, making it directly relevant to deployment safety and governance discussions.
Metadata
Summary
Wikipedia overview of the Alignment Research Center (ARC), a nonprofit AI safety research organization founded in April 2021 by Paul Christiano. ARC focuses on developing scalable alignment methods, evaluating dangerous AI capabilities, and ensuring advanced AI systems are safe and beneficial. It has expanded from theoretical work into empirical research, industry collaborations, and policy.
Key Points
- Founded in April 2021 by former OpenAI researcher Paul Christiano, based in Berkeley, California.
- Develops scalable methods for training AI systems to behave honestly and helpfully, analyzing how alignment techniques could break down.
- ARC Evals (started by Beth Barnes) focuses on evaluating capabilities and alignment of advanced AI models, including red-teaming frontier models.
- Funded primarily by Open Philanthropy; notably returned a $1.25M FTX Foundation grant after the FTX collapse on ethical grounds.
- Expanding scope from theoretical alignment research into empirical work, industry partnerships, and AI policy engagement.
Cited by 2 pages
| Page | Type | Quality |
|---|---|---|
| Model Organisms of Misalignment | Analysis | 65.0 |
| FTX Collapse: Lessons for EA Funding Resilience | Concept | 78.0 |
Cached Content Preview
Alignment Research Center - Wikipedia
From Wikipedia, the free encyclopedia
AI safety research organization
Not to be confused with Arc Institute.
Alignment Research Center
- Formation: April 2021
- Founder: Paul Christiano
- Type: Nonprofit research institute
- Legal status: 501(c)(3) tax-exempt charity
- Purpose: AI alignment and safety research
- Location: Berkeley, California
- Website: alignment.org
The Alignment Research Center (ARC) is a nonprofit research institute based in Berkeley, California, dedicated to the alignment of advanced artificial intelligence with human values and priorities.[1] Established by former OpenAI researcher Paul Christiano, ARC focuses on recognizing and understanding the potentially harmful capabilities of present-day AI models.[2][3]
Details
ARC's mission is to ensure that powerful machine learning systems of the future are designed and developed safely and for the benefit of humanity. It was founded in April 2021 by Paul Christiano and other researchers focused on the theoretical challenges of AI alignment.[4] Its researchers attempt to develop scalable methods for training AI systems to behave honestly and helpfully. A key part of their methodology is considering how proposed alignment techniques might break down or be circumvented as systems become more advanced.[5] ARC has been expanding from theoretical work into empirical research, industry collaborations, and policy.[6][7]
In March 2022, ARC received $265,000 from Open Philanthropy.[8] After the bankruptcy of FTX, ARC said it would return a $1.25 million grant from disgraced cryptocurrency financier Sam Bankman-Fried's FTX Foundation, stating that the money "morally (if not legally) belongs to FTX customers or creditors."[9]
In 2022, Beth Barnes joined ARC from OpenAI to start ARC Evals, a team working on "evaluating the capabilities and alignment of advanced AI models".[10][11] In December 2023, ARC Evals was spun out as METR, an independent nonprofit.[12]
In March 2023, OpenAI asked ARC to test GPT-4 to assess the model's ability to exhibit power-seeking behavior.[13] ARC evaluated GPT-4's ability to strategize, reproduce itself, gather resources, stay concealed within a server, and execute phishing operations.[14] As part of the test, GPT-4 was asked to solve a CAPTCHA puzzle.[15] It was able to do so by hiring a human worker on TaskRabbit, a gig work platform, and deceiving the worker into believing it was a vision-impaired human rather than a robot when asked.[16] ARC determined that GPT-4 responded impermissibly to prompts eliciting restricted information
... (truncated, 8 KB total) | Stable ID: sid_9Pwo9NDLER