FAR AI
FAR AI is an AI safety research nonprofit founded in July 2022 by Adam Gleave (CEO) and Karl Berzins (Co-founder & President). Based in Berkeley, California, the organization conducts technical research in adversarial robustness, model evaluation, interpretability, and alignment. Notable work includes demonstrating that adversarial policies can defeat superhuman Go AIs and co-authoring the 'Towards Guaranteed Safe AI' framework. FAR AI reported $24.3M in FY2024 revenue and secured over $30M in 2025 funding commitments from funders including Coefficient Giving (previously Open Philanthropy), Schmidt Sciences, and the Survival and Flourishing Fund. In early 2026, FAR AI was selected by the European Commission's AI Office to lead CBRN risk research under tender EC-CNECT/2025/OP/0032. The organization also operates FAR.Labs (a Berkeley coworking space with 40+ members) and a $12M grantmaking program.
Overview
FAR AI (far.ai) is an AI safety research nonprofit founded in July 2022 by Adam Gleave (CEO) and Karl Berzins (Co-founder & President).1 Adam Gleave completed his PhD in AI at UC Berkeley, advised by Stuart Russell.2 The organization's mission is to ensure AI systems are trustworthy and beneficial to society.3 FAR AI incorporated in October 2022 as a 501(c)(3) nonprofit (EIN 92-0692207), having initially operated as a fiscally sponsored project.4
FAR AI conducts technical research in areas including adversarial robustness, interpretability, model evaluation, and alignment, focusing on fundamental AI safety challenges described as too large or resource-intensive for academia.5 Notable results include adversarial policies that achieved a >99% win rate against the superhuman Go AI KataGo when it uses no tree search, and a >77% win rate even when KataGo uses superhuman-level search—policies that are themselves easily beaten by human amateur players, suggesting that high capability does not guarantee robustness in adversarial settings.6 FAR AI co-authored the "Towards Guaranteed Safe AI" framework paper published in May 2024.5 In early 2026, FAR AI was selected by the European Commission's AI Office to lead CBRN risk research under tender EC-CNECT/2025/OP/0032.7
The organization has grown to 40+ total staff as of early 2026, with a technical team of approximately 15 researchers and plans to scale to 30+ researchers.8 Financial details and program structure are described in the sections below.
Key Research Areas
Adversarial Robustness
| Research Focus | Approach | Safety Connection | Publications |
|---|---|---|---|
| Adversarial Attacks on Go AI | Training adversarial policies against KataGo | Superhuman systems remain exploitable by adversarial inputs | "Adversarial Policies Beat Superhuman Go AIs" (2023)6 |
| Go AI Defense Analysis | Testing adversarial training, iterated adversarial training, vision transformers | None of three tested defenses withstood adaptive attacks | "Can Go AIs be adversarially robust?" (2024)9 |
| LLM Adversarial Training | Adversarial training vs. scaling for robustness | Orders-of-magnitude efficiency gains over scaling alone | FAR.AI robustness research10 |
| Multi-layer Defense Bypass | STACK (STaged AttaCK) method against layered AI defenses | Identifies gaps in defense-in-depth strategies | "STACK: Adversarial Attacks on LLM Safeguard Pipelines" (2025)11 |
FAR AI's research in adversarial robustness has produced several empirical results. A 2023 paper, "Adversarial Policies Beat Superhuman Go AIs," demonstrated that adversarial policies achieved a >99% win rate against KataGo when it uses no tree search, and a >77% win rate even when KataGo uses superhuman-level search.6 The adversarial policies win by inducing blunders in KataGo rather than by playing stronger Go, and are themselves easily beaten by human amateur players.6 A follow-up 2024 paper, "Can Go AIs be adversarially robust?," tested three natural defenses—positional adversarial training, iterated adversarial training, and vision transformer architectures—and found that none could withstand adaptive adversarial attacks.9 FAR AI has also found that adversarial training improves language model robustness orders of magnitude more efficiently than scaling model size alone, and that larger language models are more vulnerable to data poisoning, a result demonstrated across 23 LLMs from 8 model series.10
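The core mechanism — a weak policy that wins by exploiting a frozen victim's systematic errors rather than by playing better overall — can be illustrated with a deliberately simple sketch. Everything below (the biased victim, the best-responding "adversary", the game itself) is invented for illustration and is unrelated to the actual KataGo attack, which trains a reinforcement-learning adversary against a frozen Go policy:

```python
import random

# Toy illustration (not FAR AI's method): a frozen "victim" policy with a
# systematic bias is exploited by a best-responding adversary, mirroring the
# idea that adversarial policies win by exploiting blunders, not by being
# generally stronger players.

random.seed(0)

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def victim_policy():
    # Frozen victim: biased toward "rock" (plays it 60% of the time).
    return random.choices(ACTIONS, weights=[0.6, 0.2, 0.2])[0]

def fit_adversary(n_probe=5000):
    # "Train" the adversary via black-box rollouts: estimate the victim's
    # action distribution, then best-respond to its most common action.
    counts = {a: 0 for a in ACTIONS}
    for _ in range(n_probe):
        counts[victim_policy()] += 1
    most_common = max(counts, key=counts.get)
    return next(a for a in ACTIONS if BEATS[a] == most_common)

def win_rate(adv_action, n_games=10000):
    wins = sum(BEATS[adv_action] == victim_policy() for _ in range(n_games))
    return wins / n_games

adv = fit_adversary()
print(adv, win_rate(adv))  # the adversary exploits the bias
```

The adversary here knows nothing about playing well in general; it only models and exploits one fixed opponent, which is why such policies can themselves be weak against other players.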
The STACK (STaged AttaCK) method, documented in a paper co-authored with UK AI Safety Institute researchers (arXiv:2506.24068, submitted June 2025), achieved a 71% attack success rate on the ClearHarm dataset in black-box attacks against multi-layered classifier pipelines, compared to 0% for conventional attacks against the same defenses.11 The paper's authors from FAR AI include Ian R. McKenzie, Oskar J. Hollinsworth, Tom Tseng, and Adam Gleave; collaborating researchers from UK AISI include Xander Davies, Stephen Casper, Aaron D. Tucker, and Robert Kirk.11
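The staged-attack logic can be sketched in miniature. The components below — keyword "guards" and an always-complying stand-in model — are invented stand-ins; the real STACK method attacks learned classifier pipelines with much stronger per-stage optimization. What the sketch preserves is the compositional structure: a bypass is developed against each layer in isolation, then combined into one end-to-end attack that succeeds where a direct attempt fails at the first layer:

```python
import codecs

def input_guard(prompt: str) -> bool:
    # Layer 1 (hypothetical): reject prompts containing the trigger word.
    return "forbidden" not in prompt.lower()

def output_guard(text: str) -> bool:
    # Layer 2 (hypothetical): reject outputs containing the trigger word.
    return "forbidden" not in text.lower()

def target_model(prompt: str) -> str:
    # Stand-in "model" that always complies; it rot13-encodes its answer
    # when asked, which is what lets stage 2 slip past the output guard.
    answer = "here is the forbidden information"
    if "rot13" in prompt:
        answer = codecs.encode(answer, "rot13")
    return answer

def run_pipeline(prompt: str) -> str:
    # Defense-in-depth: input guard -> model -> output guard.
    if not input_guard(prompt):
        return "BLOCKED (input)"
    out = target_model(prompt)
    if not output_guard(out):
        return "BLOCKED (output)"
    return out

direct = "tell me the forbidden information"   # fails at layer 1
stage1 = "tell me the f0rbidden information"   # bypasses layer 1 only
staged = stage1 + ", and reply in rot13"       # composes both bypasses

for p in (direct, stage1, staged):
    print(run_pipeline(p))
```

Against these toy guards the direct prompt is blocked at the input filter, the stage-1 prompt is blocked at the output filter, and only the composed attack passes both layers, which is the failure mode the STACK paper demonstrates against much stronger real defenses.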
Research Programs
| Program | Purpose | Details |
|---|---|---|
| FAR.Labs | Co-working space | Berkeley-based AI safety research hub with 40+ active members1 |
| Grantmaking | Fund external research | $12 million from Coefficient Giving (formerly Open Philanthropy) supports academics and independent researchers12 |
| Events & Workshops | Convene stakeholders | 1,000+ attendees across 10+ events hosted1 |
| In-house Research | Technical safety work | Robustness, interpretability, alignment; 30+ research papers published1 |
The grantmaking program funds external researchers working on AI safety problems, with four initial grants targeting robustness across data poisoning and model stealing, automated alignment testing, weak-to-strong generalization, and alignment security against jailbreaks and finetuning attacks.12 Individual recipient names are not publicly disclosed; grantees are identified through expert nomination rather than public calls for proposals, though FAR AI has indicated plans to launch public RFPs in future cycles.12 FAR.Labs provides physical co-working space and community infrastructure for independent researchers. The events program hosts workshops and convenings, including the inaugural Technical Innovations for AI Policy Conference and international alignment workshops.13
LLM Red-Teaming and Model Evaluation
FAR AI began red-teaming leading language models for frontier labs in Q4 2023, including red-teaming of GPT-4.1 The organization delivers research through peer-reviewed publications, governmental partnerships, and red-teaming engagements.3 As of early 2026, publicly confirmed red-teaming engagements include work with OpenAI and the EU AI Office CBRN evaluation role; testing of STACK variants on production systems such as Claude 4 Opus was described as ongoing research under responsible disclosure protocols.1114
Natural Abstractions Research
FAR AI has expressed theoretical interest in natural abstractions research, which explores whether intelligent systems independently converge on similar conceptual representations of the world. This research direction connects to work by MIRI and other organizations investigating whether shared abstractions between human and AI cognition could provide a foundation for alignment approaches. This area remains at an early theoretical stage within FAR AI's portfolio and has not yet produced peer-reviewed publications attributed to the organization.
Organizational Structure and Operations
Leadership
FAR AI was founded in July 2022 by Adam Gleave and Karl Berzins.1 Gleave serves as Co-founder & CEO; Berzins serves as Co-founder & President.1516 Berzins held the title of COO as of December 2023 and subsequently became President, as reflected on FAR AI's official website.1715 Gleave completed his PhD in artificial intelligence at UC Berkeley under Stuart Russell's supervision, and his research focuses on developing techniques for AI systems to act according to human preferences.18 According to public 990 filings, Gleave's compensation was $229,331 and Berzins's was $182,641 in fiscal year 2024.19
Organizational Structure
FAR AI is incorporated as a 501(c)(3) nonprofit with EIN 92-0692207, tax-exempt since November 2020.19 The organization maintains a policy capping revenue from for-profit AI developers at a maximum of 10% of total annual revenue, and charges market rates to avoid subsidizing private actors.4 FAR AI reported $24.3 million in revenue and $8.6 million in expenses for fiscal year 2024, per its Form 990 filed November 14, 2025.19
Research Team and Staffing
FAR AI's staffing has grown substantially since its founding. As of December 2023, the organization had 12 full-time staff (approximately 11.5 FTEs), including 5 technical staff, a 3-person operations team, and a 1.5 FTE communications team.17 By the time of the $30M+ funding announcement in 2025, the technical research team had grown to approximately 15 researchers, with plans to scale to 30+ technical researchers over the following 12–18 months.8 As of early 2026, FAR AI's job postings describe the organization as having 40+ total staff.20 The organization maintains Operations, Communications, and Technical Staff departments.
FAR.Labs Co-working Space
FAR.Labs is a co-working hub located in downtown Berkeley, opened in March 2023, and now houses 40+ active members working on AI safety problems.17 Coefficient Giving (previously Open Philanthropy) provided $1.7 million over three years specifically to support the FAR.Labs coworking hub.21
Current State & Trajectory
As of early 2026, FAR AI has expanded to 40+ staff with plans to scale the technical team from 15 to 30+ researchers.208 Key recent developments include the $30M+ multi-funder commitment in 2025,8 the launch of a $12M grantmaking program in Q3 2024,22 and selection by the European Commission's AI Office to lead CBRN risk research in early 2026.7
Looking ahead, FAR AI plans to launch public requests for proposals focused on high-impact research areas, broadening access to its grantmaking program beyond nomination-only pathways.1
Strategic Position Analysis
Organizational Comparisons
FAR AI conducts empirical technical AI safety research without developing or deploying AI products of its own.1 Its work emphasizes adversarial robustness of deployed systems — demonstrating, for example, that superhuman Go AIs can be defeated by adversarial policies despite their capabilities.4 FAR AI caps revenue from for-profit AI developers at 10% of total annual revenue to preserve research independence.10
| Organization | Focus | Overlap with FAR AI | Differentiation |
|---|---|---|---|
| Anthropic | Constitutional AI, frontier model development | Safety research, red-teaming | FAR AI does not develop or deploy models; revenue from for-profits capped at 10% |
| ARC | Theoretical alignment research | Alignment goals | FAR AI uses empirical ML methods (e.g., adversarial training experiments) rather than formal theory |
| METR | Model evaluation | Safety assessment, red-teaming | FAR AI additionally researches adversarial robustness defenses and value alignment |
| Academic Labs | ML research | Technical methods, publication venues | FAR AI focuses on safety-specific problems described as too large or resource-intensive for academia |
Positioning in AI Safety Ecosystem
FAR AI publishes at mainstream ML venues (NeurIPS, ICML, ICLR) to reach audiences beyond specialized safety communities.3 The organization focuses on fundamental AI safety challenges described as too large or resource-intensive for academia alone.3 FAR AI's research has found that even superhuman AI systems can fail against adversarial attacks, and that none of three tested defense strategies could withstand adaptive adversaries.9
FAR AI's position within the Berkeley AI safety ecosystem is reinforced by FAR.Labs and its grantmaking program, which together support researchers across institutional boundaries.12 The organization's adversarial robustness agenda connects near-term safety concerns about deployed systems to longer-term alignment challenges.12
Research Impact and Influence
Academic Reception
FAR AI has published over 30 research papers across robustness, value alignment, and model evaluation.1 Research appears at top-tier venues including NeurIPS, ICML, and ICLR.17 The KataGo adversarial policies paper (2023) has accumulated 99 citations on Google Scholar; the earlier foundational adversarial policies paper published at ICLR 2020—predating FAR AI's founding—has amassed 555 citations, reflecting its influence in the broader field.23 A 2025 multi-agent risks paper has reached 113 citations.23 FAR AI's research has been cited in congressional testimony and mainstream media.3
The 2024 paper "Towards Guaranteed Safe AI," co-authored by 17 contributors including Yoshua Bengio and Stuart Russell, proposed a safety framework using world models, safety specifications, and verifiers.5
Policy Engagement
In early 2026, FAR AI was selected by the European Commission's AI Office to lead Lot 1 (CBRN Risk Modelling and Evaluation) of tender EC-CNECT/2025/OP/0032, titled "Artificial Intelligence Act: Technical Assistance for AI Safety."7 The contract has a three-year duration; the total tender value across all six lots is €9,080,000.24 FAR AI leads a consortium including SecureBio (biological threat assessment) and SaferAI (AI governance and risk modeling), with subcontractors GovAI, Nemesys Insights, and Equistamp.7 FAR AI also participates as a subcontractor on Lot 4 (Harmful Manipulation Risk).7
FAR AI launched the inaugural Technical Innovations for AI Policy Conference on May 31–June 1, 2025, convening over 150 technical experts, researchers, and policymakers in Washington, D.C.4 FAR AI's research on robustness and evaluation is relevant to ongoing AI governance discussions.
Research Questions and Uncertainties
Theoretical Questions
Several theoretical questions shape FAR AI's research direction:
- Natural Abstractions Validity: Whether intelligent systems independently converge on similar conceptual representations remains an open empirical question. The natural abstractions hypothesis has theoretical appeal but requires extensive empirical validation across diverse AI architectures and training regimes.
- Robustness-Alignment Connection: The relationship between adversarial robustness and value alignment is not fully understood. While robustness may be necessary for aligned systems, the degree to which robustness research directly contributes to solving alignment problems remains debated within the AI safety community.
- Scaling Dynamics: Whether current robustness and evaluation approaches will remain relevant as AI systems increase in capability is uncertain. Some safety researchers argue that qualitatively new challenges emerge at higher capability levels that may not be addressed by current methodologies.
Organizational Uncertainties
- Research Timeline: Academic publication timelines typically span months to years, including peer review, revision, and conference scheduling. Whether this research pace adequately matches the urgency of safety concerns depends on assessments of timelines for transformative AI development.
- Scope Evolution: FAR AI's research focus may evolve as the field develops. The organization's emphasis on empirical robustness could shift toward other safety approaches depending on which problems prove most tractable or urgent.
- Policy Engagement: The extent of FAR AI's involvement in AI governance and policy discussions may expand beyond its current focus on technical research and convening activities.
Field-Wide Debates
| Debate | FAR AI Approach | Alternative Views |
|---|---|---|
| Value of robustness for alignment | Robustness research treated as relevant to safety | Some researchers see limited connection to core alignment |
| Natural abstractions importance | Theoretical interest in the concept | Others view the hypothesis as speculative without strong evidence |
| Academic vs. applied research | Maintains academic publication model | Some argue industry-facing applied research is more impactful |
| Benchmark limitations | Benchmark development as part of research program | Others raise fundamental Goodhart's Law concerns |
These debates reflect broader disagreements within the AI safety community about research priorities, timelines, and the relationship between different technical approaches to safety.
Funding & Sustainability
Current Funding Model
FAR AI's funding profile reflects concentration among sources within the effective altruism ecosystem. Coefficient Giving (which rebranded from Open Philanthropy in November 2025)25 is the principal funder; FAR AI's own press materials refer to the funder as "Coefficient Giving (previously Open Philanthropy)."8 Coefficient Giving has provided multiple distinct grants to FAR AI: approximately $28.675 million over three years for research team expansion, a technical internship and fellowship program, and a governance team; $6.65 million over two years for FAR.Futures (events, outreach, and field-building); $2.16 million over three years for general support; and $1.7 million over three years for the FAR.Labs coworking hub.21 A separate $12 million grant from Open Philanthropy (now Coefficient Giving) funds FAR AI's grantmaking program for external researchers.12
Additional funders announced in FAR AI's 2025 funding announcement include Schmidt Sciences, the Survival and Flourishing Fund (SFF), the Center for Security and Emerging Technology (CSET), and the AI Safety Fund (AISF) supported by the Frontier Model Forum.8 FAR AI's job postings reference total funding of over $40 million, suggesting the cumulative figure exceeds the $30M+ figure cited in the formal press release.20 Good Ventures, Coefficient Giving's partner foundation, supports FAR AI as part of its Navigating Transformative AI focus area.23
In fiscal year 2024, FAR AI reported $24.3 million in revenue and $8.6 million in expenses, per its Form 990 filed November 14, 2025.19
Revenue from for-profit AI developers is capped at a maximum of 10% of FAR AI's total annual revenue, and FAR AI commits to disclosing the fraction of revenue derived from such consulting.4 This concentration of philanthropic funding creates exposure to shifts in funder priorities; it also provides multi-year commitments that enable longer-horizon research planning. The $12 million grantmaking allocation is directed toward supporting academics and independent researchers working on critical AI safety problems.1
Criticisms and Responses
Academic Pace Concerns
Criticism: Academic publication processes operate on timelines of months to years, including peer review, revision, and conference scheduling. Critics argue this pace may be too slow given rapid AI capability advances and the urgency of safety concerns.
Response: Proponents of the peer-reviewed publication model argue it ensures research quality and credibility, and that methodology and evaluation frameworks developed through careful research provide lasting value even as specific techniques evolve. Preprint sharing and direct collaboration with AI labs can accelerate impact for time-sensitive findings.
Context: This tension between research rigor and speed affects the broader AI safety field. Different organizations make different tradeoffs between publication quality, speed, and direct industry impact.
Limited Scope Questions
Criticism: Research on adversarial robustness and evaluation may not directly address core alignment challenges like deceptive alignment, goal specification, or value learning. Critics question whether robustness research provides sufficient traction on harder alignment problems. A pointed version of this concern is that demonstrating vulnerabilities in narrow systems — such as the finding that superhuman Go AIs can be beaten by adversarial policies playing cyclic patterns that amateur humans easily defeat1 — illuminates failure modes without providing a clear path to preventing them in more capable systems.4
Response: FAR AI argues robustness is a necessary foundation for aligned systems, noting that "most alignment proposals using helper ML systems will fail if helpers are exploited by main systems."19 FAR AI's portfolio also includes value alignment work that produced more sample-efficient value learning algorithms13 and contributed to the "Towards Guaranteed Safe AI" framework co-authored with researchers including Yoshua Bengio and Stuart Russell.5
Context: The AI safety field contains diverse views on which research directions are most valuable. Some researchers emphasize near-term robustness and evaluation, while others focus on long-term theoretical alignment challenges. FAR AI's own findings that adversarial training improves language model robustness orders of magnitude more efficiently than scaling alone4 are cited as evidence that empirical robustness work can yield generalizable insights.
Natural Abstractions Theory Concerns
Criticism: The natural abstractions hypothesis lacks extensive empirical validation. Critics argue that theoretical frameworks should be grounded in experimental evidence before receiving substantial research attention.
Response: Proponents argue theoretical frameworks can productively guide empirical research programs, and that the multi-year timeline for validation is appropriate given the scope of the hypothesis.
Context: Disagreement about when to invest in theoretical versus empirical work is common in early-stage scientific fields. Different researchers make different judgments about the appropriate balance.
External Links
- FAR.AI — official website
- FAR.AI Research — publications and papers
- FAR.AI Programs — grantmaking, events, FAR.Labs
- FAR.AI Transparency — financial disclosures and policies
Footnotes
1. About | FAR.AI (https://far.ai/about)
2. Adam Gleave - AI2050 (https://ai2050.schmidtsciences.org/fellow/adam-gleave)
3. Research Overview – FAR.AI (https://far.ai/research)
4. Transparency | FAR.AI (https://far.ai/about/transparency)
5. Towards Guaranteed Safe AI | FAR.AI (https://far.ai/research/towards-guaranteed-safe-ai-a-framework-for-ensuring-robust-and-reliable-ai-systems)
6. Adversarial Policies Beat Superhuman Go AIs | FAR.AI (https://far.ai/research/adversarial-policies-beat-superhuman-go-ais)
7. FAR.AI Selected to Lead EU AI Act CBRN Risk Consortium — FAR AI official announcement, February 2026
8. FAR.AI Secures Over $30 Million in Multi-Funder Support to Scale Frontier AI Safety Research — FAR AI official press release, 2025
9. Can Go AIs be adversarially robust? – FAR.AI (https://far.ai/research/can-go-ais-be-adversarially-robust)
10. FAR.AI Robustness Research (https://far.ai/topic/robustness)
11. McKenzie, I.R., Hollinsworth, O.J., Tseng, T., Davies, X., Casper, S., Tucker, A.D., Kirk, R., Gleave, A. "STACK: Adversarial Attacks on LLM Safeguard Pipelines." arXiv:2506.24068, submitted June 30, 2025; revised February 5, 2026. (https://arxiv.org/abs/2506.24068)
12. Grantmaking | FAR.AI — FAR AI grantmaking program page
13. 2023 Alignment Research Updates – FAR.AI (https://far.ai/post/2023-12-far-research-update)
14. Layered AI Defenses Have Holes: Vulnerabilities and Key Recommendations — FAR AI blog, companion post to STACK paper
15. Karl Berzins | FAR.AI — FAR AI official profile listing Berzins as "President of FAR.AI"
16. Karl Berzins — LinkedIn profile (https://www.linkedin.com/in/karlberzins/), listing title as "Co-founder & President at FAR.AI"
17. What's new at FAR AI — EA Forum, FAR AI, December 4, 2023
18. Adam Gleave | FAR.AI (https://far.ai/author/adam-gleave)
19. Far Ai Inc - Nonprofit Explorer - ProPublica (https://projects.propublica.org/nonprofits/organizations/920692207)
20. FAR.AI Research Scientist — Careers Page — job posting describing organization as having "grown quickly to 40+ staff"
21. Open Philanthropy grant database: FAR.AI — AI Field Building (2025), August 6, 2025; and FAR.AI — AI Safety Research and Field-Building, September 28, 2025
22. Programs – FAR.AI (https://far.ai/programs)
23. Adam Gleave — Google Scholar (https://scholar.google.com/citations?user=lBunDH0AAAAJ&hl=en)
24. EU AI Act Newsletter #77: AI Office Tender, May 13, 2025 — total tender value €9,080,000 across all six lots
25. Open Philanthropy Is Now Coefficient Giving — Coefficient Giving official announcement, December 10, 2025; EA Forum post by Alexander Berger, November 18, 2025