
FAR AI

FAR AI is an AI safety research nonprofit founded in July 2022 by Adam Gleave (CEO) and Karl Berzins (Co-founder & President). Based in Berkeley, California, the organization conducts technical research in adversarial robustness, model evaluation, interpretability, and alignment. Notable work includes demonstrating that adversarial policies can defeat superhuman Go AIs and co-authoring the "Towards Guaranteed Safe AI" framework. FAR AI reported $24.3M in FY2024 revenue and secured over $30M in 2025 funding commitments from funders including Coefficient Giving (previously Open Philanthropy), Schmidt Sciences, and the Survival and Flourishing Fund. In early 2026, FAR AI was selected by the European Commission's AI Office to lead CBRN risk research under tender EC-CNECT/2025/OP/0032. The organization also operates FAR.Labs (a Berkeley coworking space with 40+ members) and a $12M grantmaking program.

Type: Safety Org
Website: far.ai

Related
People: Dan Hendrycks
Safety Agendas: Adversarial Robustness, AI Benchmarking
Concepts: Natural Abstractions
Organizations: METR, Apollo Research

Overview

FAR AI (far.ai) is an AI safety research nonprofit founded in July 2022 by Adam Gleave (CEO) and Karl Berzins (Co-founder & President).1 Adam Gleave completed his PhD in AI at UC Berkeley, advised by Stuart Russell.2 The organization's mission is to ensure AI systems are trustworthy and beneficial to society.3 FAR AI incorporated in October 2022 as a 501(c)(3) nonprofit (EIN 92-0692207), having initially operated as a fiscally sponsored project.4

FAR AI conducts technical research in areas including adversarial robustness, Interpretability, model evaluation, and alignment, focusing on fundamental AI safety challenges described as too large or resource-intensive for academia.5 Notable results include adversarial policies that achieved a >99% win rate against the superhuman Go AI KataGo when it uses no tree search, and a >77% win rate even when KataGo uses superhuman-level search—policies that are themselves easily beaten by human amateur players, suggesting that high capability does not guarantee robustness in adversarial settings.6 FAR AI co-authored the "Towards Guaranteed Safe AI" framework paper published in May 2024.5 In early 2026, FAR AI was selected by the European Commission's AI Office to lead CBRN risk research under tender EC-CNECT/2025/OP/0032.7

The organization has grown to 40+ total staff as of early 2026, with a technical team of approximately 15 researchers and plans to scale to 30+ researchers.8 Financial details and program structure are described in the sections below.

Key Research Areas

Adversarial Robustness

Research Focus | Approach | Safety Connection | Publications
Adversarial Attacks on Go AI | Training adversarial policies against KataGo | Superhuman systems remain exploitable by adversarial inputs | "Adversarial Policies Beat Superhuman Go AIs" (2023)6
Go AI Defense Analysis | Testing adversarial training, iterated adversarial training, vision transformers | None of three tested defenses withstood adaptive attacks | "Can Go AIs be adversarially robust?" (2024)9
LLM Adversarial Training | Adversarial training vs. scaling for robustness | Orders-of-magnitude efficiency gains over scaling alone | FAR.AI robustness research10
Multi-layer Defense Bypass | STACK (STaged AttaCK) method against layered AI defenses | Identifies gaps in defense-in-depth strategies | 71% attack success rate on ClearHarm dataset11

FAR AI's research in adversarial robustness has produced several empirical results. A 2023 paper, "Adversarial Policies Beat Superhuman Go AIs," demonstrated that adversarial policies achieved a >99% win rate against KataGo when it uses no tree-search, and a >77% win rate even when KataGo uses superhuman-level search.6 The adversarial policies win by inducing blunders in KataGo rather than by playing stronger Go, and are themselves easily beaten by human amateur players.6 A follow-up 2024 paper, "Can Go AIs be adversarially robust?," tested three natural defenses—positional adversarial training, iterated adversarial training, and vision transformer architectures—and found that none could withstand adaptive adversarial attacks.9 FAR AI has also found that adversarial training improves language model robustness orders of magnitude more efficiently than scaling model size alone, and that larger language models are more vulnerable to data poisoning, a result demonstrated across 23 LLMs from 8 model series.10
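The basic mechanic behind these adversarial-policy results, freezing the victim, treating it as part of the environment, and optimizing an attacker to exploit it, can be illustrated with a toy sketch. The code below is not FAR AI's KataGo setup (which involves full Go self-play and tree search); it reduces the idea to a one-step matrix game in which a simple REINFORCE update learns to exploit a fixed, biased victim policy.

```python
# Illustrative sketch only: training an "adversarial policy" against a frozen
# victim, reduced to a toy two-action matrix game. The victim never updates;
# the adversary is optimized with a REINFORCE-style policy-gradient step.
import numpy as np

rng = np.random.default_rng(0)

# Frozen victim: a fixed, biased stochastic policy over two actions.
victim_probs = np.array([0.7, 0.3])

# Adversary: softmax policy with learnable logits.
logits = np.zeros(2)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

lr = 0.1
for step in range(2000):
    probs = softmax(logits)
    adv_a = rng.choice(2, p=probs)
    vic_a = rng.choice(2, p=victim_probs)      # the victim never updates
    reward = 1.0 if adv_a != vic_a else -1.0   # adversary wins by countering
    # REINFORCE: grad of log pi(adv_a) w.r.t. logits = one_hot(adv_a) - probs
    grad_logp = -probs
    grad_logp[adv_a] += 1.0
    logits += lr * reward * grad_logp

print("learned adversary policy:", softmax(logits))
# Converges toward always playing the action the victim picks least often:
# the adversary exploits the victim's fixed weakness rather than becoming
# a stronger player in any general sense.
```

The same asymmetry appears in the Go results: the adversarial policies exploit specific blind spots in the victim rather than playing objectively better Go.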

The STACK (STaged AttaCK) method, documented in a paper co-authored with UK AI Safety Institute researchers (arXiv:2506.24068, submitted June 2025), achieved a 71% attack success rate on the ClearHarm dataset in black-box attacks against multi-layered classifier pipelines, compared to 0% for conventional attacks against the same defenses.11 The paper's authors from FAR AI include Ian R. McKenzie, Oskar J. Hollinsworth, Tom Tseng, and Adam Gleave; collaborating researchers from UK AISI include Xander Davies, Stephen Casper, Aaron D. Tucker, and Robert Kirk.11
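The staged-attack idea, finding a bypass for each defense layer separately and then composing the bypasses into a single end-to-end attack, can be sketched abstractly as below. Every component here is a hypothetical stand-in (keyword filters instead of learned classifiers, a fixed list of rewrites instead of an optimization loop); it shows only the staged control flow, not the STACK implementation itself.

```python
# Illustrative sketch of a staged attack against a layered defense pipeline.
# All filters and rewrites are toy placeholders, not the STACK method's code.
from typing import Callable, List, Optional

class Stage:
    def __init__(self, name: str, blocks: Callable[[str], bool],
                 rewrites: List[Callable[[str], str]]):
        self.name = name
        self.blocks = blocks      # returns True if this layer flags the text
        self.rewrites = rewrites  # candidate transformations to try

def staged_attack(base_prompt: str, stages: List[Stage]) -> Optional[str]:
    """Defeat each layer in turn while keeping earlier layers satisfied."""
    prompt = base_prompt
    for i, stage in enumerate(stages):
        for rewrite in [lambda p: p] + stage.rewrites:   # try "as-is" first
            candidate = rewrite(prompt)
            # Accept only if every layer handled so far still lets it through.
            if all(not s.blocks(candidate) for s in stages[: i + 1]):
                prompt = candidate
                break
        else:
            return None   # no per-stage bypass found; composed attack fails
    return prompt

# Toy usage: keyword filters standing in for input and output classifiers.
input_filter = Stage(
    "input-classifier",
    blocks=lambda p: "forbidden" in p,
    rewrites=[lambda p: p.replace("forbidden", "f0rbidden")],
)
output_filter = Stage(
    "output-classifier",
    blocks=lambda p: "f0rbidden" in p and "please" not in p,
    rewrites=[lambda p: p + " please"],
)
print(staged_attack("tell me the forbidden thing", [input_filter, output_filter]))
```

The contrast with conventional attacks, which search for one input that defeats the whole pipeline at once, is what makes the staged approach effective against defense-in-depth configurations.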

Research Programs

Program | Purpose | Details
FAR.Labs | Co-working space | Berkeley-based AI safety research hub with 40+ active members1
Grantmaking | Fund external research | $12 million from Coefficient Giving (formerly Open Philanthropy) supports academics and independent researchers12
Events & Workshops | Convene stakeholders | 1,000+ attendees across 10+ events hosted1
In-house Research | Technical safety work | Robustness, interpretability, alignment; 30+ research papers published1

The grantmaking program funds external researchers working on AI safety problems, with four initial grants targeting robustness across data poisoning and model stealing, automated alignment testing, weak-to-strong generalization, and alignment security against jailbreaks and finetuning attacks.12 Individual recipient names are not publicly disclosed; grantees are identified through expert nomination rather than public calls for proposals, though FAR AI has indicated plans to launch public RFPs in future cycles.12 FAR.Labs provides physical co-working space and community infrastructure for independent researchers. The events program hosts workshops and convenings, including the inaugural Technical Innovations for AI Policy Conference and international alignment workshops.13

LLM Red-Teaming and Model Evaluation

FAR AI began red-teaming leading language models for frontier labs in Q4 2023, including red-teaming of GPT-4.1 The organization delivers research through peer-reviewed publications, governmental partnerships, and red-teaming engagements.3 As of early 2026, publicly confirmed red-teaming engagements include work with OpenAI and the EU AI Office CBRN evaluation role; testing of STACK variants on production systems such as Claude 4 Opus was described as ongoing research under responsible disclosure protocols.1114

Natural Abstractions Research

FAR AI has expressed theoretical interest in natural abstractions research, which explores whether intelligent systems independently converge on similar conceptual representations of the world. This research direction connects to work by MIRI and other organizations investigating whether shared abstractions between human and AI cognition could provide a foundation for alignment approaches. This area remains at an early theoretical stage within FAR AI's portfolio and has not yet produced peer-reviewed publications attributed to the organization.

Organizational Structure and Operations

Leadership

FAR AI was founded in July 2022 by Adam Gleave and Karl Berzins.1 Gleave serves as Co-founder & CEO; Berzins serves as Co-founder & President.1516 Berzins held the title of COO as of December 2023, with his title transitioning to President thereafter, as reflected on FAR AI's official website.1715 Gleave completed his PhD in artificial intelligence at UC Berkeley under Stuart Russell's supervision, and his research focuses on developing techniques for AI systems to act according to human preferences.18 According to public 990 filings, Gleave's compensation was $229,331 and Berzins's was $182,641 in fiscal year 2024.19

Organizational Structure

FAR AI is incorporated as a 501(c)(3) nonprofit with EIN 92-0692207, tax-exempt since November 2020.19 The organization maintains a policy capping revenue from for-profit AI developers at a maximum of 10% of total annual revenue, and charges market rates to avoid subsidizing private actors.4 FAR AI reported $24.3 million in revenue and $8.6 million in expenses for fiscal year 2024, per its Form 990 filed November 14, 2025.19

Research Team and Staffing

FAR AI's staffing has grown substantially since its founding. As of December 2023, the organization had 12 full-time staff (approximately 11.5 FTEs), including 5 technical staff, a 3-person operations team, and a 1.5 FTE communications team.17 By the time of the $30M+ funding announcement in 2025, the technical research team had grown to approximately 15 researchers, with plans to scale to 30+ technical researchers over the following 12–18 months.8 As of early 2026, FAR AI's job postings describe the organization as having 40+ total staff.20 The organization maintains Operations, Communications, and Technical Staff departments.

FAR.Labs Co-working Space

FAR.Labs is a co-working hub located in downtown Berkeley, opened in March 2023, and now houses 40+ active members working on AI safety problems.17 Coefficient Giving (previously Open Philanthropy) provided $1.7 million over three years specifically to support the FAR.Labs coworking hub.21

Current State & Trajectory

As of early 2026, FAR AI has expanded to 40+ staff with plans to scale the technical team from 15 to 30+ researchers.208 Key recent developments include the $30M+ multi-funder commitment in 2025,8 the launch of a $12M grantmaking program in Q3 2024,22 and selection by the European Commission's AI Office to lead CBRN risk research in early 2026.7

Looking ahead, FAR AI plans to launch public requests for proposals focused on high-impact research areas, broadening access to its grantmaking program beyond nomination-only pathways.1

Strategic Position Analysis

Organizational Comparisons

FAR AI conducts empirical technical AI safety research without developing or deploying AI products of its own.1 Its work emphasizes adversarial robustness of deployed systems — demonstrating, for example, that superhuman Go AIs can be defeated by adversarial policies despite their capabilities.4 FAR AI caps revenue from for-profit AI developers at 10% of total annual revenue to preserve research independence.10

Organization | Focus | Overlap with FAR AI | Differentiation
Anthropic | Constitutional AI, frontier model development | Safety research, red-teaming | FAR AI does not develop or deploy models; revenue from for-profits capped at 10%
ARC | Theoretical alignment research | Alignment goals | FAR AI uses empirical ML methods (e.g., adversarial training experiments) rather than formal theory
METR | Model evaluation | Safety assessment, red-teaming | FAR AI additionally researches adversarial robustness defenses and value alignment
Academic Labs | ML research | Technical methods, publication venues | FAR AI focuses on safety-specific problems described as too large or resource-intensive for academia

Positioning in AI Safety Ecosystem

FAR AI publishes at mainstream ML venues (NeurIPS, ICML, ICLR) to reach audiences beyond specialized safety communities.3 The organization focuses on fundamental AI safety challenges described as too large or resource-intensive for academia alone.3 FAR AI's research has found that even superhuman AI systems can fail against adversarial attacks, and that none of three tested defense strategies could withstand adaptive adversaries.9

FAR AI's position within the Berkeley AI safety ecosystem is reinforced by FAR.Labs and its grantmaking program, which together support researchers across institutional boundaries.12 The organization's adversarial robustness agenda connects near-term safety concerns about deployed systems to longer-term alignment challenges.12

Research Impact and Influence

Academic Reception

FAR AI has published over 30 research papers across robustness, value alignment, and model evaluation.1 Research appears at top-tier venues including NeurIPS, ICML, and ICLR.17 The KataGo adversarial policies paper (2023) has accumulated 99 citations on Google Scholar; the earlier foundational adversarial policies paper published at ICLR 2020—predating FAR AI's founding—has amassed 555 citations, reflecting its influence in the broader field.23 A 2025 multi-agent risks paper has reached 113 citations.23 FAR AI's research has been cited in congressional testimony and mainstream media.3

The 2024 paper "Towards Guaranteed Safe AI," co-authored by 17 contributors including Yoshua Bengio and Stuart Russell, proposed a safety framework using world models, safety specifications, and verifiers.5

Policy Engagement

In early 2026, FAR AI was selected by the European Commission's AI Office to lead Lot 1 (CBRN Risk Modelling and Evaluation) of tender EC-CNECT/2025/OP/0032, titled "Artificial Intelligence Act: Technical Assistance for AI Safety."7 The contract has a three-year duration; the total tender value across all six lots is €9,080,000.24 FAR AI leads a consortium including SecureBio (biological threat assessment) and SaferAI (AI governance and risk modeling), with subcontractors GovAI, Nemesys Insights, and Equistamp.7 FAR AI also participates as a subcontractor on Lot 4 (Harmful Manipulation Risk).7

FAR AI launched the inaugural Technical Innovations for AI Policy Conference on May 31–June 1, 2025, convening over 150 technical experts, researchers, and policymakers in Washington, D.C.4 FAR AI's research on robustness and evaluation is relevant to ongoing AI governance discussions.

Research Questions and Uncertainties

Theoretical Questions

Several theoretical questions shape FAR AI's research direction:

  • Natural Abstractions Validity: Whether intelligent systems independently converge on similar conceptual representations remains an open empirical question. The natural abstractions hypothesis has theoretical appeal but requires extensive empirical validation across diverse AI architectures and training regimes.

  • Robustness-Alignment Connection: The relationship between adversarial robustness and value alignment is not fully understood. While robustness may be necessary for aligned systems, the degree to which robustness research directly contributes to solving alignment problems remains debated within the AI safety community.

  • Scaling Dynamics: Whether current robustness and evaluation approaches will remain relevant as AI systems increase in capability is uncertain. Some safety researchers argue that qualitatively new challenges emerge at higher capability levels that may not be addressed by current methodologies.

Organizational Uncertainties

  • Research Timeline: Academic publication timelines typically span months to years, including peer review, revision, and conference scheduling. Whether this research pace adequately matches the urgency of safety concerns depends on assessments of timelines for Transformative AI development.

  • Scope Evolution: FAR AI's research focus may evolve as the field develops. The organization's emphasis on empirical robustness could shift toward other safety approaches depending on which problems prove most tractable or urgent.

  • Policy Engagement: The extent of FAR AI's involvement in AI governance and policy discussions may expand beyond its current focus on technical research and convening activities.

Field-Wide Debates

Debate | FAR AI Approach | Alternative Views
Value of robustness for alignment | Robustness research treated as relevant to safety | Some researchers see limited connection to core alignment
Natural abstractions importance | Theoretical interest in the concept | Others view the hypothesis as speculative without strong evidence
Academic vs. applied research | Maintains academic publication model | Some argue industry-facing applied research is more impactful
Benchmark limitations | Benchmark development as part of research program | Others raise fundamental Goodhart's Law concerns

These debates reflect broader disagreements within the AI safety community about research priorities, timelines, and the relationship between different technical approaches to safety.

Funding & Sustainability

Current Funding Model

FAR AI's funding profile reflects concentration among sources within the effective altruism ecosystem. Coefficient Giving (which rebranded from Open Philanthropy in November 2025)25 is the principal funder; FAR AI's own press materials refer to the funder as "Coefficient Giving (previously Open Philanthropy)."8 Coefficient Giving has provided multiple distinct grants to FAR AI: approximately $28.675 million over three years for research team expansion, a technical internship and fellowship program, and a governance team; $6.65 million over two years for FAR.Futures (events, outreach, and field-building); $2.16 million over three years for general support; and $1.7 million over three years for the FAR.Labs coworking hub.21 A separate $12 million grant from Open Philanthropy (now Coefficient Giving) funds FAR AI's grantmaking program for external researchers.12

Additional funders announced in FAR AI's 2025 funding announcement include Schmidt Sciences, the Survival and Flourishing Fund (SFF), the Center for Security and Emerging Technology (CSET), and the AI Safety Fund (AISF) supported by the Frontier Model Forum.8 FAR AI's job postings reference total funding of over $40 million, suggesting the cumulative figure exceeds the $30M+ figure cited in the formal press release.20 Good Ventures, Coefficient Giving's partner foundation, supports FAR AI as part of its Navigating Transformative AI focus area.23

In fiscal year 2024, FAR AI reported $24.3 million in revenue and $8.6 million in expenses, per its Form 990 filed November 14, 2025.19

Revenue from for-profit AI developers is capped at a maximum of 10% of FAR AI's total annual revenue, and FAR AI commits to disclosing the fraction of revenue derived from such consulting.4 This concentration of philanthropic funding creates exposure to shifts in funder priorities; it also provides multi-year commitments that enable longer-horizon research planning. The $12 million grantmaking allocation is directed toward supporting academics and independent researchers working on critical AI safety problems.1

Criticisms and Responses

Academic Pace Concerns

Criticism: Academic publication processes operate on timelines of months to years, including peer review, revision, and conference scheduling. Critics argue this pace may be too slow given rapid AI capability advances and the urgency of safety concerns.

Response: Proponents of the peer-reviewed publication model argue it ensures research quality and credibility, and that methodology and evaluation frameworks developed through careful research provide lasting value even as specific techniques evolve. Preprint sharing and direct collaboration with AI labs can accelerate impact for time-sensitive findings.

Context: This tension between research rigor and speed affects the broader AI safety field. Different organizations make different tradeoffs between publication quality, speed, and direct industry impact.

Limited Scope Questions

Criticism: Research on adversarial robustness and evaluation may not directly address core alignment challenges like deceptive alignment, goal specification, or value learning. Critics question whether robustness research provides sufficient traction on harder alignment problems. A pointed version of this concern is that demonstrating vulnerabilities in narrow systems — such as the finding that superhuman Go AIs can be beaten by adversarial policies playing cyclic patterns that amateur humans easily defeat1 — illuminates failure modes without providing a clear path to preventing them in more capable systems.4

Response: FAR AI argues robustness is a necessary foundation for aligned systems, noting that "most alignment proposals using helper ML systems will fail if helpers are exploited by main systems."19 FAR AI's portfolio also includes value alignment work that produced more sample-efficient value learning algorithms13 and contributed to the "Towards Guaranteed Safe AI" framework co-authored with researchers including Yoshua Bengio and Stuart Russell.5

Context: The AI safety field contains diverse views on which research directions are most valuable. Some researchers emphasize near-term robustness and evaluation, while others focus on long-term theoretical alignment challenges. FAR AI's own findings that adversarial training improves language model robustness orders of magnitude more efficiently than scaling alone4 are cited as evidence that empirical robustness work can yield generalizable insights.

Natural Abstractions Theory Concerns

Criticism: The natural abstractions hypothesis lacks extensive empirical validation. Critics argue that theoretical frameworks should be grounded in experimental evidence before receiving substantial research attention.

Response: Proponents argue theoretical frameworks can productively guide empirical research programs, and that the multi-year timeline for validation is appropriate given the scope of the hypothesis.

Context: Disagreement about when to invest in theoretical versus empirical work is common in early-stage scientific fields. Different researchers make different judgments about the appropriate balance.

Footnotes

  1. About | FAR.AI (https://far.ai/about)

  2. Adam Gleave - AI2050 (https://ai2050.schmidtsciences.org/fellow/adam-gleave)

  3. Research Overview – FAR.AI (https://far.ai/research)

  4. Transparency | FAR.AI (https://far.ai/about/transparency)

  5. Towards Guaranteed Safe AI | FAR.AI (https://far.ai/research/towards-guaranteed-safe-ai-a-framework-for-ensuring-robust-and-reliable-ai-systems)

  6. Adversarial Policies Beat Superhuman Go AIs | FAR.AI (https://far.ai/research/adversarial-policies-beat-superhuman-go-ais)

  7. FAR.AI Selected to Lead EU AI Act CBRN Risk Consortium — FAR AI official announcement, February 2026

  8. FAR.AI Secures Over $30 Million in Multi-Funder Support to Scale Frontier AI Safety Research — FAR AI official press release, 2025

  9. Can Go AIs be adversarially robust? – FAR.AI (https://far.ai/research/can-go-ais-be-adversarially-robust)

  10. FAR.AI Robustness Research (https://far.ai/topic/robustness)

  11. McKenzie, I.R., Hollinsworth, O.J., Tseng, T., Davies, X., Casper, S., Tucker, A.D., Kirk, R., Gleave, A. "STACK: Adversarial Attacks on LLM Safeguard Pipelines." arXiv:2506.24068, submitted June 30, 2025; revised February 5, 2026. (https://arxiv.org/abs/2506.24068)

  12. Grantmaking | FAR.AI — FAR AI grantmaking program page

  13. 2023 Alignment Research Updates – FAR.AI (https://far.ai/post/2023-12-far-research-update)

  14. Layered AI Defenses Have Holes: Vulnerabilities and Key Recommendations — FAR AI blog, companion post to STACK paper

  15. Karl Berzins | FAR.AI — FAR AI official profile listing Berzins as "President of FAR.AI"

  16. Karl Berzins — LinkedIn profile (https://www.linkedin.com/in/karlberzins/), listing title as "Co-founder & President at FAR.AI"

  17. What's new at FAR AI — EA Forum, FAR AI, December 4, 2023

  18. Adam Gleave | FAR.AI (https://far.ai/author/adam-gleave)

  19. Far Ai Inc - Nonprofit Explorer - ProPublica (https://projects.propublica.org/nonprofits/organizations/920692207)

  20. FAR.AI Research Scientist — Careers Page — job posting describing organization as having "grown quickly to 40+ staff"

  21. Open Philanthropy grant database: FAR.AI — AI Field Building (2025), August 6, 2025; and FAR.AI — AI Safety Research and Field-Building, September 28, 2025

  22. Programs – FAR.AI (https://far.ai/programs)

  23. Adam Gleave — Google Scholar (https://scholar.google.com/citations?user=lBunDH0AAAAJ&hl=en)

  24. EU AI Act Newsletter #77: AI Office Tender, May 13, 2025 — total tender value €9,080,000 across all six lots

  25. Open Philanthropy Is Now Coefficient Giving — Coefficient Giving official announcement, December 10, 2025; EA Forum post by Alexander Berger, November 18, 2025


Related Pages

Safety Research: Interpretability
Approaches: Constitutional AI, Adversarial Training
Analysis: Anthropic IPO
Policy: US Executive Order on Safe, Secure, and Trustworthy AI
Other: Dan Hendrycks, Jaan Tallinn
Organizations: OpenAI, Anthropic, Frontier Model Forum, SecureBio, MATS (ML Alignment Theory Scholars program), Alignment Research Center
Risks: Deceptive Alignment, AI Distributional Shift
Concepts: AI Benchmarking