Adam Gleave | FAR.AI
Web Credibility Rating
4/5
High (4). High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: FAR AI
Author index page for Adam Gleave at FAR.AI; useful for finding his specific papers on adversarial policies and reward modeling rather than as a standalone resource.
Metadata
Importance: 30/100 | Type: homepage
Summary
Author page for Adam Gleave at FAR.AI (Foundational Research for AI Safety), listing his published research and contributions to AI safety. Gleave is a prominent AI safety researcher known for work on adversarial policies, reward modeling, and scalable oversight.
Key Points
- Adam Gleave is a key researcher at FAR.AI focused on technical AI safety problems
- His work spans adversarial robustness, reward learning, and evaluation of AI systems
- FAR.AI is an independent AI safety research organization producing technical alignment research
- This page serves as an index to his published papers and blog posts on AI safety topics
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| FAR AI | Organization | 76.0 |
Cached Content Preview
HTTP 200 | Fetched Apr 9, 2026 | 13 KB
Adam Gleave | FAR.AI
Adam Gleave
Co-founder & CEO
FAR.AI
Adam Gleave is the CEO of FAR.AI. He completed his PhD in artificial intelligence (AI) at UC Berkeley, advised by Stuart Russell. His goal is to develop techniques necessary for advanced automated systems to verifiably act according to human preferences, even in situations unanticipated by their designer. He is particularly interested in improving methods for value learning, and robustness of deep RL. For more information, visit his website.
News & Publications
- Concept Influence: Leveraging Interpretability to Improve Performance and Efficiency in Training Data Attribution (February 19, 2026)
- Revisiting Frontier LLMs’ Attempts to Persuade on Extreme Topics: GPT and Claude Improved, Gemini Worsened (February 11, 2026)
- AI in 2025: Faster Progress, Harder Problems (December 16, 2025)
- Frontier LLMs Attempt to Persuade into Harmful Topics (August 21, 2025; paper: “It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics”)
- A Toolkit for Estimating the Safety-Gap between Safety Trained and Helpful Only LLMs (July 31, 2025; paper: “The Safety Gap Toolkit: Evaluating Hidden Dangers of Open-Source Models”)
- Layered AI Defenses Have Holes: Vulnerabilities and Key Recommendations (July 2, 2025; paper: “STACK: Adversarial Attacks on LLM Safeguard Pipelines”)
- ClearHarm: A more challenging jailbreak dataset (June 23, 2025)
... (truncated, 13 KB total)
Resource ID: ca68437469b0fe97 | Stable ID: sid_l97GvJbaSr