Longterm Wiki

Adam Gleave | FAR.AI

web

Credibility Rating

4/5
High (4)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: FAR AI

Author index page for Adam Gleave at FAR.AI; useful for finding his specific papers on adversarial policies and reward modeling rather than as a standalone resource.

Metadata

Importance: 30/100 · Type: homepage

Summary

Author page for Adam Gleave at FAR.AI (Foundational Research for AI Safety), listing his published research and contributions to AI safety. Gleave is a prominent AI safety researcher known for work on adversarial policies, reward modeling, and scalable oversight.

Key Points

  • Adam Gleave is a key researcher at FAR.AI focused on technical AI safety problems
  • His work spans adversarial robustness, reward learning, and evaluation of AI systems
  • FAR.AI is an independent AI safety research organization producing technical alignment research
  • This page serves as an index to his published papers and blog posts on AI safety topics

Cited by 1 page

Page      Type          Quality
FAR AI    Organization  76.0

Cached Content Preview

HTTP 200 · Fetched Apr 9, 2026 · 13 KB
Adam Gleave | FAR.AI 
Adam Gleave

Co-founder & CEO

FAR.AI

Adam Gleave is the CEO of FAR.AI. He completed his PhD in artificial intelligence (AI) at UC Berkeley, advised by Stuart Russell. His goal is to develop techniques that enable advanced automated systems to verifiably act according to human preferences, even in situations unanticipated by their designers. He is particularly interested in improving methods for value learning and the robustness of deep reinforcement learning (RL). For more information, visit his website.

News & Publications
  • Concept Influence: Leveraging Interpretability to Improve Performance and Efficiency in Training Data Attribution (February 19, 2026)
  • Revisiting Frontier LLMs' Attempts to Persuade on Extreme Topics: GPT and Claude Improved, Gemini Worsened (February 11, 2026)
  • AI in 2025: Faster Progress, Harder Problems (December 16, 2025)
  • It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics (August 21, 2025)
  • The Safety Gap Toolkit: Evaluating Hidden Dangers of Open-Source Models (July 31, 2025)
  • STACK: Adversarial Attacks on LLM Safeguard Pipelines (July 2, 2025)
  • ClearHarm: A more challenging jailbreak dataset (June 23, 2025)

... (truncated, 13 KB total)
Resource ID: ca68437469b0fe97 | Stable ID: sid_l97GvJbaSr