Longterm Wiki

Adam Gleave | FAR.AI

web

Data Status

Not fetched

Cited by 1 page

Page: FAR AI
Type: Organization
Quality: 76.0

Cached Content Preview

HTTP 200 | Fetched Feb 23, 2026 | 12 KB
Adam Gleave | FAR.AI 
Adam Gleave
Co-founder & CEO, FAR.AI
Adam Gleave is the co-founder and CEO of FAR.AI. He completed his PhD in artificial intelligence (AI) at UC Berkeley, advised by Stuart Russell. His goal is to develop the techniques needed for advanced automated systems to verifiably act according to human preferences, even in situations unanticipated by their designers. He is particularly interested in improving methods for value learning and the robustness of deep reinforcement learning (RL). For more information, visit his website.

News & Publications

- Revisiting Frontier LLMs’ Attempts to Persuade on Extreme Topics: GPT and Claude Improved, Gemini Worsened (February 11, 2026)
- AI in 2025: Faster Progress, Harder Problems (December 16, 2025)
- Frontier LLMs Attempt to Persuade into Harmful Topics (August 21, 2025); paper: It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics
- A Toolkit for Estimating the Safety-Gap between Safety Trained and Helpful Only LLMs (July 31, 2025); paper: The Safety Gap Toolkit: Evaluating Hidden Dangers of Open-Source Models
- Layered AI Defenses Have Holes: Vulnerabilities and Key Recommendations (July 2, 2025); paper: STACK: Adversarial Attacks on LLM Safeguard Pipelines
- ClearHarm: A more challenging jailbreak dataset (June 23, 2025)
- Avoiding AI Deception: Lie Detectors can either Induce Honesty or Evasion (June 4, 2025); paper: Preference Learning with Lie Detectors can Induce Honesty or Evasion
- Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google (February 4, 2025)

... (truncated, 12 KB total)
Resource ID: ca68437469b0fe97 | Stable ID: OTY2ZDE1YW