Adam Gleave | FAR.AI
Web: far.ai/author/adam-gleave
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| FAR AI | Organization | 76.0 |
Cached Content Preview
HTTP 200 · Fetched Feb 23, 2026 · 12 KB
Adam Gleave
Co-founder & CEO
FAR.AI
Adam Gleave is the CEO of FAR.AI. He completed his PhD in artificial intelligence (AI) at UC Berkeley, advised by Stuart Russell. His goal is to develop the techniques needed for advanced automated systems to verifiably act according to human preferences, even in situations unanticipated by their designers. He is particularly interested in improving methods for value learning and the robustness of deep RL. For more information, visit his website.
News & Publications
- Revisiting Frontier LLMs’ Attempts to Persuade on Extreme Topics: GPT and Claude Improved, Gemini Worsened (February 11, 2026)
- AI in 2025: Faster Progress, Harder Problems (December 16, 2025)
- Frontier LLMs Attempt to Persuade into Harmful Topics (August 21, 2025)
  Paper: It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics
- A Toolkit for Estimating the Safety-Gap between Safety Trained and Helpful Only LLMs (July 31, 2025)
  Paper: The Safety Gap Toolkit: Evaluating Hidden Dangers of Open-Source Models
- Layered AI Defenses Have Holes: Vulnerabilities and Key Recommendations (July 2, 2025)
  Paper: STACK: Adversarial Attacks on LLM Safeguard Pipelines
- ClearHarm: A more challenging jailbreak dataset (June 23, 2025)
- Avoiding AI Deception: Lie Detectors can either Induce Honesty or Evasion (June 4, 2025)
  Paper: Preference Learning with Lie Detectors can Induce Honesty or Evasion
- Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google (February 4, 2025)
... (truncated, 12 KB total)
Resource ID: ca68437469b0fe97 | Stable ID: OTY2ZDE1YW