DeepMind Safety Research – Medium Blog
Type: blog
Credibility Rating
2/5
Mixed (2). Mixed quality: some useful content but inconsistent editorial standards. Claims should be verified.
Rating inherited from publication venue: Medium
This is the public Medium blog for DeepMind's safety research team, offering accessible write-ups of their technical safety work; useful for tracking ongoing research directions from one of the leading AI safety labs.
Metadata
Importance: 62/100
Tags: blog post, homepage
Summary
The official Medium blog of DeepMind's safety research team, publishing accessible summaries and extended abstracts of their technical AI safety work. Topics covered include sycophancy, jailbreaks, AI scheming, and technical AGI safety approaches. It serves as a public-facing outlet for DeepMind researchers to communicate safety findings to a broad audience.
Key Points
- Covers consistency training as a potential mitigation for sycophancy and jailbreaks in large language models
- Includes work on evaluating and monitoring for AI scheming behaviors
- Publishes extended abstracts and summaries of DeepMind's technical AGI safety and security papers
- Features contributions from prominent researchers including Rohin Shah, Victoria Krakovna, and Alex Turner
- Acts as a bridge between DeepMind's technical research and the broader AI safety community
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Technical AI Safety Research | Crux | 66.0 |
Cached Content Preview
HTTP 200 · Fetched Apr 9, 2026 · 3 KB
DeepMind Safety Research – Medium (archived capture: http://web.archive.org/web/20260401103022/https://deepmindsafetyresearch.medium.com/)

DeepMind Safety Research · 3.4K followers

- Predicting When RL Training Breaks Chain-of-Thought Monitorability – "Why RL training teaches models to hide their reasoning, and a conceptual framework to predict when it happens." (Just now)
- Consistency Training Could Help Limit Sycophancy and Jailbreaks – Alex Irpan*, Alex Turner*, Mark Kurzeja, David Elson, and Rohin Shah (Nov 3, 2025)
- Evaluating and monitoring for AI scheming – Victoria Krakovna, Scott Emmons, Erik Jenner, Mary Phuong, Lewis Ho, and Rohin Shah (Jul 8, 2025)
- An Approach to Technical AGI Safety and Security – "We have written a paper on our approach to technical AGI safety and security. This post is a copy of the extended abstract, which…" (Apr 8, 2025)
- Negative Results for Sparse Autoencoders On Downstream Tasks and Deprioritising SAE Research… – Lewis Smith*, Sen Rajamanoharan*, Arthur Conmy, Callum McDougall, Janos Kramar, Tom Lieberum, Rohin Shah, Neel Nanda (Mar 26, 2025)
- Introducing our short course on AGI safety – "We are excited to release a short course on AGI safety for students, researchers and professionals interested in this topic. The course…" (Feb 14, 2025)
- Steering Gemini using BIDPO vectors – Alex Turner and Mark Kurzeja (Jan 31, 2025)
- MONA: A method for addressing multi-step reward hacking – "MONA enhances safety when we train an AI system to perform some task that takes multiple steps. Training an AI with MONA reduces its …" (Jan 23, 2025)
- Human-AI Complementarity: A Goal for Amplified Oversight – "How do we ensure humans can continue to oversee increasingly powerful AI systems? We argue that achieving human-AI complementarity is key." (Dec 23, 2024)
- AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work – Rohin Shah, Seb Farquhar, and Anca Dragan (Oct 18, 2024)

About: "We research and build safe AI systems that learn how to solve problems and advance scientific discovery for all. Explore our work: deepmind.google"
Resource ID:
5b8be7f6a2aa7067 | Stable ID: sid_MVQckxoExZ