Longterm Wiki

DeepMind Safety Research – Medium Blog

blog

Credibility Rating

2/5
Mixed (2)

Mixed quality. Some useful content but inconsistent editorial standards. Claims should be verified.

Rating inherited from publication venue: Medium

This is the public Medium blog of DeepMind's safety research team, offering accessible write-ups of their technical safety work. It is useful for tracking ongoing research directions at one of the leading AI safety labs.

Metadata

Importance: 62/100 · blog post · homepage

Summary

The official Medium blog of DeepMind's safety research team, publishing accessible summaries and extended abstracts of their technical AI safety work. Topics covered include sycophancy, jailbreaks, AI scheming, and technical AGI safety approaches. It serves as a public-facing outlet for DeepMind researchers to communicate safety findings to a broad audience.

Key Points

  • Covers consistency training as a potential mitigation for sycophancy and jailbreaks in large language models
  • Includes work on evaluating and monitoring for AI scheming behaviors
  • Publishes extended abstracts and summaries of DeepMind's technical AGI safety and security papers
  • Features contributions from prominent researchers including Rohin Shah, Victoria Krakovna, and Alex Turner
  • Acts as a bridge between technical research and broader AI safety community communication

Cited by 1 page

Page | Type | Quality
Technical AI Safety Research | Crux | 66.0

Cached Content Preview

HTTP 200 · Fetched Apr 9, 2026 · 3 KB
DeepMind Safety Research – Medium
The Wayback Machine - http://web.archive.org/web/20260401103022/https://deepmindsafetyresearch.medium.com/

 


DeepMind Safety Research

3.4K followers


Predicting When RL Training Breaks Chain-of-Thought Monitorability

Why RL training teaches models to hide their reasoning, and a conceptual framework to predict when it happens.

Just now

Consistency Training Could Help Limit Sycophancy and Jailbreaks

Authors: Alex Irpan* and Alex Turner*, Mark Kurzeja, David Elson, and Rohin Shah

Nov 3, 2025 · 1 response

Evaluating and monitoring for AI scheming

By Victoria Krakovna, Scott Emmons, Erik Jenner, Mary Phuong, Lewis Ho, and Rohin Shah

Jul 8, 2025

An Approach to Technical AGI Safety and Security

We have written a paper on our approach to technical AGI safety and security. This post is a copy of the extended abstract, which…

Apr 8, 2025

Negative Results for Sparse Autoencoders On Downstream Tasks and Deprioritising SAE Research…

Lewis Smith*, Sen Rajamanoharan*, Arthur Conmy, Callum McDougall, Janos Kramar, Tom Lieberum, Rohin Shah, Neel Nanda

Mar 26, 2025

Introducing our short course on AGI safety

We are excited to release a short course on AGI safety for students, researchers and professionals interested in this topic. The course…

Feb 14, 2025 · 2 responses

Steering Gemini using BIDPO vectors

By Alex Turner and Mark Kurzeja

Jan 31, 2025 · 1 response

MONA: A method for addressing multi-step reward hacking

MONA enhances safety when we train an AI system to perform some task that takes multiple steps. Training an AI with MONA reduces its …

Jan 23, 2025

Human-AI Complementarity: A Goal for Amplified Oversight

How do we ensure humans can continue to oversee increasingly powerful AI systems? We argue that achieving human-AI complementarity is key.

Dec 23, 2024

AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work

By Rohin Shah, Seb Farquhar, and Anca Dragan

Oct 18, 2024


We research and build safe AI systems that learn how to solve problems and advance scientific discovery for all. Explore our work: deepmind.google

Resource ID: 5b8be7f6a2aa7067 | Stable ID: sid_MVQckxoExZ