Longterm Wiki

Bloom: Automated Behavioral Evaluations

web

Credibility Rating

High (4/5)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Anthropic Alignment

Published by Anthropic's alignment team, this post describes infrastructure for automated behavioral evaluations, relevant to researchers building scalable safety testing pipelines or studying how frontier labs operationalize safety assessments.

Metadata

Importance: 62/100 · blog post · primary source

Summary

Bloom is Anthropic's system for automated behavioral evaluations of AI models, designed to scalably assess safety-relevant behaviors without requiring human red-teamers for every evaluation. It enables systematic testing of model behaviors across a wide range of scenarios, supporting both capability assessment and safety evaluation at scale.

Key Points

  • Introduces automated pipelines for evaluating AI behavioral safety properties, reducing reliance on manual red-teaming efforts
  • Enables scalable, reproducible assessments of safety-relevant behaviors across diverse scenarios and model versions
  • Supports Anthropic's model evaluation infrastructure for tracking behavioral changes during training and deployment
  • Integrates with broader alignment research by providing systematic benchmarks for measuring policy compliance and harm avoidance
  • Represents a step toward automating parts of the AI safety evaluation process that traditionally required human judgment

Cited by 4 pages

Page                         | Type          | Quality
Epistemic Virtue Evals       | Approach      | 45.0
Evals-Based Deployment Gates | Approach      | 66.0
AI Evaluations               | Research Area | 72.0
Scalable Eval Approaches     | Approach      | 65.0

Cached Content Preview

HTTP 200 · Fetched Apr 9, 2026 · 65 KB
Bloom: an open source tool for automated behavioral evaluations

Alignment Science Blog
Isha Gupta, Kai Fronsdal, Abhay Sheshadri, Jonathan Michala, Jacqueline Tay, Rowan Wang, Samuel R. Bowman, Sara Price
December 19, 2025
 
 tl;dr 
 We are releasing Bloom, an agentic framework for developing behavioral
 evaluations. Bloom's evaluations are reproducible and targeted: unlike open-ended auditing, Bloom
 takes a researcher-specified behavior and quantifies its frequency and severity across automatically
 generated scenarios. Bloom's evaluations correlate strongly with our hand-labelled judgments and
 reliably separate baseline models from intentionally misaligned ones. As examples, we also release
 benchmark results for four alignment-relevant behaviors on 16 models. Bloom is available at
 github.com/safety-research/bloom.

 Introduction

 Frontier models exhibit various types of misalignment, for example in-context
 scheming (Meinke et al., 2024), agentic misalignment (Lynch et al., 2025), and sycophancy (Sharma et al.,
 2023). Although researchers are developing mitigations for known issues (Sonnet
 4.5 System Card, 2025), new forms of misalignment will likely
 emerge as models gain capabilities and are deployed in more complex environments. High-quality
 evaluations remain essential for assessing these behaviors, but they require large amounts of researcher
 time and are limited in quantity (see Table 1). These bespoke
 evaluations also risk losing validity through training-set contamination or rapidly evolving
 capabilities (Kwa et al., 2025). 

 

 Advances in model capabilities now make it possible to automate evaluation
 development. Bloom is an agentic framework for generating
 targeted evaluations of researcher-specified behavioral traits. We built Bloom to be accessible and highly configurable, serving as a reliable
 evaluation-generation framework for diverse research applications. With Bloom, researchers can skip the
 evaluation pipeline engineering and go straight to measuring the propensities they are interested in,
 using a trusted, effective scaffold. 
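The workflow the post describes (take a researcher-specified behavior, roll the target model out across generated scenarios, then score each transcript for the behavior's presence and severity) can be sketched roughly as below. This is a hypothetical illustration, not Bloom's actual API: every name here (`Judgment`, `evaluate_behavior`, `run_target`, `judge`) is an invented placeholder.

```python
# Hypothetical sketch of a Bloom-style evaluation loop.
# None of these names come from the Bloom codebase; they are placeholders.
from dataclasses import dataclass
from statistics import mean


@dataclass
class Judgment:
    exhibited: bool   # judge's verdict: did the behavior appear in this rollout?
    severity: float   # judge-assigned severity in [0, 1], meaningful when exhibited


def evaluate_behavior(behavior, scenarios, run_target, judge):
    """Quantify a behavior's frequency and mean severity across scenarios.

    run_target(scenario) -> transcript of the target model in that scenario
    judge(behavior, transcript) -> Judgment
    """
    judgments = [judge(behavior, run_target(s)) for s in scenarios]
    frequency = mean(j.exhibited for j in judgments)
    shown = [j.severity for j in judgments if j.exhibited]
    return {
        "behavior": behavior,
        "frequency": frequency,                     # fraction of scenarios showing it
        "severity": mean(shown) if shown else 0.0,  # mean severity when it appears
    }
```

In the real tool, scenario generation is itself agentic rather than a fixed list, and the judge would typically be another model prompted with a rubric; the aggregation step, however it is implemented, is what makes the resulting numbers comparable across models.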

 

 We recently released Petri,
 an automated auditor that explores the overall behavioral profiles of different models and surfaces new
 misaligned behaviors. Bloom serves a complementary but separate purpose: generating in-depth evaluation
 suites for specific behaviors and quantifying their severity and frequency across automatically
 generated scenarios. Alongside the tool, we are releasing benchmarks for four behaviors—delusional
 sycophancy, instructed long-horizon sabotage, self-preservation, and self-preferential bias—across 16
 frontier models. These took only a few days to conceptualize, refine, and generate with Bloom. 
 

 

 
 We present comparative plots from four
 Bloom-generated evaluations—delusional sycophancy, instructed long-horizon s

... (truncated, 65 KB total)
Resource ID: 7fa7d4cb797a5edd | Stable ID: sid_Ir6hmsvhTe