UC Berkeley — Study on Frontier Model Behavior

$500K

Funder

Coefficient Giving

Recipient

University of California, Berkeley

Program

AI Safety Grantmaking

Date

Jun 2025

Data source

Coefficient Giving

Source

coefficientgiving.org

Notes

[Navigating Transformative AI] Open Philanthropy recommended a grant of $499,597 over three years to UC Berkeley to support a study on the behavior of frontier AI models. This work will be led by Professor Emma Pierson and expands on her previous paper, “Sparse Autoencoders for Hypothesis Generation”. In that paper, Pierson used sparse autoencoders (SAEs) to analyze text data (like Yelp restaurant reviews) and identify text features that predict outcomes (like how many stars a user awards a restaurant). She then used an LLM to describe those text features as testable, natural-language hypotheses (like “words associated with quick service lead to higher review scores”).

This grant will enable Professor Pierson to investigate whether the HypotheSAEs technique can be used to identify properties in the text of LLM prompts that cause AIs to generate harmful responses. The grant will also allow for the exploration of technical upgrades to the HypotheSAEs pipeline, like using Matryoshka SAEs or transcoders. This falls within our focus area of potential risks from advanced artificial intelligence.
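As a rough illustration of the feature-selection step described above, the sketch below ranks features by how strongly their activations correlate with an outcome. It is not the HypotheSAEs implementation: the activation matrix is hand-written toy data standing in for SAE activations, the names `pearson` and `rank_features` are hypothetical, plain Pearson correlation is an assumed stand-in for whatever selection method the paper uses, and the LLM step that turns features into natural-language hypotheses is omitted.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(activations, outcomes, top_k=2):
    """Return the top_k (feature_index, correlation) pairs by absolute correlation.

    activations: one row per text, one column per feature (toy stand-in
    for sparse autoencoder activations).
    outcomes: one number per text, e.g. a star rating.
    """
    n_features = len(activations[0])
    scored = []
    for j in range(n_features):
        column = [row[j] for row in activations]
        scored.append((j, pearson(column, outcomes)))
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:top_k]


# Toy data (assumed for illustration): 4 reviews, 3 features.
activations = [
    [0.9, 0.0, 0.1],
    [0.8, 0.0, 0.0],
    [0.0, 0.7, 0.1],
    [0.0, 0.9, 0.0],
]
stars = [5, 4, 2, 1]

top = rank_features(activations, stars)
for index, corr in top:
    print(f"feature {index}: correlation {corr:+.2f}")
```

In this toy data, one feature fires on the high-star reviews and another on the low-star reviews, so both rank above the third, weakly related feature. In the pipeline the grant describes, features selected this way would then be passed to an LLM to be described as testable hypotheses.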

Other Grants by Coefficient Giving

2625

Grant	Recipient	Amount	Date
Janaagraha — Air Quality Grants Assessment		$195K	Dec 2024
Futurewise — Housing Advocacy in Washington		$450K	Apr 2023
Exscientia — Agonists for Interferon Lambda		$2.3M	Sep 2023
Kurzgesagt — Short-form Video Content		$3M	Mar 2022
Kurzgesagt — Video Production (2023)		$1.7M	May 2023
Kurzgesagt — Video Creation and Translation		$2.6M	Dec 2021
Lightcone Infrastructure – General Support	Lighthaven (Event Venue)	$4.5M	Sep 2022
Lightcone Infrastructure — General Support (2023)	Lighthaven (Event Venue)	$3M	Oct 2023
Conjecture — Cybersecurity Bootcamp	Conjecture	$223K	Jun 2025
Conjecture — AI Safety Technical Program	Conjecture	$224K	May 2023

Showing 10 of 2625 grants

Other Grants to University of California, Berkeley

Grant	Recipient	Amount	Date
Grant to "support a project to develop interactive AI algorithms for alignment that can uncover the causal features in human reward systems, and thereby help AI systems learn underlying human values that generalize to new situations."	University of California, Berkeley	$800K	May 2022
Grant to "support a project to study how large language models integrated with offline reinforcement learning pose a risk of machine deception and persuasion."	University of California, Berkeley	$600K	Jun 2022
14-week salary for Aris Richardson, EA Berkeley	University of California, Berkeley	$8.6K	Jul 2022
UC Berkeley’s EA group & new AI Safety group 22-23 funding for food, marketing, & lead student organizer salary	University of California, Berkeley	$39K	Oct 2022
