Longterm Wiki

UC Berkeley — Study on Frontier Model Behavior

$500K
Funder
Recipient
UC Berkeley
Date
Jun 2025
Source
Notes

[Navigating Transformative AI] Open Philanthropy recommended a grant of $499,597 over three years to UC Berkeley to support a study on the behavior of frontier AI models. This work will be led by Professor Emma Pierson, expanding on her previous paper “Sparse Autoencoders for Hypothesis Generation”. In her original paper, Pierson used sparse autoencoders to analyze text data (like Yelp restaurant review scores) and identify text features that predict outcomes (like how many stars a user awards a restaurant). She then used an LLM to describe those text features as testable, natural-language hypotheses (like “words associated with quick service lead to higher review scores”). This grant will enable Professor Pierson to investigate whether the HypotheSAEs technique can be used to identify properties in the text of LLM prompts that cause AIs to generate harmful responses. The grant will also allow for the exploration of technical upgrades to the HypotheSAEs pipeline, like using Matryoshka SAEs or transcoders. This falls within our focus area of potential risks from advanced artificial intelligence.
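As a rough illustration of the feature-selection step described above, the following minimal sketch ranks sparse features by how strongly they correlate with an outcome such as a star rating. It is not the HypotheSAEs implementation: the activations are hand-written stand-ins for sparse autoencoder features, the data is made up, and the names `pearson` and `rank_features` are hypothetical. The later step, in which an LLM turns the top features into natural-language hypotheses, is not shown.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(activations, outcomes, top_k=2):
    """Return the top_k (feature_index, correlation) pairs by absolute correlation."""
    n_features = len(activations[0])
    scored = []
    for j in range(n_features):
        column = [row[j] for row in activations]
        scored.append((j, pearson(column, outcomes)))
    scored.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return scored[:top_k]


# Hypothetical toy data: each row is one review, each column one sparse feature.
# In the real pipeline these activations would come from a sparse autoencoder;
# here they are invented purely for illustration.
activations = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
]
stars = [5, 4, 1, 2]

top = rank_features(activations, stars, top_k=2)
for index, corr in top:
    print(f"feature {index}: correlation with stars = {corr:+.3f}")
```

In this toy setup, features 0 and 1 track the rating (positively and negatively) while feature 2 does not, so only the first two would be passed on for description.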
