Longterm Wiki

Grantmaking (web)

Data Status: Not fetched

Cited by 1 page

Page     Type          Quality
FAR AI   Organization  76.0

Cached Content Preview

HTTP 200 | Fetched Feb 22, 2026 | 4 KB
 
Grantmaking

Funding groundbreaking research in AI safety

 
FAR.AI Grant Program

FAR.AI supports academics and independent researchers in developing innovative solutions to critical AI risks through our targeted grantmaking program. Currently, due to limited evaluation capacity, we can only consider researchers nominated by experts with a strong track record. We plan to launch public requests for proposals (RFPs) soon, focused on high-impact research areas. Our grantmaking is funded by a $12 million grant generously provided by Open Philanthropy.

Grants:

 
 
Failure Modes in Superhuman Systems
Florian Tramer, ETH Zurich
Broad project examining robustness across four vectors: data poisoning, consistency checks, model stealing, and prompt injection.

Comprehensive Red-Teaming Framework
Wenbo Guo, UC Santa Barbara
Building automated testing systems for LLM alignment against both training-phase and testing-phase threats, with a focus on developing agent-based systems that can generate adversarial prompts.

Explaining Superhuman AI Decisions
Nicholas Tomlin, UC Berkeley
Using weak-to-strong generalization to explain superhuman AI systems’ decisions, focusing on domains like chess and Go where superhuman AI already exists.

Securing Alignment
Ashwinee Panda, University of Maryland College Park
Developing methods to make alignment more secure against jailbreaks, prefilling attacks, and finetuning attacks, with approaches spanning the entire model lifecycle.
 
 
... (truncated, 4 KB total)
Resource ID: f39e450eac7bbaa9 | Stable ID: ZWQwMmU4Nj