We're Redwood Research, we do applied alignment research, AMA
Author
Buck
Credibility Rating
Good (3/5). Good quality: reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: EA Forum
This EA Forum AMA offers a direct window into Redwood Research's thinking in late 2021, useful for understanding the applied alignment research paradigm and how a safety organization operationalizes its research priorities.
Metadata
Importance: 55/100 · Primary source
Summary
An Ask Me Anything session with Redwood Research, an applied AI safety organization focused on empirical alignment research. The AMA covers their research approach, ongoing projects, organizational philosophy, and views on key alignment challenges, providing insight into how a prominent safety lab thinks about practical alignment work.
Key Points
- Redwood Research focuses on applied, empirical alignment research rather than purely theoretical approaches, aiming to make near-term safety progress.
- The organization works on projects like adversarial training and robustness, seeking to reduce harmful model outputs through iterative testing.
- The AMA reveals their views on prioritizing tractable safety problems and building a team culture around rigorous empirical methodology.
- Redwood discusses how they think about the relationship between current ML safety work and longer-term existential risk reduction.
- Provides transparency into the organizational structure, hiring philosophy, and research agenda of a prominent applied alignment lab.
1 FactBase fact citing this source
| Entity | Property | Value | As Of |
|---|---|---|---|
| Redwood Research | Headcount | 10 | Oct 2021 |
Cached Content Preview
HTTP 200 · Fetched Apr 7, 2026 · 4 KB
# We're Redwood Research, we do applied alignment research, AMA
By Buck
Published: 2021-10-05
Redwood Research is a longtermist organization based in Berkeley, California, working on AI alignment. We're doing an AMA this week; we'll answer questions mostly on Wednesday and Thursday (the 6th and 7th of October). I expect to answer a bunch of questions myself; Nate Thomas, Bill Zito, and perhaps other people will also be answering questions.
Here's an edited excerpt from [this doc that describes our basic setup, plan, and goals](https://docs.google.com/document/d/12RwJcALg913LM7Jp0PYlaZ48xYqm3tVJggJ-is53zjs/edit#heading=h.5mvxv8g1tvdj).
> Redwood Research is a longtermist research lab focusing on applied AI alignment. We’re led by [Nate Thomas](https://www.linkedin.com/in/nathaniel-t-18603079/) (CEO), [Buck Shlegeris](https://www.linkedin.com/in/buck-shlegeris-a2b89386/) (CTO), and [Bill Zito](https://www.billzito.com/about) (COO/software engineer); our board is Nate, [Paul Christiano](https://paulfchristiano.com/) and [Holden Karnofsky](https://www.openphilanthropy.org/about/team/holden-karnofsky). We currently have ten people on staff.
>
> Our goal is to grow into a lab that does lots of alignment work that we think is particularly valuable and wouldn’t have happened elsewhere.
>
> Our current approach to alignment research:
>
> * We’re generally focused on [prosaic](https://ai-alignment.com/prosaic-ai-control-b959644d79c2) alignment approaches.
> * We expect to mostly produce value by doing applied alignment research. I think of applied alignment research as research that takes ideas for how to align systems, such as amplification or transparency, and then tries to figure out how to make them work in practice. I expect that this kind of practical research will be a big part of making alignment succeed. See [this post](https://www.alignmentforum.org/posts/xRyLxfytmLFZ6qz5s/the-theory-practice-gap) for a bit more about how I think about the distinction between theoretical and applied alignment work.
> * We are interested in thinking about our research from an explicit perspective of wanting to align superhuman systems.
> * When choosing between projects, we’ll be thinking about questions like “to what extent is this class of techniques fundamentally limited? Is this class of techniques likely to be a useful tool to have in our toolkit when we’re trying to align highly capable systems, or is it a dead end?”
> * I expect us to be quite interested in doing research of the form “fix alignment problems in current models” because it seems generally healthy to engage with concrete problems, but we’ll want to carefully think through exactly which problems along these lines are worth working on and which techniques we want to improve by solving them.
We're hiring for [research, engineering](https://www.redwoodresearch.org/technical-staff), and an [office operations manager](https://www.redwoodresearch.org/operations
... (truncated, 4 KB total)
Resource ID: aa220baa301c19b0 | Stable ID: sid_3Uchbw9Ve2