FAR.AI - Robustness Research
Robustness
Making advanced AI models robust.
At FAR.AI, we aim to make advanced AI models robust. In simple terms, robustness refers to an AI system’s reliability, especially in unfamiliar or challenging situations. Currently, most AI systems are far from robust. They often fail when exposed to new environments and can easily be exploited by adversaries. These weaknesses will pose increasingly serious risks as AI models become more powerful and embedded in critical areas like infrastructure. The challenge we face is clear: as AI capabilities rapidly advance, we must ensure robustness keeps pace.
A key question is whether more capable systems will naturally become robust.
We found that superhuman Go AIs are highly exploitable, demonstrating that capability advances do not guarantee robustness. Nor are these issues easily addressed: we tested three natural defenses for Go-playing AIs, finding that our attack can overcome all of them. Combined with prior work from the adversarial robustness literature, we argue in this position piece that robustness is unlikely to be solved under the status-quo AI development paradigm, and we highlight a number of safety risks this poses.
Robustness might not be solved under status-quo development, but could scaling capabilities at least help improve robustness? We explored empirical scaling trends for robustness in language models. We find that scaling model size and adversarial training both improve robustness, with adversarial training being orders of magnitude more compute-efficient. However, the offense-defense balance currently favors offense, both in absolute terms (an attacker can break a model with a fraction of the compute used to defend it) and in relative terms (a model trained with twice as much adversarial training can be broken with less than twice as much attack compute).
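The offense-defense balance described above can be sketched with a simple power-law toy model, where attack compute grows sublinearly in defense compute. All constants and the functional form here are hypothetical illustrations, not fitted values from the study:

```python
def attack_compute(defense_compute, k=0.05, alpha=0.9):
    """Compute an attacker needs to break a model defended with
    `defense_compute` (arbitrary units), under an assumed power law
    C_attack = k * C_defense**alpha. With alpha < 1, offense wins in
    both the absolute and relative senses described in the text."""
    return k * defense_compute ** alpha

c_def = 1e6
c_atk = attack_compute(c_def)

# Absolute advantage: breaking the model costs a fraction of the
# compute used to defend it.
print(c_atk / c_def)  # well below 1

# Relative advantage: doubling defense compute raises attack cost by
# only 2**alpha ≈ 1.87x, i.e. less than twice as much.
print(attack_compute(2 * c_def) / c_atk)  # below 2
```

With any `alpha < 1`, both inequalities hold regardless of the constant `k`; the toy model is only meant to make the "less than twice as much compute" claim concrete.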
In the longer term, we seek to develop an empirical science of robustness, informing both research prioritization and AI governance. We will use methods such as scaling-trend analysis to develop novel defense mechanisms that can scale to deliver robustness for advanced AI systems.
If scaling trends for defense mechanisms are persistently unfavorable, then we will develop mitigations to contain and prevent harms from non-robust models.
Our key robustness work includes:
Beating Superhuman Go AIs
We demonstrate capabilities do not guarantee robustness, as superhuman Go AIs can still be beaten by simple strategies.
Scaling Trends for Robustness
We study empirical scaling trends for robustness in language models, examining how model size, adversarial training, and attack compute affect the offense-defense balance.