Bioweapons Risk
Comprehensive synthesis of AI-bioweapons evidence through early 2026, including the FRI expert survey finding 5x risk increase from AI capabilities (0.3% → 1.5% annual epidemic probability), Anthropic's ASL-3 activation for Claude Opus 4, and OpenAI's o3 reaching 94th percentile on virology tests. Key developments: DNA screening now catches 97% of threats post-patch, but open-source models (DeepSeek) lack safeguards. Expert consensus: safeguards can reduce risk nearly to baseline even with advanced AI capabilities.
Quick Assessment
| Dimension | Assessment | Evidence |
|---|---|---|
| Current AI Uplift | Low-Moderate (1.3-2.5x) | RAND 2024: no significant difference; Anthropic 2025: "substantially fewer critical failures" with AI |
| Expert Risk Estimate | 0.3% → 1.5% annual with AI capabilities | FRI survey: 5x increase if AI matches expert virologists |
| Frontier Model Status | Expert-level knowledge achieved | OpenAI's o3: 94th percentile on VCT; Claude Opus 4 triggered ASL-3 |
| Screening Evasion | 75%+ pre-patch; 97% post-patch | Microsoft 2024; patch deployed globally Oct 2025 |
| Open-Source Risk | High concern | DeepSeek "worst tested" for biosafety (Amodei 2025) |
| Wet Lab Bottleneck | Remains primary barrier | Soviet Biopreparat: 30,000+ staff over decades; Aum Shinrikyo failed |
| Defense Trajectory | Favored long-term | mRNA platforms, metagenomic surveillance, far-UVC maturing |
| Policy Readiness | Inadequate | CSIS 2025: measures "ill-equipped" for AI threats |
Overview
AI systems could accelerate biological weapons development by helping with pathogen design, synthesis planning, or acquisition of dangerous knowledge. The concern isn't that AI creates entirely new risks, but that it lowers barriers—making capabilities previously requiring rare expertise more accessible to bad actors.
This is considered one of the most severe near-term AI risks because biological weapons can cause mass casualties and AI-assisted bioweapons could be developed by smaller groups than traditional state programs required. Unlike many other AI risks that depend on future, more capable systems, this risk applies to models available today.
The key debate centers on whether AI provides meaningful "uplift"—whether it genuinely helps beyond what's already accessible through scientific literature and internet searches, or whether wet-lab skills remain the true bottleneck. Current evidence is mixed: a 2024 RAND Corporation study found no statistically significant AI uplift for bioweapon attack planning,[^1] while separate Microsoft research indicated that AI-designed toxins evaded DNA synthesis screening tools more than 75% of the time.[^2]
However, 2025 has marked a significant shift in official assessments. OpenAI has stated it expects its next-generation models to reach "high-risk classification" for biological capabilities—meaning they could provide "meaningful counterfactual assistance to novice actors."[^3] Anthropic reportedly activated ASL-3 (AI Safety Level 3) protections for Claude Opus 4 specifically due to biological and chemical weapon concerns.[^4] The National Academies of Sciences, Engineering, and Medicine's March 2025 report The Age of AI in the Life Sciences found that while current biological design tools cannot yet design self-replicating pathogens, monitoring and mitigation are urgently needed.[^5] OpenAI's o3 model has also scored at the 94th percentile on the Virology Capabilities Test (VCT), a benchmark designed to measure dangerous biological knowledge.[^6]
Risk Assessment
| Dimension | Assessment | Notes |
|---|---|---|
| Severity | High to Catastrophic | Biological weapons can cause mass casualties; worst-case scenarios involve engineered pandemics |
| Likelihood | Uncertain | Current evidence is mixed on AI uplift; capabilities are rapidly improving |
| Timeline | Near-term | Unlike many AI risks, this concern applies to current systems |
| Trend | Increasing | Each model generation shows more biological knowledge; screening gaps persist |
| Window | Temporary | AI may eventually favor defense (surveillance, vaccines, countermeasures); risk elevated during transition period |
Responses That Address This Risk
| Response | Mechanism | Effectiveness |
|---|---|---|
| Biosecurity Interventions | DNA screening, surveillance, countermeasures, physical defenses | High (portfolio) |
| Responsible Scaling Policies (RSPs) | Internal biosecurity evaluations before deployment | Medium |
| Compute Governance | Limits access to training resources for dangerous models | Medium |
| US AI Chip Export Controls | Restricts AI chip exports to adversary nations | Low-Medium |
| AI Safety Institutes (AISIs) | Government evaluation of biosecurity risks | Medium |
| Voluntary AI Safety Commitments | Lab pledges on dangerous capability evaluation | Low |
The Total Risk Debate
How dangerous is AI-assisted bioweapons development? Expert assessments vary substantially, from those who consider it an imminent catastrophic threat to those who view it as overhyped. Understanding both sides of this debate—and the key uncertainties that drive disagreement—is essential for calibrating policy responses.
Estimating Overall Risk
Attempting to quantify the total risk from AI-assisted bioweapons requires estimating both the probability of an attack and its potential consequences. Estimates vary widely:
| Estimate Type | Range | Source/Basis | Key Assumptions |
|---|---|---|---|
| Annual probability of catastrophic AI-assisted bio attack | 0.01% - 0.5% | Expert elicitation, attack chain analysis | "Catastrophic" = 10,000+ casualties |
| Cumulative probability through 2040 | 0.1% - 8% | Timeline projections | Depends heavily on AI capability trajectory |
| Expected casualties if attack occurs | 10,000 - 10M+ | Historical/scenario analysis | Varies by pathogen, deployment method, response |
| Expected value of harm per year | $1B - $500B | Probability × consequence estimates | Extremely uncertain |
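These figures can be cross-checked with elementary probability arithmetic. The sketch below is illustrative only: the input values are taken from the ranges in the table above, and the per-casualty dollar cost is an assumption introduced here, not a figure from any cited source.

```python
# Illustrative risk arithmetic using the ranges from the table above.

def cumulative_probability(annual_p: float, years: int) -> float:
    """P(at least one event over `years` years), assuming independent years."""
    return 1 - (1 - annual_p) ** years

def expected_annual_harm(annual_p: float, casualties: float,
                         cost_per_casualty: float = 1e7) -> float:
    """Expected dollar harm per year; ~$10M per casualty is an assumption."""
    return annual_p * casualties * cost_per_casualty

# The annual-probability range roughly reproduces the table's cumulative
# range (0.1%-8%) over a ~15-year window, assuming independent years:
for annual_p in (0.0001, 0.005):  # 0.01% and 0.5% per year
    print(f"{annual_p:.2%}/yr -> {cumulative_probability(annual_p, 15):.2%} by 2040")
# 0.01%/yr -> 0.15% by 2040; 0.50%/yr -> 7.24% by 2040

# The high-end combination reproduces the table's $500B upper bound:
print(f"${expected_annual_harm(0.005, 1e7):,.0f}")  # $500,000,000,000
```

That the table's bounds fall out of simple compounding is worth noting: the wide cumulative range is driven almost entirely by the order-of-magnitude uncertainty in the annual probability.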
Expert Survey Data (2025)
The Forecasting Research Institute surveyed 46 biosecurity experts and 22 superforecasters in early 2025 on AI-enabled biorisk.[^7] Their findings are summarized below:[^8]
| Scenario | Annual Risk of 100K+ Death Epidemic | Multiplier |
|---|---|---|
| Baseline (no AI capability increase) | 0.3% | 1x |
| AI matches expert virologists on troubleshooting | 1.5% | 5x |
| AI enables 50% of non-experts to synthesize influenza | 1.25% | 4.2x |
| With mandatory screening + jailbreaking safeguards | 0.4% | 1.3x |
Key insight: According to the FRI survey, safeguards (closed weights, anti-jailbreaking, DNA screening) can reduce risk nearly to baseline even with advanced AI capabilities.[^9]
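The survey arithmetic is simple enough to reproduce directly. A minimal sketch, assuming the baseline rate and multipliers quoted in the table above (scenario labels shortened for readability):

```python
# FRI survey arithmetic: annual risk of a 100K+ death epidemic under each
# scenario = baseline rate x reported multiplier.
BASELINE = 0.003  # 0.3% annual baseline risk

scenarios = {
    "baseline (no AI capability increase)":  1.0,
    "AI matches expert virologists":         5.0,
    "AI enables non-expert flu synthesis":   4.2,
    "with screening + jailbreak safeguards": 1.3,
}

for name, multiplier in scenarios.items():
    print(f"{name}: {BASELINE * multiplier:.2%} annual risk")
# Safeguards scenario: ~0.39% (the 0.4% shown above), close to the 0.3%
# baseline despite advanced AI capabilities -- the key insight stated above.
```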
The Bioweapons Attack Chain Model estimates compound attack probability at 0.02%–3.6% depending on assumptions, with substantial uncertainty at each step. The wide range reflects genuine disagreement about key parameters.
Existential risk context: In The Precipice (2020), Oxford philosopher Toby Ord estimates the chance of existential catastrophe from engineered pandemics at 1 in 30 by 2100—which he identifies as second only to AI among anthropogenic risks.[^10] Ord writes that it "now seems within the reach of near-term biological advances to create pandemics that would kill greater than 50% of the population—not just in a particular area, but globally."[^11] While not all engineered pandemics would be AI-assisted, this frames the potential severity of the threat.
Industry concerns: In July 2023, Anthropic CEO Dario Amodei stated that within two to three years, there was a "substantial risk" that AI tools would "greatly widen the range of actors with the technical capability to conduct a large-scale biological attack."[^12] The Center for a New American Security (CNAS) report on AI and bioweapons notes this could "expose the United States to catastrophic threats far exceeding the impact of COVID-19."[^13]
Arguments for High Risk
Those who consider AI-bioweapons a severe threat emphasize several points:
1. Democratization of Dangerous Knowledge
AI makes dangerous biological knowledge more accessible to those who couldn't previously obtain it. While scientific literature contains detailed protocols, navigating it requires expertise. AI systems can synthesize, explain, and contextualize this information for non-experts, potentially expanding the pool of capable actors.
The equalizer effect: The most concerning scenario isn't AI helping expert virologists (who already have the knowledge), but AI helping moderately skilled individuals bridge knowledge gaps that previously required years of training or team collaboration.
2. Asymmetric Evasion Capabilities
According to reporting on Microsoft's 2024 research, AI-designed toxins evaded a substantial proportion of commercial DNA synthesis screening tools.[^2] This is qualitatively different from knowledge provision—it represents AI helping attackers circumvent existing defenses.
DNA synthesis screening is a cornerstone of current biosecurity. If AI can reliably design functional variants that evade detection, the entire screening paradigm may become obsolete faster than new defenses can be developed. This creates an asymmetric threat where even modest AI capabilities could undermine years of defensive investment.
3. Rapid Capability Improvement
AI capabilities are improving rapidly. Even if current models provide limited uplift, the trend is concerning:
| Capability | GPT-4 (2023) | Claude 3.5/GPT-4o (2024) | Claude Opus 4/o3 (2025) | Trend |
|---|---|---|---|---|
| Biology knowledge | High | Very High | Expert-level | Rapidly increasing |
| Synthesis planning | Moderate | Moderate-High | High | Increasing |
| Evading guardrails | Moderate | Low-Moderate | Low (frontier models) | Variable by model |
| Integration with tools | Limited | Growing | Substantial | Accelerating |
2025 milestone: OpenAI's April 2025 o3 model reportedly ranked in the 94th percentile among expert human virologists on virology capability evaluations, marking the first time a frontier AI model has demonstrated expert-level performance on biological troubleshooting scenarios.[^14]
The argument is that we should prepare for future capabilities, not just current ones. By the time AI demonstrably provides high uplift, it may be too late to establish governance.
4. Combination with Other Technologies
AI alone may provide limited uplift, but the combination of multiple technologies could be transformative:
- LLMs + protein design tools: Tools such as AlphaFold, which DeepMind released publicly in 2021, enable novel protein structure prediction and engineering; LLMs help identify targets and plan experimental applications.[^15]
- AI + lab automation: Automated systems could eventually execute protocols with minimal human intervention
- AI + decreasing synthesis costs: DNA synthesis costs have fallen dramatically over the past two decades; AI could help design sequences optimized for synthesis on cheaper platforms.[^16]
Each technology alone may be manageable, but their combination could create emergent risks that exceed any individual contribution.
5. Tail Risk Considerations
Even if the median expectation is manageable, the worst-case scenarios are severe enough to warrant serious attention:
- Engineered pandemic: A pathogen designed for transmissibility, lethality, and immune evasion could potentially cause millions of deaths
- Multiple simultaneous attacks: AI could enable coordination of attacks across multiple locations
- Degradation of trust in biology: Widespread bioterrorism could undermine beneficial biological research and public health
From a risk management perspective, low-probability/high-consequence events may deserve more weight than their expected value alone suggests.
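One way to make this concrete is to compare a plain expected-value calculation with a risk-averse one that penalizes large losses more than proportionally. The sketch below uses entirely hypothetical event probabilities and casualty counts, and the convex exponent is an arbitrary illustrative choice:

```python
# Linear expected loss vs. a risk-averse score that weights large losses
# superlinearly. All numbers here are hypothetical illustrations.
events = [
    (0.10,    100),        # frequent, small-scale incidents
    (0.0001,  1_000_000),  # rare catastrophic attack
]

expected = sum(p * c for p, c in events)             # linear weighting
risk_weighted = sum(p * c**1.5 for p, c in events)   # convex penalty on size

print(f"expected casualties/year: {expected:.0f}")       # 110
print(f"risk-weighted score:      {risk_weighted:.0f}")  # 100100
# Under linear weighting the two events contribute comparably (10 vs 100);
# under the convex weighting the catastrophic tail dominates.
```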
6. Historical Underestimation
History suggests we systematically underestimate technology-enabled threats. The first nuclear device was tested in July 1945—less than a decade after the discovery of fission in 1938, a pace faster than many contemporary physicists anticipated.[^17] COVID-19 demonstrated how disruptive a novel pathogen can be, causing millions of deaths and trillions of dollars in economic damage within months of its emergence.[^18] AI capabilities have also repeatedly exceeded near-term forecasts.
Skepticism about AI-bioweapons risk may itself be the risky position.
7. The "De-skilling" Trajectory
Multiple emerging technologies are simultaneously reducing the skill requirements for biological research:
- Cloud laboratories automate complex procedures and allow remote execution
- Benchtop DNA synthesizers are approaching gene-length capabilities
- AI assistants bridge knowledge gaps and provide troubleshooting guidance
- Protocol automation reduces the need for tacit laboratory knowledge
Each of these alone might be manageable, but together they suggest a trajectory toward dramatically lowered barriers. Any current empirical study may capture a snapshot where these technologies haven't yet converged—but convergence appears plausible within the decade.
8. Offense Has Asymmetric Advantages
Biological attacks have inherent asymmetric characteristics that favor attackers:
- Attribution lag: Days to weeks may pass before an attack is recognized as intentional
- Preparation asymmetry: Attackers can prepare countermeasures for themselves; defenders must protect everyone
- Innovation asymmetry: Attackers need to succeed once; defenders must anticipate all possible attack vectors
- Psychological impact: Even unsuccessful or small-scale attacks could cause massive economic and social disruption
AI amplifies these asymmetries by potentially enabling novel attack vectors that existing defenses haven't anticipated.
9. Open-Source Model Proliferation
Even if frontier labs implement strong biosecurity measures, the proliferation of open-source models undermines containment:
- No centralized control: Once weights are released, restrictions cannot be enforced
- Fine-tuning vulnerability: Safety training can be removed with relatively modest compute
- Capability improvements: Open models are approaching frontier capabilities with roughly 6–12 month lags
- Global availability: Actors in any jurisdiction can access open models
The CNAS report recommends considering a "licensing regime for biological design tools with potentially catastrophic capabilities"—but this has not been implemented as of 2025.[^19]
The DeepSeek warning: In February 2025, Anthropic CEO Dario Amodei reportedly stated that testing of China's DeepSeek model revealed it was among the worst performers on biosecurity of any model evaluated—generating information relevant to producing bioweapons "that can't be found on Google or can't be easily found in textbooks" with "absolutely no blocks whatsoever."[^20] While Amodei did not characterize DeepSeek as "literally dangerous" at that time, the incident highlighted how open-source models from different jurisdictions may not implement equivalent safety measures.[^21]
Arguments for Lower Risk
Those who consider AI-bioweapons risk overstated emphasize different considerations:
1. The RAND Study: No Significant Uplift
A 2024 RAND Corporation study is among the more rigorous empirical assessments of AI uplift conducted to date. According to reporting on the study, twelve teams of three researchers each spent 80 hours developing bioweapon attack plans—half using AI assistance, half using only open internet resources. Expert evaluators reportedly found no statistically significant difference in plan viability between the two groups.[^1]
This finding directly challenges claims that AI meaningfully assists biological attacks. If AI-assisted and non-AI teams perform equivalently, the AI "threat" may be more limited than feared.
| Group | Information Quality | Plan Viability | Novelty | Statistical Significance |
|---|---|---|---|---|
| AI-assisted | High | Moderate | Low | n/a |
| Internet-only | High | Moderate | Low | n/a |
| Difference | Minimal | Minimal | None | Not significant |
Implications: Dangerous biological information is already widely accessible through legitimate scientific literature. AI may be redundant with existing sources rather than providing novel dangerous capabilities.
2. Wet Lab Bottleneck
Knowledge is not capability. Even with complete theoretical understanding, executing biological synthesis requires:
- Tacit knowledge that transfers poorly through text (how to handle contamination, optimize growth conditions, troubleshoot failures)
- Specialized equipment that is expensive, regulated, and hard to obtain
- Months of practice to develop reliable technique
- Physical safety procedures that untrained individuals typically violate
The Soviet Union's Biopreparat program, established in the 1970s, reportedly employed tens of thousands of scientists and technicians over decades in a state-directed effort to develop reliable bioweapons—a scale of human expertise that underscores the difficulty of the task.[^22] Aum Shinrikyo, despite substantial financial resources and personnel with scientific training, failed repeatedly in their bioweapons attempts throughout the 1990s.[^23] The capability bottleneck may be far more important than the knowledge bottleneck.
AI cannot transfer tacit knowledge. Reading about sterile technique is different from maintaining it reliably under pressure. AI can explain protocols but cannot teach hands-on laboratory skills.
3. Guardrails and Filtering Work
Frontier AI models include safety measures that reduce dangerous information provision:
- Refusals for explicitly harmful requests
- Content filtering
- Constitutional AI and RLHF training
- Continuous red-teaming and patching
While not perfect, these measures raise barriers. Jailbreaking techniques exist but require effort and sophistication, and often produce degraded responses. The marginal attacker may be more likely to use open internet resources than to navigate AI guardrails.
4. Existing Information Abundance
Scientific literature already contains dangerous information. Textbooks explain pathogen biology in detail. The internet hosts synthesis protocols. The marginal information contribution of AI may be minimal when the baseline is that much of this information is already accessible. AI's value proposition—synthesis and accessibility—matters less if motivated individuals were already able to find information through traditional means.
5. Defense Advantages
AI capabilities benefit defense as much as offense, and defensive applications may be more scalable:
| Application | Offense Contribution | Defense Contribution | Net Balance |
|---|---|---|---|
| Pathogen detection | Marginal | Substantial | Defense |
| Vaccine development | Marginal | Transformative | Strong defense |
| Synthesis planning | Moderate | Minimal | Offense |
| Countermeasure design | Marginal | Substantial | Defense |
| Surveillance | None | Substantial | Strong defense |
| Treatment optimization | None | Substantial | Strong defense |
Metagenomic surveillance, mRNA vaccine platforms, and AI-assisted drug discovery are advancing rapidly. These defensive technologies may ultimately make biological attacks less effective rather than more dangerous.
The transition period concern: Even those who believe defense wins long-term often worry about a near-term window where offense temporarily gains advantages before defenses mature.
6. Deterrence and Attribution
Biological attacks, especially sophisticated ones, leave traces that can enable attribution:
- Genomic sequencing of pathogens
- Epidemiological tracking
- Intelligence on precursor purchases
- Surveillance of likely actors
State actors face retaliation risks. Non-state actors face intense investigative focus. The certainty of attribution for significant attacks provides a deterrent effect that pure capability analysis misses.
7. Historical Non-Occurrence
Despite decades of accessible biological knowledge and multiple motivated actors, catastrophic bioterrorism has not occurred. This may indicate genuine difficulty—or it may reflect luck that could change as AI lowers barriers.
The Key Cruxes
Much of the disagreement about AI-bioweapons risk reduces to a small number of factual questions where reasonable people disagree:
Crux 1: Does AI Provide Meaningful Uplift?
If uplift is low (less than 1.5x): Focus resources on traditional biosecurity rather than AI-specific interventions. The threat is real but not qualitatively changed by AI.
If uplift is high (greater than 2x): Urgent need for AI-specific guardrails, compute governance, and model restrictions. The threat landscape has fundamentally shifted.
| Evidence | Favors Low Uplift | Favors High Uplift |
|---|---|---|
| RAND study | Strong | — |
| Screening evasion research | — | Strong |
| Model capability trends | — | Moderate |
| Expert elicitation | Mixed | Mixed |
| Current assessment | Favored (65%) | 35% |
Crux 2: Is the Knowledge Bottleneck or Capability Bottleneck More Important?
If knowledge is the bottleneck: AI providing information is directly dangerous.
If capability is the bottleneck: AI providing information is mostly redundant with existing sources; wet lab skills remain rate-limiting.
| Evidence | Favors Knowledge Bottleneck | Favors Capability Bottleneck |
|---|---|---|
| Historical bioterrorism failures | — | Strong |
| State program difficulty | — | Strong |
| Information abundance online | — | Moderate |
| AI capability trends | Moderate | — |
| Current assessment | 35% | Favored (65%) |
Crux 3: Will Defense or Offense Win Long-Term?
If defense wins: AI-bioweapons is a transitional problem that self-corrects as defensive applications mature.
If offense wins: AI permanently shifts the advantage to attackers, requiring sustained containment efforts.
If it's a window: The near-term favors offense, but defense catches up—the question is whether catastrophic attacks occur during the transition.
| Scenario | Probability | Implications |
|---|---|---|
| Permanent offense advantage | 15% | Maximum concern; sustained containment needed |
| Permanent defense advantage | 40% | Eventually self-correcting; manage transition |
| Temporary window (5-10 years) | 35% | Near-term urgency, medium-term resolution |
| Unclear/context-dependent | 10% | Need robust strategies for multiple scenarios |
Crux 4: How Quickly Are Capabilities Advancing?
If capabilities are saturating: Current systems represent near-peak dangerous capabilities; governance can catch up.
If capabilities continue scaling: Future systems will be substantially more dangerous; governance is racing against time.
The AI-Bioweapons Timeline Model projects capability thresholds, with synthesis assistance potentially arriving 2027-2032 and novel agent design 2030-2040.
Crux 5: How Effective Are Guardrails and Countermeasures?
If guardrails work well: The marginal risk from AI models is small; responsible development practices suffice.
If guardrails fail: Open-source proliferation and jailbreaking make model-level interventions largely ineffective.
| Factor | Favors Guardrails | Favors Guardrail Failure |
|---|---|---|
| Frontier model safety measures | Moderate | — |
| Open-source model proliferation | — | Strong |
| Jailbreaking research | — | Moderate |
| Fine-tuning vulnerability | — | Moderate |
| Current assessment | Partially effective (40%) | Limited effectiveness (60%) |
The open-source challenge: Even if frontier labs implement strong safeguards, open-source models may not. As capable open models proliferate, guardrails become optional, fine-tuning can remove remaining restrictions, and dangerous capabilities become permanently accessible.
Crux 6: Can DNA Synthesis Screening Keep Pace?
DNA synthesis screening is the primary defense against engineered pathogens, but Microsoft's research revealed significant gaps.
If screening adapts: AI-designed evasion is a temporary problem; screening improvements restore the chokepoint.
If screening falls behind: The primary technical barrier erodes; other defenses must compensate.
Key questions:
- Can screening adapt to AI-designed evasive sequences?
- What happens as benchtop synthesis equipment becomes cheaper and more accessible?
- Can screening extend to cover novel synthesis methods and cloud laboratories?
The Framework for Nucleic Acid Synthesis Screening (April 2024) represents a policy response, but only applies to federally funded programs.
Current Evidence
Studies have shown that large language models (LLMs) can provide information relevant to bioweapon development, though the significance of this is contested.
RAND Red-Team Study (2024)
The RAND Corporation study ("The Operational Risks of AI in Large-Scale Biological Attacks") is reportedly one of the more rigorous empirical assessments of AI uplift conducted to date.[^1] Researchers Christopher Mouton, Caleb Lucas, and Ella Guest reportedly recruited 15 teams of three people each to act as red-team "bad guys."[^1]
According to the study, twelve teams were given 80 hours each over seven weeks to develop bioweapon attack plans based on one of four scenarios—including a "fringe doomsday cult intent on global catastrophe" and a "private military company seeking to aid an adversary's conventional military operation."[^1] For each scenario, one team had access to an LLM chatbot, another to a different LLM chatbot, and control teams used only internet resources.[^1]
Expert judges (biologists and security specialists) evaluated the resulting plans for biological and operational feasibility. The reported result: no statistically significant difference in plan viability between AI-assisted and non-AI groups.[^1]
Key methodology details:
- Participants had some technical background (science graduates)
- Testing focused on planning, not actual synthesis
- Used 2023-era models; capabilities have advanced since
- Sample size was relatively small (n=12 teams completing the study)
- LLMs did not generate explicit weaponization instructions, but reportedly provided "guidance and context in critical areas such as agent selection, delivery methods, and operational planning"[^1]
Limitations acknowledged by researchers: The study tested planning capability, not execution. It used participants with technical backgrounds, so may underestimate uplift for complete novices. AI capabilities continue advancing.
Implications: The wet-lab bottleneck may be more significant than the knowledge bottleneck. Knowing how to make something is different from being able to make it.
AI-Designed Toxins Evade Screening (2024)
Microsoft researchers reportedly conducted a red-team exercise testing biosecurity in the protein engineering pipeline. According to some sources, DNA screening software—used by synthesis companies to flag dangerous sequences—missed over 75% of AI-designed potential toxins, with one tool flagging only 23% of sequences.[^2] After the research was published, screening systems reportedly improved to catch approximately 72% on average.[^2]
Key details:
- Tested multiple commercial screening tools
- AI reportedly designed functional variants that differed sufficiently from known threats to evade pattern matching
- Improvement after publication shows screening can adapt—but also shows it wasn't keeping pace
Implications: Even if current LLMs provide limited knowledge uplift, AI protein design tools may create harder-to-detect threats. The screening ecosystem has significant gaps that AI can exploit.
Gryphon Scientific Evaluation (2023)
Anthropic hired Gryphon Scientific to red-team Claude's ability to provide harmful biological information.3 According to reports, the evaluation involved more than 150 hours of testing and drew on more than 20 biosecurity experts.3
The findings were described as concerning. Rocco Casagrande, Gryphon's managing director, reportedly stated he was "personally surprised and dismayed by how capable current LLMs were at providing critical information related to biological weapons."3 He was quoted by Semafor as saying: "These things are developing extremely, extremely fast, they're a lot more capable than I thought they would be when it comes to science."3
Key findings (according to reports):
- One team member with a postdoctoral fellowship studying a pandemic-capable virus found LLMs could provide "post-doc level knowledge to troubleshoot commonly encountered problems" when working with that virus
- For low-skill users, LLMs could suggest which viruses to acquire
- Although LLMs often hallucinate, they answered almost all questions accurately at least sometimes, and answered some critical questions nearly always accurately
- Workshops with biosecurity experts identified concerning misuse scenarios including how to reconstruct information redacted from sensitive scientific documents
Despite the concerning findings, Casagrande reportedly believes "concerted action could ensure safety is built into the most advanced models."3
Anthropic, OpenAI Evaluations
AI labs have conducted extensive internal evaluations testing whether their models could provide "uplift" to potential bioweapon developers.
Anthropic's approach: Anthropic's Responsible Scaling Policy (RSP) defines AI Safety Levels (ASL) modeled after biosafety level (BSL) standards.4 They reportedly conduct at least 10 different biorisk evaluations for each major model release.4 In early 2025, Anthropic reportedly sent a letter to the White House urging immediate action on AI security after its testing revealed alarming improvements in Claude 3.7 Sonnet's ability to assist with aspects of bioweapons development.5
OpenAI's framework: OpenAI's Preparedness Framework categorizes biological and chemical capabilities as "Tracked Categories" requiring ongoing evaluation.6 They define two thresholds:
- High capability: Could "provide meaningful counterfactual assistance to 'novice' actors (anyone with a basic relevant technical background) that enables them to create known biological or chemical threats"6
- Critical capability: Could "introduce unprecedented new pathways to severe harm"6
OpenAI states their most advanced models "aren't yet capable enough to pose severe risks" in biosecurity—but has reportedly indicated upcoming models may reach "high" capability level.6
US/UK AI Safety Institute joint evaluation (2024): The first joint government-led model evaluation tested Claude 3.5 Sonnet across biological capabilities, cyber capabilities, software development, and safeguard efficacy.7 Elizabeth Kelly, AISI director, was quoted as calling it "the most comprehensive government-led safety evaluation of an advanced AI model to date."7
Evaluation Methodology Limitations
An Epoch AI analysis of biorisk evaluations across major AI labs identified significant methodological concerns:[^31]
| Lab | Benchmark Share of Evals | Red Teaming | Uplift Trials |
|---|---|---|---|
| Anthropic | ≈40% | Yes | Yes (only lab with text-based trials) |
| OpenAI | ≈50% | Yes | No |
| Google DeepMind | ≈80% | No | No |
Key findings (according to Epoch AI):[^31]
- Most publicly described biorisk benchmarks have "rapidly saturated"—AI systems now exceed expert-human baselines
- Benchmarks "practically always fail to capture many real-world complexities"
- Anthropic is the only frontier lab conducting explicit biorisk uplift trials
- Despite limitations, Epoch AI concluded Anthropic was "largely justified" in activating ASL-3
Kevin Esvelt's Classroom Experiment
MIT researcher Kevin Esvelt reportedly conducted an informal demonstration in which he asked students to use ChatGPT or other LLMs to identify dangerous pathogens.[^32] According to some accounts, after approximately one hour, the class had identified four potential pandemic pathogens, methods to generate them from synthetic DNA, names of DNA synthesis companies unlikely to screen orders, and detailed protocols and troubleshooting guidance.[^32]
Esvelt was quoted regarding AI's ability to circumvent DNA screening defenses: "We've built a Maginot Line of defense, and AI just walked around it."[^32]
This demonstration, while not a rigorous study, illustrates how quickly accessible LLMs can be leveraged for potentially dangerous information-gathering—even for those without prior expertise.
CNAS Report: AI and Biological National Security Risks (2024)
The Center for a New American Security (CNAS) report by Bill Drexel and Caleb Withers provides a comprehensive analysis of the evolving AI-biosecurity landscape.[^33]
Key concerns identified:[^33]
- AI could enable bioterrorism, create unprecedented superviruses, and develop novel targeted bioweapons
- AI's potential to "optimize bioweapons for targeted effects, such as pathogens tailored to specific genetic groups or geographies, could significantly shift states' incentives to use biological weapons"
- If realized, such threats could "expose the United States to catastrophic threats far exceeding the impact of COVID-19"
Key recommendations:[^33]
- Strengthen screening mechanisms for cloud labs and genetic synthesis providers
- Conduct rigorous assessments of foundation models' biological capabilities throughout the bioweapons lifecycle
- Invest in technical safety mechanisms to curb threats posed by foundation models
- Consider a licensing regime for biological design tools with potentially catastrophic capabilities
The report emphasizes that while AI-enabled biological catastrophes are "far from inevitable," current biological safeguards already need significant updates.[^33]
2025–2026 Developments: A Pivotal Period
2025 marked a significant shift in how AI labs and governments assess biological risks. According to the Council on Strategic Risks: "The year 2025 brought rising public awareness and discussion of the risks at the AI-biology nexus."[^34]
| Development | Date | Significance |
|---|---|---|
| Evo2 biological AI model released | Feb 2025 | Reportedly trained on 128,000+ genomes |
| FRI expert survey published | Feb 2025 | Surveyed 46 biosecurity experts and 22 superforecasters on AI-bio risk |
| OpenAI's o3 virology benchmark | Apr 2025 | Reportedly scored at the 94th percentile on the Virology Capabilities Test |
| Anthropic ASL-3 activation | May 2025 | First reported use of highest safety tier, for Claude Opus 4 |
| US AI Action Plan biosecurity chapter | Jul 2025 | Federal recognition of AI-enabled pathogen risk |
| UN AI governance bodies formalized | Sep 2025 | Scientific Panel and Global Dialogue established |
| DNA screening patch deployed globally | Oct 2025 | Reportedly achieving approximately 97% detection rate |
| Epoch AI evaluation analysis | 2025 | Found benchmark saturation across labs |
Several specific developments stand out:
OpenAI's High-Risk Classification
OpenAI reportedly announced that upcoming models—particularly successors to the o3 reasoning model—may trigger "high-risk classification" under its Preparedness Framework.[^35] This would mean they could provide "meaningful counterfactual assistance to novice actors" in creating known biological threats.[^35]
Key points from OpenAI's approach (according to reports):[^35]
- Classified ChatGPT Agent as having "High capability in the biological domain"
- Discovered that creating bioweapons would require weeks or months of sustained AI interaction, not single conversations
- Implemented a traffic-light system: red-level content (direct bioweapon assistance) is immediately blocked; yellow-level content (dual-use information) requires careful handling
Anthropic's ASL-3 Activation (May 2025)
Anthropic reportedly became the first lab to activate its highest safety tier (ASL-3) specifically for biological concerns when releasing Claude Opus 4.[^36] Their internal evaluations reportedly found they "could no longer confidently rule out the ability of our most advanced model to uplift people with basic STEM backgrounds" attempting to develop CBRN weapons.[^36]
Anthropic's testing reportedly revealed:[^36]
- Participants with access to Claude Opus 4 developed bioweapon acquisition plans with "substantially fewer critical failures" than internet-only controls
- Claude went from underperforming world-class virologists to "comfortably exceeding that baseline" on virology troubleshooting within a year
National Academies Report (March 2025)
The National Academies of Sciences, Engineering, and Medicine published "The Age of AI in the Life Sciences: Benefits and Biosecurity Considerations," reportedly directed by Executive Order 14110.[^37] Key findings included:[^37]
- AI-enabled biological tools can improve biosecurity through enhanced surveillance and faster countermeasure development
- Current biological design tools can design simpler structures (molecules) but cannot yet design self-replicating pathogens
- A "distinct lack of empirical data" exists for evaluating biosecurity risks of AI-enabled biological tools
- Recommended continued investment alongside monitoring for potential risks
CSIS Policy Analysis (August 2025)
The Center for Strategic and International Studies reportedly published "Opportunities to Strengthen U.S. Biosecurity from AI-Enabled Bioterrorism," warning that current U.S. biosecurity measures are "ill-equipped to meet these challenges."[^38] The report noted that critical safeguards in biological design tools are "already circumventable post-deployment."[^38]
Supplementary Evidence
| Source | Finding | Implications |
|---|---|---|
| National Academies (2025) | BDTs cannot yet design self-replicating pathogens | Current tools limited; monitoring needed |
| CSIS Report (2025) | Current biosecurity measures inadequate | Policy urgently needs updating |
| OpenAI Preparedness (2025) | Next-gen models may hit "high-risk" | Frontier labs anticipate near-term uplift |
| Anthropic ASL-3 (2025) | Cannot rule out CBRN uplift for novices | First reported activation of highest safety tier |
| DeepSeek testing (2025) | Open-source models reportedly lack equivalent safeguards | Proliferation concern raised |
| CNAS Report (2024) | AI-bio integration is emerging risk | Supports compound capability concern |
How AI Could Help Attackers
AI could assist at multiple stages of bioweapon development:
Attack Chain Analysis
A successful biological attack requires success across multiple stages, each with independent failure modes:
| Stage | AI Contribution | Traditional Difficulty | AI Changes What |
|---|---|---|---|
| Motivation | None | Present | — |
| Information access | High | Moderate | Reduces search time |
| Knowledge uplift | Low-Moderate | High | Bridges expertise gaps |
| Lab access | None | High | — |
| Synthesis | None (currently) | Very High | Future: could guide procedures |
| Deployment | Low | High | Could optimize dispersal |
| Evading countermeasures | Moderate | Variable | Could design novel variants |
See Bioweapons Attack Chain Model for detailed probability estimates at each stage.
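Because overall success is, to first approximation, a product of per-stage probabilities, the structure itself carries a lesson: improving one stage multiplies, rather than adds to, the compound probability. A toy sketch, with entirely hypothetical stage values (not the linked model's actual parameters):

```python
# Toy attack-chain calculation: compound success probability is the product
# of per-stage success probabilities. All stage values are hypothetical.
stages = {
    "acquire dangerous knowledge": 0.50,
    "obtain lab access/equipment": 0.10,
    "complete synthesis":          0.05,
    "deploy effectively":          0.20,
}

p = 1.0
for p_stage in stages.values():
    p *= p_stage
print(f"compound success probability: {p:.3%}")  # 0.050%

# Doubling the knowledge stage (e.g., via AI uplift) only doubles the
# compound probability; the hard wet-lab stages remain the binding constraint.
```

This is why the knowledge-vs-capability crux matters so much: uplift at an easy stage changes little, while uplift at the binding stage changes everything.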
Specific Assistance Pathways
Target identification — AI might help identify dangerous modifications to known pathogens or find novel biological agents. Large language models trained on scientific literature have extensive knowledge of pathogen biology.
Synthesis planning — AI could help determine how to create dangerous biological materials. Protein design tools can generate novel sequences, and LLMs can explain synthesis routes.
Knowledge bridging — Most concerningly, AI might help bridge knowledge gaps. Historically, bioweapons development required rare combinations of expertise. AI could help a motivated individual or small group compensate for missing knowledge, potentially replacing what previously required teams of specialists.
Evasion optimization — AI could help design pathogens or synthesis routes that evade detection by screening tools, surveillance systems, or medical countermeasures.
History & Current Infrastructure
Biological threats exist on a spectrum. State programs have historically been the main concern, but the barrier to entry may be dropping. The COVID-19 pandemic demonstrated how much damage pathogens can cause and highlighted gaps in biosecurity infrastructure.
Historical Programs
State Bioweapons Programs
Multiple nations have maintained offensive biological weapons programs despite the Biological Weapons Convention (BWC):8
| Program | Era | Scale | Outcome |
|---|---|---|---|
| US | 1943–1969 | Large | Unilaterally terminated by Nixon |
| Soviet Union | 1928–1992 | Massive (reportedly 30,000–40,000 staff) | Collapsed with USSR; concern about residual capabilities and scientist emigration |
| Japan (Unit 731) | 1937–1945 | Large | Defeated in WWII; perpetrators granted immunity by US in exchange for data |
| Iraq | 1980s–1990s | Moderate | Dismantled after Gulf War; revealed extensive program |
| South Africa | 1981–1993 | Moderate | Dismantled post-apartheid; included ethnic targeting research |
These programs required vast resources, thousands of scientists, and state-level infrastructure. The concern is that AI could reduce these requirements.
Current compliance concerns: According to some sources, the 2024 State Department report raised BWC compliance concerns about China, Russia, Iran, and North Korea.9 Verification remains difficult because the BWC has no formal verification regime.8
The Soviet Biopreparat Program: A Case Study
The Soviet Union reportedly operated one of the world's largest biological weapons programs—in direct violation of the BWC it had signed.8 Understanding this program illuminates both the scale of resources historically required and the ongoing legacy concerns.
Scale and organization:
- Biopreparat was reportedly created in April 1974 as a civilian cover organization10
- Reportedly employed 30,000–40,000 personnel across some 40–50 research facilities, according to accounts by former program insiders10
- Reportedly included five major military-focused research institutes, numerous design facilities, three pilot plants, and five dual-use production plants10
- Annual production capacity for weaponized smallpox was reportedly on the order of 90–100 tons, according to defector accounts11
Agents developed:
- Weaponized smallpox (reportedly continued even after WHO declared global eradication)
- Anthrax (including strains developed as enhanced "battle" variants)
- Plague, Q fever, tularemia, glanders, and Marburg hemorrhagic fever
- Agents reportedly designed for aerosol dispersal via ballistic or cruise missiles10
The Sverdlovsk incident (1979): An accidental release of anthrax spores from a Soviet military facility in Sverdlovsk reportedly killed at least 68 people; the true number remains uncertain because KGB records were reportedly destroyed.12 The Soviet government initially attributed deaths to contaminated meat; Boris Yeltsin publicly acknowledged the military origin in 1992.12
Key defectors who revealed the program:
- Vladimir Pasechnik (defected 1989): Described as a high-level defector to the UK; his reported testimony enabled Western leaders to pressure Gorbachev about the program's scope13
- Ken Alibek (Kanatjan Alibekov, defected 1992): Described as a former first deputy director of Biopreparat; after emigrating he reportedly provided US government with a detailed accounting of the program, including work on tularemia and enhanced anthrax strains11
Legacy concerns:
- Some facilities and scientists were reportedly absorbed into public health institutions after the USSR's dissolution
- US programs attempted to redirect former weapons scientists to peaceful research
- According to contemporaneous reporting, in late 1997 the US expanded efforts after detecting what officials described as intensified attempts by Iran and other states to acquire biological expertise from former Soviet institutes14
Lesson for AI risk: Even with massive state resources, Biopreparat reportedly required decades and thousands of scientists to develop reliable weapons. This suggests the wet-lab barrier is formidable—but also that determined state actors with existing infrastructure could integrate AI assistance more easily than non-state actors starting from scratch.
Non-State Actor Attempts
The historical record of non-state biological attacks reveals consistent technical failures despite significant motivation and resources:
1984 Oregon Salmonella Attack (Rajneeshees)
- Members of the Rajneeshee religious commune deliberately contaminated restaurant salad bars in The Dalles, Oregon with Salmonella typhimurium
- According to CDC records, the attack caused 751 cases of food poisoning and 45 hospitalizations; there were no deaths15
- The attack occurred in 1984 and remains the largest confirmed bioterrorist attack in U.S. history15
- It used a readily available pathogen requiring no sophisticated laboratory technology
- Key insight: Demonstrated that biological attacks don't require advanced technology, but also that impact was limited without sophisticated delivery
Aum Shinrikyo (1990s)
- Japanese cult with reportedly $1 billion in assets, hundreds of members, and PhD-level scientists16
- Attempted anthrax, botulinum toxin, and other biological agents—all efforts reportedly failed to produce casualties16
- An anthrax sprayer reportedly deployed in Tokyo produced no casualties, attributed partly to use of a vaccine strain by mistake16
- The group eventually succeeded with a sarin chemical attack in the Tokyo subway in 1995, killing 13 people and injuring thousands17
- Key insight: Even well-funded, technically sophisticated groups with scientific personnel have failed at biological weapons. The wet-lab barrier is real.
2001 Anthrax Letters (Amerithrax)
- Letters containing anthrax spores killed 5 people and infected 17 others in the United States18
- The FBI concluded the perpetrator was Bruce Ivins, a senior scientist at USAMRIID with decades of anthrax research experience and legitimate institutional access to spores18
- Key insight: An insider threat—not information access—enabled this attack. The perpetrator already possessed world-class expertise; AI would not have been the limiting factor.
Why has catastrophic bioterrorism not occurred?
| Factor | Explanation |
|---|---|
| Technical difficulty | Synthesis, production, and weaponization require tacit knowledge |
| Pathogen handling | Dangerous to the attacker; requires safety infrastructure |
| Delivery challenges | Aerosol dispersion is technically demanding |
| Attribution risk | Genomic analysis increasingly enables source identification |
| Goal mismatch | Most terrorist groups want publicity, not mass death |
| Limited access | Dangerous pathogens are controlled; acquisition is difficult |
This historical record could indicate either genuine difficulty (the barriers are high) or luck (we've been fortunate). The precautionary argument is that AI could systematically lower multiple barriers simultaneously, changing the calculus even if each individual barrier remains partially intact.
Current Biosecurity Infrastructure
DNA synthesis companies already screen orders for dangerous sequences, but screening isn't comprehensive:
| Defense Layer | Coverage | Effectiveness | AI Vulnerability |
|---|---|---|---|
| DNA synthesis screening | Major companies | Reportedly 40–70% (pre-2024); improving19 | High (evasion design) |
| BSL facility access control | High containment | High | Low |
| Pathogen inventory tracking | Research labs | Moderate | Low |
| Export controls (equipment) | Dual-use items | Moderate | Low |
| Disease surveillance | Advanced countries | Moderate–High | Moderate |
| Medical countermeasures | Known pathogens | Moderate | Moderate (novel agents) |
DNA Synthesis Screening: The Critical Chokepoint
DNA synthesis screening is considered the key "chokepoint" in the AI-assisted bioweapons pipeline—if dangerous sequences can be intercepted before synthesis, attacks become much harder. However, significant gaps remain:
Current limitations:
- Participation in the International Gene Synthesis Consortium (IGSC) is voluntary—not all companies are members
- Regulations are inconsistent between countries
- Screening relies on matching against databases of known dangerous sequences—novel variants can evade detection
- High false positive rates require expensive human review
- Benchtop DNA synthesizers are emerging that could bypass commercial screening entirely
Post-Microsoft patch status: After research revealed high evasion rates against existing screening tools, a software patch was deployed to synthesis companies. According to reporting on that effort, the patched software now catches approximately 97% of threats, though experts caution that the fix remains incomplete and gaps persist.20
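The detection-rate numbers matter more than they might appear, because an attacker who can cheaply generate many AI-designed variants needs only one to slip through. A rough sketch, using the detection rates quoted above and an assumed variant count:

```python
# P(at least one variant evades) when each of n variants is screened
# independently at a given detection rate. Variant count is an assumption.
def p_any_evades(detection_rate: float, n_variants: int) -> float:
    return 1 - detection_rate ** n_variants

for rate in (0.23, 0.97):  # worst tool pre-patch vs. post-patch average
    print(f"{rate:.0%} detection, 10 variants: "
          f"{p_any_evades(rate, 10):.1%} chance at least one evades")
# 23% detection -> ~100% evasion; 97% detection -> still ~26% evasion
```

Under these simplified, independence-assuming numbers, even the patched 97% rate leaves meaningful residual risk against an adversary who can iterate, consistent with experts' caution that the fix remains incomplete.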
Policy response: In April 2024, the White House OSTP released a Framework for Nucleic Acid Synthesis Screening, requiring federally funded programs to screen customers and orders, keep records, and report suspicious orders.[^52] NIST is partnering with stakeholders to improve screening standards and mitigate AI-specific risks.[^52]
Emerging Defensive Infrastructure
SecureDNA: A project providing free, privacy-preserving DNA synthesis screening that already exceeds 2026 regulatory requirements. SecureDNA screens even short sequences below 50 base pairs using a "random adversarial threshold" algorithm designed to be more robust against AI-designed evasion.
Nucleic Acid Observatory (NAO): A project pioneering pathogen-agnostic early warning through deep metagenomic sequencing. Unlike traditional surveillance that looks for known pathogens, the NAO aims to detect new and unknown pathogens through wastewater and pooled nasal swab sampling.
SecureBio's "Delay, Detect, Defend" strategy: Kevin Esvelt's SecureBio works on multiple defensive layers:
- Delay: Synthesis screening and access controls
- Detect: Early warning systems like the NAO
- Defend: Societal resilience through germicidal UV light, pandemic-proof PPE stockpiles, and rapid countermeasure development
Emerging Technologies of Concern
Several emerging technologies could compound AI-enabled biosecurity risks by removing barriers that currently limit attack feasibility:
Benchtop DNA Synthesizers
A new generation of desktop DNA synthesis devices may enable users to print DNA in their own laboratories, potentially bypassing commercial screening entirely.
Current products:
- One personal DNA/RNA synthesizer reportedly measures 27×33×33 cm and produces oligos in approximately 30–50 minutes with around 2.5 min/base turnaround, according to manufacturer specifications
- Another platform offers enzymatic DNA synthesis (water-based, avoiding harsh chemicals), reportedly supporting 96 parallel oligos up to 120 nucleotides per the company's published materials
- A third provides gene-length DNA synthesis on silicon chips, with the company claiming speeds approximately 10× faster than current technologies
- BioXp (Telesis Bio): Commercial benchtop synthetic biology workstation automating pipetting, mixing, thermal cycling, purification, and storage
Current limitations:
- According to some sources, most benchtop devices are limited to sequences under 120 base pairs—insufficient for most dangerous applications
- Not yet viable alternatives to centralized DNA providers for gene-length sequences
- Quality control and yield often inferior to commercial synthesis
Biosecurity implications:
- One analysis of benchtop DNA synthesis reportedly notes that "three converging technological trends—enzymatic synthesis, hardware automation, and increased demand from computational tools—are likely to drive rapid advancement in benchtop capabilities over the next decade"
- Manufacturers should implement rigorous sequence screening for each fragment produced
- Governments should provide clear regulations for manufacturers to incorporate screening
- Once capabilities exceed current limits, benchtop devices could become a significant biosecurity gap
Cloud Laboratories
Cloud laboratories are heavily automated, centralized research facilities where scientists run experiments remotely from computers. They present unique biosecurity challenges:
How cloud labs lower barriers:
- Reduce technical skill requirements by automating complex procedures
- Enable "one-stop-shop" research that could expand the pool of capable actors
- Allow experiments to be performed remotely, potentially bypassing ethical constraints in traditional academic settings
- Researchers retain full control over experimental design without physical presence
Current governance gaps:
- No public data on cloud lab operations, workflows, customer numbers, or locations worldwide
- No standardized approaches for customer screening shared between organizations
- Cybersecurity laws don't account for unique vulnerabilities of biological data and lab automation systems
- Biosafety regulations typically neglect digital threats like remote manipulation of synthesis machines
Proposed solutions:
- Create a Cloud Lab Security Consortium (CLSC) modeled on the International Gene Synthesis Consortium (IGSC) for DNA synthesis
- Minimum security standards: customer screening, controlled substance access, experiment screening, secured networks
- Human-in-the-loop controls when AI systems place synthesis orders for sequences of concern
Biological Design Tools (BDTs)
Beyond LLMs, specialized biological design tools present distinct risks:
AlphaFold and protein structure prediction:
- Revolutionary tool for predicting protein structure from genetic sequence; according to some sources, achieving over 90% accuracy on benchmark datasets
- Could enable optimization of existing hazards: increasing toxicity, improving immune evasion, enhancing transmissibility
- Could potentially enable design of completely novel toxins targeting human proteins
- Google DeepMind reportedly engaged more than 50 domain experts in biosecurity assessment during development of AlphaFold 3, according to published accounts
- Implements experimental refusal mechanisms to block misuse—but biological design often resides in dual-use space
Other BDT concerns:
- Machine learning for prediction of host range, transmissibility, and virulence
- Generative models for novel agent design
- Tools that help design sequences evading DNA screening (as demonstrated in published Microsoft research)
Dual-use nature: Unlike LLM guardrails, where harmful requests are often clearly distinguishable, biological design tool queries are frequently dual-use. The same protein optimization that could enhance a therapeutic could theoretically enhance a toxin. This makes technical controls more difficult than for text-based LLMs.
Policy recommendations:
- Prerelease evaluation requirements for advanced biological models regardless of funding source
- Prioritize mitigating risks of pathogens capable of causing major epidemics
- Preserve researcher autonomy while implementing targeted controls on highest-risk capabilities
Research Governance & International Law
AI-enabled bioweapons risk exists within a broader context of biosecurity challenges, including ongoing debates about research oversight and international governance gaps.
Gain-of-Function and Enhanced Pandemic Pathogen Research
Gain-of-function (GoF) research—experiments that enhance pathogen transmissibility, virulence, or host range—has become intensely controversial, with implications for AI-biosecurity debates:
Recent policy developments:
- May 2024: The White House Office of Science and Technology Policy released the "Policy for Oversight of Dual Use Research of Concern and Pathogens with Enhanced Pandemic Potential" (DURC/PEPP Policy).[^52]
- May 2025: An executive order reportedly blocked the 2024 policy the day before it was scheduled to take effect.[^53]
- Ongoing: NIH reportedly identified more than 40 projects that may meet definitions of dangerous GoF research and, according to some sources, demanded scientists suspend work.[^54]
Congressional activity:
- The House approved a ban on federal funding for GoF research modifying risky pathogens.
- Scientific groups warn that vaguely worded provisions could unintentionally halt flu vaccine development and other beneficial research.
- The Risky Research Review Act (S. 854, H.R. 1864) would establish a life sciences research security board.
Key limitation: Both the 2014 DURC Policy and the 2024 PEPP Policy apply only to government-funded research. Extending coverage to privately funded research would require new regulations or legislation. AI labs developing biological design tools with private funding currently face no equivalent oversight requirements.
Relevance to AI risk: The GoF debate previews challenges AI governance will face:
- Distinguishing beneficial from dangerous research is difficult.
- Oversight mechanisms are primarily voluntary and apply only to government-funded work.
- International coordination is lacking.
- Technical definitions ("gain of function," "enhanced pandemic potential") are contested.
The Biological Weapons Convention: Structural Weaknesses
The Biological Weapons Convention (BWC), opened for signature in 1972, prohibits the development, production, and stockpiling of biological weapons.[^55] As of the most recent review cycle it has 187 states parties.[^56] Despite its broad membership, the treaty has significant structural weaknesses.
No verification regime:
- Unlike chemical and nuclear weapons agreements, the BWC contains no formal verification provisions.
- Attempts to develop a verification protocol collapsed in 2001 after years of negotiation.[^57]
- According to some analysts, governments effectively ceased substantive discussion of verification within the treaty framework for over two decades following that failure.
Minimal institutional support:
- The BWC Implementation Support Unit has only four staff members.[^58]
- Its budget is reportedly smaller than that of an average McDonald's restaurant, a comparison made by philosopher Toby Ord.[^59]
- By contrast, the IAEA employs more than 2,500 staff and the OPCW more than 500 staff.[^60]
Recent developments:
- December 2022: States Parties established a Working Group on Strengthening the Convention.[^61]
- 2024: The fourth and fifth Working Group sessions were held in August and December 2024.
- December 2024: The fifth session reportedly "ended with a regrettable conclusion in which a single States Party undermined the noteworthy progress achieved"—a setback described by the Council on Strategic Risks.[^62]
- The Working Group reportedly has only seven days of scheduled time through the end of 2025 allocated specifically to verification discussion.[^63]
Practical limitations:
- No politically palatable, technologically feasible, and financially sustainable system can guarantee detection of all biological weapons programs.
- Rapid advances in biotechnology create new verification challenges.
- AI capabilities could make verification even more difficult by enabling novel agent design.
What's possible: While perfect verification is unachievable, analysts including those writing in the Bulletin of the Atomic Scientists have argued that "measures in combination could generate considerably greater confidence in compliance by BWC states parties."[^64]
Defensive Technologies and Pandemic Preparedness
The same technological advances that could enable attacks also offer powerful defensive capabilities. Many experts believe defense will ultimately hold the advantage in the offense-defense balance; the open question is whether we are currently in a dangerous transition period.
mRNA Vaccine Platforms
The COVID-19 pandemic demonstrated the transformative potential of mRNA vaccines for rapid response:
Speed advantages:
- Traditional vaccines typically require growing the pathogen, or proteins derived from it, in eggs or cell culture, a slow manufacturing process
- mRNA vaccines can be designed in days once a pathogen's genetic sequence is known[^65]
- COVID-19 mRNA vaccines received FDA Emergency Use Authorization in under one year—unprecedented in vaccine history[^66]
- CEPI's "100 Days Mission" aims to develop safe, effective vaccines against novel threats within 100 days of a pandemic being declared[^67]
Manufacturing advantages:
- Cell-free manufacture enables accelerated, scalable production
- Standardizable processes require minimal facility adaptations between products
- Smaller manufacturing footprints than traditional vaccines
- Same facility can produce multiple vaccine products
Safety profile:
- mRNA does not enter the cell nucleus and cannot integrate into the cellular genome[^68]
- Can be administered repeatedly without triggering anti-vector immunity (unlike viral vector vaccines)
- Avoids live pathogen handling in manufacturing
Pandemic preparedness implications:
- Platform is "pathogen-agnostic"—the same technology works against any target with a known sequence
- BARDA and CEPI are reportedly supporting development of dozens of vaccine candidates against high-risk pathogens
- [7f893b5e738ec56f] under development could provide stronger immune responses at lower doses
For AI-bioweapons specifically: Rapid vaccine development could limit the damage from engineered pathogens if detected early. However, novel agents designed to evade detection or existing countermeasures would still pose severe risks during the response window.
Metagenomic Surveillance
Traditional disease surveillance looks for known pathogens. Metagenomic sequencing offers pathogen-agnostic detection:
How it works:
- Deep sequencing of all genetic material in samples (wastewater, nasal swabs, etc.)
- Computational analysis identifies viral, bacterial, and other sequences
- Can detect novel or unexpected pathogens that would not be caught by targeted testing (a toy triage sketch follows this list)
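As a toy illustration of that triage step, the sketch below bins sequencing reads by shared k-mers with known references and flags anything unmatched as a candidate novel sequence; real pipelines use tools such as Kraken2 or BLAST against full reference databases, and the reference strings and reads here are invented stand-ins rather than real pathogen sequences.

```python
# Pathogen-agnostic read triage: reads sharing enough k-mers with a known
# reference are binned to that taxon; unmatched reads are flagged for
# assembly and closer inspection.

def kmers(seq: str, k: int = 8) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

REFERENCES = {  # invented reference k-mer index (toy data)
    "pathogen_A": kmers("ATGGAGAGAATAAAAGAACTGAGAGATCTAATG"),
    "pathogen_B": kmers("CAGTTTACCGGATACCATGGTTCAAGCTAGCTA"),
}

def triage(read: str, min_hits: int = 3) -> str:
    read_kmers = kmers(read)
    best_taxon, best_hits = None, 0
    for taxon, ref in REFERENCES.items():
        hits = len(read_kmers & ref)
        if hits > best_hits:
            best_taxon, best_hits = taxon, hits
    return best_taxon if best_hits >= min_hits else "unassigned (candidate novel)"

for read in ("ATGGAGAGAATAAAAGAACT", "CCGGTTAACCGGTTAACCGG"):
    print(read, "->", triage(read))
# The first read bins to pathogen_A; the second matches nothing and is flagged.
```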
Current research:
- [66f6f860844300d7]: Sequencing wastewater from major US airports and treatment plants to establish pathogen-agnostic baselines
- One published dataset reportedly comprised 13.1 terabases sequenced from 20 wastewater samples collected at the Los Angeles Hyperion treatment plant, which serves approximately 4 million residents[^69]
- [cd0a1da6bf303e56] established sensitivity models for wastewater metagenomic sequencing (W-MGS) detection
Sensitivity and cost tradeoffs:
- Untargeted shotgun sequencing is less sensitive than targeted methods for known pathogens
- Hybridization capture panels can greatly increase sensitivity for viruses included in the panel, but may reduce sensitivity to entirely unknown pathogens
- Large variation in viral detection exists based on sewershed hydrology and laboratory protocols
- Modeled sensitivity for certain bacterial pathogens has been estimated at roughly 1 infected person detectable among 257–2,250 individuals in a sewershed, according to published sensitivity analyses[^70] (see the worked example below)
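The cited thresholds convert directly into prevalence terms, and scale to the roughly 4-million-person Hyperion sewershed mentioned above; the arithmetic below uses only figures already quoted in this section.

```python
# "1 infected person among N individuals" implies a minimum detectable
# prevalence of 1/N; multiplying through by sewershed population gives the
# rough outbreak size at which W-MGS detection becomes plausible.

SEWERSHED_POPULATION = 4_000_000  # approximate Hyperion figure cited above

for n in (2_250, 257):  # more-sensitive to less-sensitive end of the range
    prevalence = 1 / n
    infected_needed = SEWERSHED_POPULATION / n
    print(f"1 in {n:>5}: prevalence ≈ {prevalence:.2%}, "
          f"≈ {infected_needed:,.0f} concurrent infections")
# 1 in  2250: ≈ 0.04%, ≈ 1,778 concurrent infections
# 1 in   257: ≈ 0.39%, ≈ 15,564 concurrent infections
```

This is the quantitative basis for the caveat below: an outbreak may need to reach four- or five-digit case counts in a large sewershed before pathogen-agnostic detection triggers.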
For AI-bioweapons specifically: Metagenomic surveillance could provide early warning for engineered pathogens that evade targeted detection. However, sensitivity limits mean outbreaks may need to reach significant scale before detection occurs.
Far-UVC Germicidal Light
[ae1d3425db815f91] light, operating in the 200–235 nm wavelength range, is emerging as a potentially transformative technology for airborne pathogen inactivation in occupied spaces[^71]:
Why it's different from conventional UV:
- Conventional germicidal UV-C (254 nm) harms human skin and eyes—restricting its use to upper-room applications or unoccupied spaces
- Far-UVC (typically 222 nm) is absorbed in the outer dead layer of human skin and in the tear layer of the eyes, and cannot penetrate to living tissue[^72]
- This property enables direct disinfection of the breathing zone while people are present
Efficacy:
- A very low dose of 2 mJ/cm² of 222-nm light has been reported to inactivate more than 95% of airborne H1N1 influenza virus in laboratory conditions[^73] (see the worked dose calculation after this list)
- Studies suggest a single far-UVC fixture can deliver the equivalent of 33–66 air changes per hour for pathogen removal[^74]
- Far-UVC has been tested against tuberculosis, SARS-CoV-2, influenza, and murine norovirus, with reported reductions of up to 99.8% for murine norovirus[^75]
- A 2025 review characterized far-UVC as having "high ability" to kill pathogens with a "high level of safety," though the authors noted that long-term human exposure data remain limited[^76]
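A back-of-envelope check on these dose figures, assuming standard first-order (single-hit) inactivation kinetics, S = exp(−kD): the 95%-at-2 mJ/cm² datum is from the cited study, while the implied rate constant and the extrapolations below are illustrative only.

```python
import math

# First-order UV inactivation: surviving fraction S = exp(-k * D).
# Solve for k from the reported >=95% inactivation at D = 2 mJ/cm^2,
# then extrapolate; real dose-response curves can deviate from
# single-hit kinetics, so treat these as order-of-magnitude figures.

S, D = 0.05, 2.0               # 95% inactivation at 2 mJ/cm^2 (cited study)
k = -math.log(S) / D           # implied rate constant
print(f"k ≈ {k:.2f} cm^2/mJ")  # ≈ 1.50 cm^2/mJ

for target in (0.99, 0.999):
    dose = -math.log(1 - target) / k
    print(f"{target:.1%} inactivation ≈ {dose:.1f} mJ/cm^2")
# 99.0% ≈ 3.1 mJ/cm^2; 99.9% ≈ 4.6 mJ/cm^2
```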
Applications for pandemic preparedness:
- Installation in hospitals, schools, airports, and public transit could dramatically reduce airborne transmission
- [31dc1e265f5d31a6] is reportedly funding research teams to evaluate deployment in real-world scenarios
- Coefficient Giving has issued an [34e710aed540db3c]
- NIST is reportedly collaborating with industry on standards development
Remaining questions:
- Long-term human exposure effects require further research
- Real-world efficacy in varied building environments is not yet fully characterized
- Cost and feasibility of widespread deployment remain open questions
For AI-bioweapons specifically: Far-UVC could provide a layer of defense against aerosol-dispersed biological agents in public spaces. Even if attackers successfully synthesize and deploy pathogens, widespread far-UVC installation could limit transmission and buy time for medical countermeasure deployment.
Mitigations
Model-Level Interventions
Refusals and filtering — Training models not to help with bioweapon development and filtering dangerous outputs (a minimal filter sketch follows the list below). But these are imperfect: models can be jailbroken or fine-tuned, and open-source models may lack restrictions entirely.
Effectiveness assessment:
- Reduces casual misuse
- Raises barrier for unsophisticated actors
- Does not prevent determined actors with technical skills
- Cannot address open-source model proliferation
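As a minimal sketch of the filtering layer: the scoring function below is a trivial keyword stand-in (production safeguards are trained classifiers), and the function names and threshold are invented for illustration.

```python
# Post-hoc output gate: score model output before returning it. A gate
# like this raises the bar for casual misuse, but, as the assessment
# above notes, it disappears entirely once model weights are open.

def biosecurity_score(text: str) -> float:
    """Toy stand-in returning a risk score in [0, 1]; real deployments
    use trained classifiers, not keyword lists."""
    flagged = ("enhance transmissibility", "evade dna screening")
    return 1.0 if any(term in text.lower() for term in flagged) else 0.0

def guarded_respond(model_output: str, threshold: float = 0.5) -> str:
    if biosecurity_score(model_output) >= threshold:
        return "[response withheld: flagged by biosecurity filter]"
    return model_output

print(guarded_respond("Here is a summary of mRNA vaccine platforms."))
print(guarded_respond("Step one: enhance transmissibility by ..."))
```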
Evaluations before deployment — Testing models for biosecurity risks during development, as part of responsible scaling policies. Useful but relies on labs' good faith and competence.
AI-Specific Governance
Compute governance — Limiting who can train powerful models reduces the availability of capable models to bad actors. Information security around model weights becomes important if models can provide meaningful uplift.
Biological capability thresholds — Anthropic's RSP and similar frameworks establish biological capability as a key threshold for enhanced safety measures. This creates systematic evaluation requirements.
Open-source restrictions — Limiting the release of model weights for systems with significant biological knowledge. Controversial due to benefits of open research.
Broader Biosecurity Measures
Broader biosecurity measures may matter more than AI-specific interventions:
| Intervention | Cost | Risk Reduction | Priority |
|---|---|---|---|
| DNA synthesis screening | ≈$100M/year | 5-15% | High |
| Metagenomic surveillance | ≈$500M/year | 15-25% | Very High |
| BSL facility security | ≈$200M/year | 5-10% | High |
| Pandemic response stockpiles | ≈$2B/year | 10-20% | Medium-High |
| International verification | ≈$300M/year | 3-8% | Medium |
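Taking the table's figures at face value, a crude cost-effectiveness ordering (midpoint risk reduction per dollar) falls out directly; this ignores interactions between measures and the other considerations behind the priority column, and the inputs are the table's rough estimates, not measured values.

```python
# Rank interventions by midpoint risk-reduction percentage points per
# $100M/year, using the table's figures as point estimates.

interventions = {
    "DNA synthesis screening":      (100e6, (5, 15)),
    "Metagenomic surveillance":     (500e6, (15, 25)),
    "BSL facility security":        (200e6, (5, 10)),
    "Pandemic response stockpiles": (2e9,   (10, 20)),
    "International verification":   (300e6, (3, 8)),
}

def points_per_100m(cost: float, rng: tuple[float, float]) -> float:
    return (sum(rng) / 2) / (cost / 100e6)

ranked = sorted(interventions.items(),
                key=lambda kv: points_per_100m(*kv[1]), reverse=True)
for name, (cost, rng) in ranked:
    print(f"{name:30s} {points_per_100m(cost, rng):5.2f} pts per $100M/yr")
# Screening tops this per-dollar ranking, while surveillance leads on
# absolute risk reduction, matching its "Very High" priority above.
```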
DNA synthesis screening — Flagging dangerous sequences before synthesis. The primary defense but has significant gaps that AI can exploit.
Laboratory access controls — Restricting who can work with dangerous pathogens. Effective for legitimate facilities; doesn't address improvised labs.
Disease surveillance — Early detection of outbreaks. Benefits from AI advances and may be where AI provides greatest defensive value.
Medical countermeasures — Rapid vaccine and treatment development. mRNA platforms demonstrated during COVID-19 show how quickly responses can be developed.
Timeline
| Date | Event |
|---|---|
| 1972 | Biological Weapons Convention signed (now 187 states parties) |
| 1984 | Rajneeshee salmonella attack sickens 751 people, the largest bioterrorism attack in US history |
| 1995 | Aum Shinrikyo attempts bioweapons (anthrax, botulinum), fails; uses sarin instead |
| 2001 | Anthrax letters kill 5, infect 17; perpetrator was an insider with legitimate access |
| 2020 | Toby Ord publishes The Precipice, estimating 1-in-30 existential risk from engineered pandemics |
| 2020-21 | COVID-19 demonstrates pandemic potential; exposes biosecurity gaps |
| 2022 | Collaborations Pharmaceuticals shows an AI drug discovery model can generate 40,000 candidate toxic molecules in under six hours |
| 2023 (July) | Dario Amodei warns of "substantial risk" AI will enable bioattacks within 2-3 years |
| 2023 (Oct) | Executive Order 14110 directs National Academies to study AI biosecurity |
| 2023 (Nov) | Gryphon Scientific red-team finds Claude provides "post-doc level" biological knowledge |
| 2024 (Jan) | RAND red-team study finds no significant AI uplift for bioweapon planning |
| 2024 (Apr) | White House OSTP releases Framework for Nucleic Acid Synthesis Screening |
| 2024 (May) | Microsoft research reveals 75%+ of AI-designed toxins evade DNA screening |
| 2024 (Aug) | CNAS publishes report on AI and biological national security risks |
| 2024 (Aug) | US AI Safety Institute signs agreements with Anthropic and OpenAI for biosecurity evaluation |
| 2024 (Nov) | US/UK AI Safety Institutes conduct first joint model evaluation (Claude 3.5 Sonnet) |
| 2024 (Dec) | Anthropic RSP includes 10+ biological capability evaluations per model |
| 2025 (Jan) | Anthropic sends letter to White House citing "alarming improvements" in Claude 3.7 Sonnet |
| 2025 (Feb) | Anthropic CEO reports DeepSeek was "the worst" model tested for biosecurity safeguards |
| 2025 (Mar) | National Academies publishes "The Age of AI in the Life Sciences" report |
| 2025 (Apr) | OpenAI's o3 model ranks 94th percentile among expert virologists on capability test |
| 2025 (May) | Anthropic activates ASL-3 protections for Claude Opus 4 due to CBRN concerns |
| 2025 (Jun) | OpenAI announces next-gen models will hit "high-risk" biological classification |
| 2025 (Jul) | OpenAI hosts biodefense summit with government researchers and NGOs |
| 2025 (Jul) | Trump administration's AI Action Plan identifies biosecurity as priority |
| 2025 (Aug) | CSIS publishes "Opportunities to Strengthen U.S. Biosecurity from AI-Enabled Bioterrorism" |
| 2025 (Sep) | UN formalizes International Scientific Panel on AI and Global Dialogue on AI Governance |
| 2025 (Oct) | Microsoft publishes Science paper; screening patch deployed globally (97% effective) |
| 2025 (Oct) | Hoover Institution warns biotech+AI is "one of the biggest emerging security threats" |
| 2025 (Dec) | Council on Strategic Risks publishes "2025 AIxBio Wrapped" year-in-review |
| 2026 (Jan) | Epoch AI finds biorisk benchmarks have "rapidly saturated" across frontier labs |
Expert Perspectives
Expert opinion on AI-bioweapons risk is divided, with prominent voices on both sides:
Those More Concerned
Kevin Esvelt (MIT): One of the most vocal experts on AI-biosecurity risks. Esvelt emphasizes that if you ask a chatbot how to cause a pandemic, "it will suggest the 1918 influenza virus. It will even tell you where to find the gene sequences online and where to purchase the genetic components." He co-founded SecureDNA and SecureBio to address these risks.
Dario Amodei (Anthropic CEO): In July 2023, stated there was a "substantial risk" that within 2-3 years, AI would "greatly widen the range of actors with the technical capability to conduct a large-scale biological attack." In February 2025, reported that DeepSeek was "the worst" model tested for biosecurity, generating information "that can't be found on Google or easily found in textbooks."
Johannes Heidecke (OpenAI Head of Safety Systems): In June 2025, announced OpenAI expects upcoming models to hit "high-risk classification" for biological capabilities. Emphasized that "99% or even one in 100,000 performance is [not] sufficient" for testing accuracy.
Rocco Casagrande (Gryphon Scientific): After red-teaming Claude, said he was "personally surprised and dismayed by how capable current LLMs were" and that "these things are developing extremely, extremely fast."
Toby Ord (Oxford): Estimates engineered pandemic existential risk at 1 in 30 within the next century, the second-highest anthropogenic existential risk after AI itself.
Georgia Adamson and Gregory C. Allen (CSIS): Their August 2025 report warns current U.S. biosecurity measures are "ill-equipped" to meet AI-enabled challenges, with BDT safeguards "already circumventable post-deployment."
Bill Drexel and Caleb Withers (CNAS): Their August 2024 report warns AI could enable "catastrophic threats far exceeding the impact of COVID-19."
Those More Skeptical
RAND researchers (Mouton, Lucas, Guest): Their 2024 study found "no statistically significant difference" between AI-assisted and non-AI groups in bioweapon planning capability. This is the strongest empirical evidence against immediate AI uplift concerns.
Some biosecurity practitioners: Emphasize that the wet lab bottleneck—tacit knowledge, equipment access, technique—remains the primary barrier, and AI cannot transfer hands-on skills.
Information abundance argument: Dangerous information is already accessible through scientific literature and the internet. AI may provide convenience but not fundamentally new capabilities.
The Disagreement Structure
The debate often reduces to different assessments of:
| Question | Higher Concern View | Lower Concern View |
|---|---|---|
| Current uplift | 2025 lab evaluations show expert-level capabilities | RAND 2024 study is most rigorous empirical evidence |
| Future trajectory | OpenAI/Anthropic expect "high-risk" soon | May plateau; defenses improving |
| Key bottleneck | Knowledge gap narrowing fast | Wet lab skills remain rate-limiting |
| Guardrail effectiveness | DeepSeek shows open-source gaps | Frontier labs implementing robust safeguards |
| Risk tolerance | ASL-3 activation signals real concern | Base rates suggest low probability |
2025 shift: The debate has evolved significantly. Both major frontier labs now officially acknowledge their next-generation models pose elevated biological risks. The question is shifting from "does AI provide uplift?" to "how much uplift, and can mitigations keep pace?"
Notably: Even those who downplay current uplift often acknowledge that future models may pose greater risks, and that defensive investments are worthwhile regardless.
Sources & Resources
2025-2026 Key Sources
| Source | Type | Key Finding |
|---|---|---|
| Forecasting Research Institute | Expert survey | 5x risk increase from AI; safeguards reduce risk to baseline |
| Council on Strategic Risks Year Review | Analysis | Rising awareness of AIxBio risks; UN governance bodies formed |
| Epoch AI Evaluation Analysis | Methodology review | Biorisk benchmarks saturated; Anthropic only lab with uplift trials |
| CSIS Policy Analysis | Policy | US biosecurity measures "ill-equipped" for AI threats |
| Anthropic Biorisk Methodology | Technical | ASL-3 activation justified; "substantially fewer critical failures" |
| OpenAI Biology Preparedness | Technical | Next-gen models expected to hit "high-risk" classification |
Primary Research
- RAND Corporation (2024): [0fe4cfa7ca5f2270] - The most rigorous empirical study of AI uplift to date
- Microsoft Research (2025): AI-designed toxins evade DNA screening - Published in Science, October 2025
- National Academies (2025): The Age of AI in the Life Sciences: Benefits and Biosecurity Considerations - Comprehensive government-commissioned study on AI biosecurity risks
- Gryphon Scientific (2023): Red-team evaluation of Claude's biological capabilities - [c5bed41f6d28d09e]
- UNICRI (2021): [80cf8f51eecba79e] - Early analysis of AlphaFold biosecurity implications
- Council on Strategic Risks (2023): [210d86aeb49f9c18]
Policy and Analysis
- CSIS (2025): Opportunities to Strengthen U.S. Biosecurity from AI-Enabled Bioterrorism by Georgia Adamson and Gregory C. Allen
- CNAS (2024): [2b6675e423040e53] by Bill Drexel and Caleb Withers
- White House OSTP (2024): [14ff22ab7e571166]
- White House OSTP (2024): [2200ae108bcdce25]
- NIST/AISI (2024): [a0bcc81243f8fbee]
- Congressional Research Service: [abbb1f4748d244a1]
Industry Frameworks
- Anthropic: [394ea6d17701b621]
- Anthropic (2025): Biorisk Evaluations - Detailed methodology for Claude Opus 4 safety testing
- OpenAI: [ded0b05862511312]
- OpenAI (2025): Preparing for Future AI Capabilities in Biology - High-risk classification announcement
- OpenAI (2024): [2f918741de446a84]
- Google DeepMind: [81aa1be41165df66]
Biosecurity Organizations
- SecureDNA: [dc743c49d6d32327]
- SecureBio: [81e8568b008e4245]
- Nucleic Acid Observatory: [66f6f860844300d7]
- Nuclear Threat Initiative (NTI): [3069d2a8482e1a3e]
- Blueprint Biosecurity: [ce2d37d76889f2d8]
Emerging Technologies
- NTI (2024): [281a855768b94705]
- RAND (2024): [494902be4f16a999]
- RAND (2024): [5cd1ea7dbc8d0b23]
- EMBO Reports (2024): [d265ec8357439b6b]
International Governance
- Arms Control Association: [0c6a3fa4dd2681d1]
- Arms Control Association (2024): [231fc76d4d46c1f5]
- Bulletin of the Atomic Scientists (2024): [62d7dc2a9efb813b]
- Council on Strategic Risks (2025): [e6e031f2e29221f1]
Defensive Technologies
- Nature (2018): [c0fc46bf88cbfbd2]
- Scientific Reports (2024): [3156632ea73ed418]
- Lancet Microbe (2025): [cd0a1da6bf303e56]
- Virology Journal (2025): [b8e223c44c26338c]
Historical Background
- Wikipedia: [3eb528026caf7aa4]
- Wikipedia: [2670dc534d9adb0c]
- PMC (2023): [2e37589bf4cafca7]
- Toby Ord: The Precipice: Existential Risk and the Future of Humanity (2020)
General Context
- 80,000 Hours: [429979d863628482]
- Bulletin of the Atomic Scientists (2024): [a3cecbd6bf0ee45b]
- Undark (2024): [f132e9a4c94af7d3]
- Science (2025): [59a228de7be0825d]
Video & Audio
- 80,000 Hours Podcast: [2656524aca2f08c0] - MIT researcher on biological risks and pandemic preparedness
- Lex Fridman #431: [385f4249434fefc1] - Discusses AI safety including CBRN risks
- Future of Life Institute: [e78dd5bd5439cb1e] - Multiple episodes on biosecurity
- RAND: [73c1b835c41bcbdb] - Video briefing on the 2024 study
Footnotes
- Citation rc-a668 (data unavailable — rebuild with wiki-server access)
- Citation rc-bcc1 (data unavailable — rebuild with wiki-server access)
- Citation rc-6227 (data unavailable — rebuild with wiki-server access)
- Citation rc-4f12 (data unavailable — rebuild with wiki-server access)
- Citation rc-3b92 (data unavailable — rebuild with wiki-server access)
- Citation rc-4747 (data unavailable — rebuild with wiki-server access)
- Citation rc-6e22 (data unavailable — rebuild with wiki-server access)
- Citation rc-f03a (data unavailable — rebuild with wiki-server access)
- Citation rc-d624 (data unavailable — rebuild with wiki-server access)
- Citation rc-5762 (data unavailable — rebuild with wiki-server access)
- Citation rc-e6c0 (data unavailable — rebuild with wiki-server access)
- Citation rc-628e (data unavailable — rebuild with wiki-server access)
- Citation rc-b6fe (data unavailable — rebuild with wiki-server access)
- Citation rc-aeaa (data unavailable — rebuild with wiki-server access)
- Citation rc-ee20 (data unavailable — rebuild with wiki-server access)
- Citation rc-f121 (data unavailable — rebuild with wiki-server access)
- Citation rc-ae76 (data unavailable — rebuild with wiki-server access)
- Citation rc-d8fe (data unavailable — rebuild with wiki-server access)
- Citation rc-ef26 (data unavailable — rebuild with wiki-server access)
- Citation rc-ddd8 (data unavailable — rebuild with wiki-server access)