Singapore Consensus on AI Safety Research Priorities
The Singapore Consensus on Global AI Safety Research Priorities (arXiv:2506.20702) is a consensus document produced by the April 2025 SCAI conference, held alongside ICLR 2025, and authored by 88 researchers from 11 countries. Building on the 2025 International AI Safety Report, it organizes AI safety research into three defence-in-depth areas: Assessment (risk measurement, metrology, third-party audits), Development (specification, training, and verification of trustworthy systems), and Control (post-deployment monitoring, ecosystem oversight, societal resilience). It proposes limiting AGI risk by constraining a system's autonomy, generality, or intelligence, and advocates using non-agentic AI to monitor agentic systems. It also identifies areas of mutual interest where even competitors may benefit from cooperation.
Quick Assessment
| Dimension | Assessment | Evidence |
|---|---|---|
| Scope | Technical AI safety research priorities | Covers general-purpose AI including LLMs, multimodal models, and AI agents |
| Framework | Defence-in-depth (Assessment, Development, Control) | Three complementary areas that reinforce each other |
| Consensus Level | 88 co-authors from 11 countries | Researchers from academia, industry, and government across geopolitical lines |
| Building On | 2025 International AI Safety Report | Extends findings from the report backed by 33 governments |
| Novel Contribution | Areas of mutual interest for competitor cooperation | Draws aviation safety parallels for shared risk thresholds and benchmarks |
| Binding Force | None—scientific consensus document | Intended to inform policymakers, not create obligations |
| Policy Channel | Presented at ATxSummit ministerial roundtable (May 2025) | Bridges AI research community and digital ministers |
Overview
The Singapore Consensus on Global AI Safety Research Priorities is a consensus document produced by the 2025 Singapore Conference on AI: International Scientific Exchange on AI Safety (SCAI), held on April 26, 2025, alongside ICLR 2025 in Singapore. The document was designed by an Expert Planning Committee consisting of Yoshua Bengio, Tegan Maharaj, Luke Ong, Stuart Russell, Dawn Song, Max Tegmark, Lan Xue, and Ya-Qin Zhang, with 80 additional co-authors contributing feedback through several rounds of revision.[1]
The Consensus builds on the 2025 International AI Safety Report (chaired by Bengio, backed by 33 governments) and aims to synthesize research priorities with broad agreement among diverse researchers. Its scope is limited to technical safety research for general-purpose AI (GPAI), including language models, multimodal models, and general-purpose agents—systems capable of performing or adapting to perform a wide range of tasks, particularly those that autonomously act and plan to accomplish complex tasks with minimal human oversight.
The document adopts a defence-in-depth model, organizing AI safety research into three complementary areas: challenges with evaluating AI risks (Assessment), challenges with creating trustworthy AI systems (Development), and challenges with monitoring and intervening after deployment (Control). The layering is intended so that a failure in any single area can still be caught by the others.
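The arithmetic behind defence-in-depth can be illustrated with a minimal sketch. It is not taken from the Consensus: the function name `residual_risk`, the per-layer miss probabilities, and above all the assumption that the three layers fail independently are hypothetical choices made for illustration. Real safeguards often share failure modes, in which case the combined risk would be higher than this product suggests.

```python
from math import prod


def residual_risk(miss_probabilities):
    """Chance that a failure slips past every layer.

    Assumes (hypothetically) that layers miss failures independently,
    so the combined miss probability is the product of the per-layer ones.
    """
    probabilities = list(miss_probabilities)
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return prod(probabilities)


# Made-up miss probabilities for each of the three areas.
layers = {"assessment": 0.1, "development": 0.2, "control": 0.1}

combined = residual_risk(layers.values())
weakest_alone = min(layers.values())

# Under the independence assumption, stacking layers gives roughly 0.002,
# far below the 0.1 achieved by the best single layer on its own.
print(f"combined miss probability: {combined:.4f}")
print(f"best single layer:         {weakest_alone:.4f}")
```

The point of the sketch is only that layered safeguards multiply, rather than add, their protection when their failures are uncorrelated.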
The Three-Area Framework
Area 1: Risk Assessment
The primary goal of risk assessment is to understand the severity and likelihood of potential harms. The Consensus identifies three sub-areas:
A. Developing methods to measure AI system impacts: This includes standardized assessments for risky behaviors through audit techniques and benchmarks, evaluation of new capabilities (including potentially dangerous ones), and real-world societal impact assessment through field tests and prospective risk analysis.
B. Enhanced metrology: Technical methods for quantitative risk assessment tailored to AI systems, aimed at reducing uncertainty and the need for large safety margins.
C. Third-party audit infrastructure: Secure infrastructure enabling thorough evaluation while protecting intellectual property and preventing model theft.
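To make the link between measurement and safety margins in sub-area B concrete, the following sketch computes a standard one-sided upper confidence bound on a failure rate when an evaluation observes zero failures in `n_trials` independent trials. The evaluation setup, the function name `zero_failure_upper_bound`, and the trial counts are assumptions for illustration and do not come from the Consensus; the formula itself is the exact binomial bound for the zero-failure case.

```python
def zero_failure_upper_bound(n_trials, confidence=0.95):
    """Upper bound on the true failure rate after n_trials with no failures.

    With zero failures in n independent trials, the exact one-sided bound
    is 1 - alpha ** (1 / n), where alpha = 1 - confidence.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_trials)


# Hypothetical evaluation sizes: more clean trials tighten the bound,
# which is what allows a smaller safety margin.
for n in (30, 300, 3000):
    bound = zero_failure_upper_bound(n)
    print(f"{n:>5} clean trials -> failure rate below {bound:.5f} (95% confidence)")
```

The bound shrinks roughly in proportion to 1/n (the familiar "rule of three" approximation, 3/n at 95% confidence), so a tenfold larger clean evaluation supports a roughly tenfold tighter claim.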
Key research topics include:
- Downstream impact forecasting: Assessing societal impacts (labor, misinformation, privacy) through usage data, trend analysis, and risk modeling, including "uplift studies" on malicious use capabilities
- Loss-of-control risk assessment: Assessing the risk that advanced AI systems come to operate outside human control, potentially by actively undermining control measures; this includes evaluating AI agency, oversight evasion, persuasion, and resource-acquisition capabilities
The Consensus notes a lack of expert consensus on the likelihood of severe loss-of-control scenarios, underscoring the need for improved methodology.
Area 2: Development of Trustworthy Systems
Following a classic safety engineering framework, this area covers specifying desired behavior, designing systems to meet specifications, and verifying compliance:
A. Specification & validation: Defining how the system should behave and ensuring the definition meets user and societal needs.
B. Training & design: New approaches to training data curation, robustness testing, and model editing to ensure systems resist tampering, jailbreaks, and misuse.
C. Verification: Safety engineering approaches that formally specify, build, and verify safe functionality.
The Consensus introduces two notable concepts:
- Limiting AGI risk dimensions: A framework categorizing AGI as the convergence of autonomy, generality, and intelligence—suggesting risk mitigation by deliberately limiting one or more attributes in system design
- Non-agentic guardrails: Using non-agentic AI systems to monitor agentic systems' proposed actions and intervene by blocking them if needed, including a "Scientist AI" that estimates the probability that a candidate action violates safety specifications
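The non-agentic guardrail idea can be sketched as a simple gate between an agent's proposed actions and their execution. Everything named here is hypothetical: `guard`, `Verdict`, the keyword-based `toy_estimator`, and the 0.01 threshold are illustrative stand-ins, not the design described in the Consensus. In a real system the estimator would be a trained non-agentic model rather than a keyword match.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Verdict:
    action: str
    violation_probability: float
    allowed: bool


def guard(
    action: str,
    estimate_violation_probability: Callable[[str], float],
    threshold: float = 0.01,  # hypothetical risk tolerance
) -> Verdict:
    """Allow an action only if its estimated violation probability is below threshold."""
    p = estimate_violation_probability(action)
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"estimator returned a non-probability: {p}")
    return Verdict(action=action, violation_probability=p, allowed=p < threshold)


def toy_estimator(action: str) -> float:
    """Stand-in for a non-agentic estimator; a crude keyword match, for illustration only."""
    risky_terms = ("delete", "transfer funds", "disable logging")
    return 0.9 if any(term in action.lower() for term in risky_terms) else 0.001


# The agent proposes actions; the guardrail decides which ones may run.
proposed = ["summarize quarterly report", "disable logging on server"]
verdicts = [guard(action, toy_estimator) for action in proposed]
executed = [v.action for v in verdicts if v.allowed]
blocked = [v.action for v in verdicts if not v.allowed]

print("executed:", executed)
print("blocked: ", blocked)
```

The monitor in this sketch only scores and blocks; it never proposes or carries out actions itself, which is the sense in which it is non-agentic.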
Area 3: Control and Societal Resilience
This area addresses post-deployment challenges through feedback-loop interventions:
- System-level monitoring and intervention: Scalable interpretability and red-teaming methods that reveal failures in real-world environments, especially multi-agent contexts
- Ecosystem-wide monitoring: Continuous oversight extending beyond individual systems to the broader AI ecosystem
- Societal resilience research: Strengthening societal infrastructure (economic, security) against AI-enabled disruption and misuse
- AI control setups: Monitoring and intervention mechanisms that work even if the underlying untrusted AI system actively attempts to subvert safeguards
Areas of Mutual Interest
Unlike previous safety reports, the Singapore Consensus explicitly identifies areas where stakeholders—including rival companies and nations—may benefit from cooperation. The authors draw comparisons to aviation safety, where fierce competitors collaborate to prevent accidents because shared safety standards benefit everyone.
| Area of Mutual Interest | Why Competitors Benefit | Precedent |
|---|---|---|
| Shared risk thresholds | Common understanding of when intervention is needed reduces regulatory uncertainty | Aviation safety standards |
| Evaluation protocols | Standardized testing enables fairer comparison and reduces duplicated effort | Medical device testing |
| Safety benchmarks | Shared baselines raise the floor for all actors and build public trust | Financial stress testing |
The Consensus highlights carefully defined risk thresholds as the first example: actors may find it in their own interest to share them widely or to cooperate on their development, even with competitors.
Context Within the Summit Series
The Singapore Consensus occupies a distinct position within the evolving international AI safety summit series. While the Bletchley (2023), Seoul (2024), and Paris (2025) summits were government-led diplomatic convenings focused on declarations and commitments, SCAI was a researcher-driven conference focused specifically on technical research priorities.
| Dimension | Diplomatic Summits | SCAI Singapore |
|---|---|---|
| Led by | Governments | Researchers |
| Output | Declarations, commitments | Consensus research priorities |
| Binding force | Non-binding political commitments | Scientific consensus document |
| Participation | 28-100+ countries | 88 researchers from 11 countries |
| Focus | Risk recognition, institutional building | Technical safety research agenda |
This researcher-driven approach was particularly significant following the Paris summit, where the US and UK refused to sign the declaration, raising concerns about fracturing political consensus. SCAI demonstrated that scientific consensus on AI safety priorities could be maintained across geopolitical lines even when diplomatic alignment weakened.
Limitations
- No enforcement mechanism: Like the diplomatic summits, the Consensus relies on voluntary adoption by researchers, companies, and governments
- Scope limited to GPAI: Does not cover narrow AI systems or non-technical governance challenges, and does not address autonomous weapons specifically
- Consensus by design: Deliberately avoids areas of genuine disagreement (such as the likelihood of loss-of-control scenarios), which may limit usefulness on the most contested questions
- Living document: Continues to evolve, meaning the research priorities may shift as the community revises them
- Implementation gap: Identifying research priorities does not guarantee funding, talent allocation, or institutional support for pursuing them
Sources
1. Bengio, Y., Maharaj, T., Ong, L., Russell, S., Song, D., Tegmark, M., Xue, L., Zhang, Y-Q., et al. (2025). The Singapore Consensus on Global AI Safety Research Priorities. arXiv:2506.20702.