Evals-Based Deployment Gates
ID: evals-governance · Entity type: policy
Path: /knowledge-base/responses/evals-governance/
Entity ID (EID): E459
Page Record (database.json): merged from MDX frontmatter + Entity YAML + computed metrics at build time.
{
"id": "evals-governance",
"numericId": null,
"path": "/knowledge-base/responses/evals-governance/",
"filePath": "knowledge-base/responses/evals-governance.mdx",
"title": "Evals-Based Deployment Gates",
"quality": 66,
"readerImportance": 41.5,
"researchImportance": 70.5,
"tacticalValue": null,
"contentFormat": "article",
"tractability": null,
"neglectedness": null,
"uncertainty": null,
"causalLevel": null,
"lastUpdated": "2026-03-13",
"dateCreated": "2026-02-15",
"llmSummary": "Evals-based deployment gates create formal checkpoints requiring AI systems to pass safety evaluations before deployment, with EU AI Act imposing fines up to EUR 35M/7% turnover and UK AISI testing 30+ models. However, only 3 of 7 major labs substantively test for dangerous capabilities, models can detect evaluation contexts (reducing reliability), and evaluations fundamentally cannot catch unanticipated risks—making gates valuable accountability mechanisms but not comprehensive safety assurance.",
"description": "Evals-based deployment gates require AI models to pass safety evaluations before deployment or capability scaling. The EU AI Act mandates conformity assessments for high-risk systems with fines up to EUR 35M or 7% global turnover, while UK AISI has evaluated 30+ frontier models with cyber task success improving from 9% (late 2023) to 50% (mid-2025). Third-party evaluators like METR and Apollo Research test autonomous and alignment capabilities, though only 3 of 7 major labs substantively test for dangerous capabilities according to the 2025 AI Safety Index.",
"ratings": {
"novelty": 4.5,
"rigor": 7,
"actionability": 7.5,
"completeness": 7.5
},
"category": "responses",
"subcategory": "alignment-policy",
"clusters": [
"ai-safety",
"governance"
],
"metrics": {
"wordCount": 4073,
"tableCount": 31,
"diagramCount": 3,
"internalLinks": 6,
"externalLinks": 73,
"footnoteCount": 0,
"bulletRatio": 0.03,
"sectionCount": 42,
"hasOverview": true,
"structuralScore": 15
},
"suggestedQuality": 100,
"updateFrequency": 21,
"evergreen": true,
"wordCount": 4073,
"unconvertedLinks": [
{
"text": "2025 AI Safety Index",
"url": "https://futureoflife.org/ai-safety-index-summer-2025/",
"resourceId": "df46edd6fa2078d1",
"resourceTitle": "FLI AI Safety Index Summer 2025"
},
{
"text": "EU AI Act",
"url": "https://artificialintelligenceact.eu/",
"resourceId": "1ad6dc89cded8b0c",
"resourceTitle": "EU AI Act"
},
{
"text": "UK AISI Frontier AI Trends Report",
"url": "https://www.aisi.gov.uk/frontier-ai-trends-report",
"resourceId": "7042c7f8de04ccb1",
"resourceTitle": "AISI Frontier AI Trends"
},
{
"text": "EU AI Act",
"url": "https://artificialintelligenceact.eu/",
"resourceId": "1ad6dc89cded8b0c",
"resourceTitle": "EU AI Act"
},
{
"text": "16 companies at the Seoul Summit",
"url": "https://www.gov.uk/government/publications/frontier-ai-safety-commitments-ai-seoul-summit-2024/frontier-ai-safety-commitments-ai-seoul-summit-2024",
"resourceId": "4487a62bbc1c45d6",
"resourceTitle": "Seoul Frontier AI Safety Commitments"
},
{
"text": "UK AI Security Institute",
"url": "https://www.aisi.gov.uk/",
"resourceId": "fdf68a8f30f57dee",
"resourceTitle": "AI Safety Institute"
},
{
"text": "METR",
"url": "https://metr.org/",
"resourceId": "45370a5153534152",
"resourceTitle": "metr.org"
},
{
"text": "2025 AI Safety Index",
"url": "https://futureoflife.org/ai-safety-index-summer-2025/",
"resourceId": "df46edd6fa2078d1",
"resourceTitle": "FLI AI Safety Index Summer 2025"
},
{
"text": "International AI Safety Report 2025",
"url": "https://internationalaisafetyreport.org/publication/international-ai-safety-report-2025",
"resourceId": "b163447fdc804872",
"resourceTitle": "International AI Safety Report 2025"
},
{
"text": "Apollo Research",
"url": "https://www.apolloresearch.ai/blog/claude-sonnet-37-often-knows-when-its-in-alignment-evaluations/",
"resourceId": "f5ef9e486e36fbee",
"resourceTitle": "Apollo Research found"
},
{
"text": "EU AI Act",
"url": "https://artificialintelligenceact.eu/",
"resourceId": "1ad6dc89cded8b0c",
"resourceTitle": "EU AI Act"
},
{
"text": "US EO 14110",
"url": "https://www.federalregister.gov/documents/2023/11/01/2023-24283/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence",
"resourceId": "80350b150694b2ae",
"resourceTitle": "Executive Order 14110"
},
{
"text": "UK AISI",
"url": "https://www.aisi.gov.uk/",
"resourceId": "fdf68a8f30f57dee",
"resourceTitle": "AI Safety Institute"
},
{
"text": "NIST AI RMF",
"url": "https://www.nist.gov/artificial-intelligence",
"resourceId": "85ee8e554a07476b",
"resourceTitle": "Guidelines and standards"
},
{
"text": "Anthropic RSP",
"url": "https://www.anthropic.com/index/anthropics-responsible-scaling-policy",
"resourceId": "c637506d2cd4d849"
},
{
"text": "OpenAI Preparedness",
"url": "https://openai.com/preparedness",
"resourceId": "90a03954db3c77d5",
"resourceTitle": "OpenAI Preparedness"
},
{
"text": "EU AI Act",
"url": "https://artificialintelligenceact.eu/",
"resourceId": "1ad6dc89cded8b0c",
"resourceTitle": "EU AI Act"
},
{
"text": "EU AI Act Implementation Timeline",
"url": "https://artificialintelligenceact.eu/implementation-timeline/",
"resourceId": "0aa9d7ba294a35d9",
"resourceTitle": "EU AI Act Implementation Timeline"
},
{
"text": "Anthropic estimate",
"url": "https://www.congress.gov/crs-product/R47843",
"resourceId": "7f5cff0680d15cc8",
"resourceTitle": "Congress.gov CRS Report"
},
{
"text": "UK AISI 2025 Year in Review",
"url": "https://www.aisi.gov.uk/blog/our-2025-year-in-review",
"resourceId": "3dec5f974c5da5ec",
"resourceTitle": "Our 2025 Year in Review"
},
{
"text": "METR",
"url": "https://metr.org/",
"resourceId": "45370a5153534152",
"resourceTitle": "metr.org"
},
{
"text": "Apollo Research",
"url": "https://www.apolloresearch.ai/",
"resourceId": "329d8c2e2532be3d",
"resourceTitle": "Apollo Research"
},
{
"text": "UK AISI",
"url": "https://www.aisi.gov.uk/",
"resourceId": "fdf68a8f30f57dee",
"resourceTitle": "AI Safety Institute"
},
{
"text": "2025 AI Safety Index",
"url": "https://futureoflife.org/ai-safety-index-summer-2025/",
"resourceId": "df46edd6fa2078d1",
"resourceTitle": "FLI AI Safety Index Summer 2025"
},
{
"text": "Frontier AI Safety Commitments",
"url": "https://www.gov.uk/government/publications/frontier-ai-safety-commitments-ai-seoul-summit-2024/frontier-ai-safety-commitments-ai-seoul-summit-2024",
"resourceId": "4487a62bbc1c45d6",
"resourceTitle": "Seoul Frontier AI Safety Commitments"
},
{
"text": "METR Frontier AI Safety Policies Tracker",
"url": "https://metr.org/faisc",
"resourceId": "7e3b7146e1266c71",
"resourceTitle": "METR's analysis"
},
{
"text": "UK AISI Frontier AI Trends Report",
"url": "https://www.aisi.gov.uk/frontier-ai-trends-report",
"resourceId": "7042c7f8de04ccb1",
"resourceTitle": "AISI Frontier AI Trends"
},
{
"text": "Apollo Research",
"url": "https://www.apolloresearch.ai/",
"resourceId": "329d8c2e2532be3d",
"resourceTitle": "Apollo Research"
},
{
"text": "Claude Sonnet 3.7 often recognizes when it's in alignment evaluations",
"url": "https://www.apolloresearch.ai/blog/claude-sonnet-37-often-knows-when-its-in-alignment-evaluations/",
"resourceId": "f5ef9e486e36fbee",
"resourceTitle": "Apollo Research found"
},
{
"text": "UK-US joint model evaluation",
"url": "https://www.aisi.gov.uk/",
"resourceId": "fdf68a8f30f57dee",
"resourceTitle": "AI Safety Institute"
},
{
"text": "Anthropic-OpenAI joint evaluation",
"url": "https://alignment.anthropic.com/2025/openai-findings/",
"resourceId": "2fdf91febf06daaf",
"resourceTitle": "Anthropic-OpenAI joint evaluation"
},
{
"text": "Frontier AI Trends Report",
"url": "https://www.aisi.gov.uk/frontier-ai-trends-report",
"resourceId": "7042c7f8de04ccb1",
"resourceTitle": "AISI Frontier AI Trends"
},
{
"text": "Joint UK-US pre-deployment evaluation of OpenAI o1",
"url": "https://www.aisi.gov.uk/blog/pre-deployment-evaluation-of-openais-o1-model",
"resourceId": "e23f70e673a090c1",
"resourceTitle": "Pre-Deployment evaluation of OpenAI's o1 model"
},
{
"text": "UK AISI 2025 Year in Review",
"url": "https://www.aisi.gov.uk/blog/our-2025-year-in-review",
"resourceId": "3dec5f974c5da5ec",
"resourceTitle": "Our 2025 Year in Review"
},
{
"text": "OpenAI-Apollo partnership",
"url": "https://openai.com/index/detecting-and-reducing-scheming-in-ai-models/",
"resourceId": "b3f335edccfc5333",
"resourceTitle": "OpenAI Preparedness Framework"
},
{
"text": "Bloom tool",
"url": "https://alignment.anthropic.com/2025/bloom-auto-evals/",
"resourceId": "7fa7d4cb797a5edd",
"resourceTitle": "Bloom: Automated Behavioral Evaluations"
},
{
"text": "Inspect tools",
"url": "https://inspect.aisi.org.uk/",
"resourceId": "fc3078f3c2ba5ebb",
"resourceTitle": "UK AI Safety Institute's Inspect framework"
},
{
"text": "International AI Safety Report 2025",
"url": "https://internationalaisafetyreport.org/publication/international-ai-safety-report-2025",
"resourceId": "b163447fdc804872",
"resourceTitle": "International AI Safety Report 2025"
},
{
"text": "EU AI Act",
"url": "https://artificialintelligenceact.eu/",
"resourceId": "1ad6dc89cded8b0c",
"resourceTitle": "EU AI Act"
},
{
"text": "EU AI Act Implementation Timeline",
"url": "https://artificialintelligenceact.eu/implementation-timeline/",
"resourceId": "0aa9d7ba294a35d9",
"resourceTitle": "EU AI Act Implementation Timeline"
},
{
"text": "NIST AI RMF",
"url": "https://www.nist.gov/artificial-intelligence/ai-standards",
"resourceId": "e4c2d8b8332614cc",
"resourceTitle": "NIST: AI Standards Portal"
},
{
"text": "UK AISI 2025 Review",
"url": "https://www.aisi.gov.uk/blog/our-2025-year-in-review",
"resourceId": "3dec5f974c5da5ec",
"resourceTitle": "Our 2025 Year in Review"
},
{
"text": "UK AISI Evaluations Update",
"url": "https://www.aisi.gov.uk/blog/advanced-ai-evaluations-may-update",
"resourceId": "4e56cdf6b04b126b",
"resourceTitle": "UK AI Safety Institute renamed to AI Security Institute"
},
{
"text": "EO 14110",
"url": "https://www.federalregister.gov/documents/2023/11/01/2023-24283/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence",
"resourceId": "80350b150694b2ae",
"resourceTitle": "Executive Order 14110"
},
{
"text": "Responsible Scaling Policy",
"url": "https://www.anthropic.com/index/anthropics-responsible-scaling-policy",
"resourceId": "c637506d2cd4d849"
},
{
"text": "Preparedness Framework",
"url": "https://openai.com/preparedness",
"resourceId": "90a03954db3c77d5",
"resourceTitle": "OpenAI Preparedness"
},
{
"text": "Joint Evaluation Exercise",
"url": "https://alignment.anthropic.com/2025/openai-findings/",
"resourceId": "2fdf91febf06daaf",
"resourceTitle": "Anthropic-OpenAI joint evaluation"
},
{
"text": "Bloom Auto-Evals",
"url": "https://alignment.anthropic.com/2025/bloom-auto-evals/",
"resourceId": "7fa7d4cb797a5edd",
"resourceTitle": "Bloom: Automated Behavioral Evaluations"
},
{
"text": "Automated Auditing Agents",
"url": "https://alignment.anthropic.com/2025/automated-auditing/",
"resourceId": "bda3ba0731666dc7",
"resourceTitle": "10-42% correct root cause identification"
},
{
"text": "METR",
"url": "https://metr.org/",
"resourceId": "45370a5153534152",
"resourceTitle": "metr.org"
},
{
"text": "GPT-5 evaluation",
"url": "https://evaluations.metr.org/gpt-5-report/",
"resourceId": "7457262d461e2206",
"resourceTitle": "evaluations.metr.org"
},
{
"text": "GPT-4.5 evals",
"url": "https://metr.org/blog/2025-02-27-gpt-4-5-evals/",
"resourceId": "a86b4f04559de6da",
"resourceTitle": "metr.org"
},
{
"text": "Apollo Research",
"url": "https://www.apolloresearch.ai/",
"resourceId": "329d8c2e2532be3d",
"resourceTitle": "Apollo Research"
},
{
"text": "UK AISI",
"url": "https://www.aisi.gov.uk/",
"resourceId": "fdf68a8f30f57dee",
"resourceTitle": "AI Safety Institute"
},
{
"text": "Frontier AI Trends Report",
"url": "https://www.aisi.gov.uk/frontier-ai-trends-report",
"resourceId": "7042c7f8de04ccb1",
"resourceTitle": "AISI Frontier AI Trends"
},
{
"text": "Inspect framework",
"url": "https://inspect.aisi.org.uk/",
"resourceId": "fc3078f3c2ba5ebb",
"resourceTitle": "UK AI Safety Institute's Inspect framework"
},
{
"text": "Future of Life Institute",
"url": "https://futureoflife.org/",
"resourceId": "786a68a91a7d5712",
"resourceTitle": "Future of Life Institute"
},
{
"text": "AI Safety Index",
"url": "https://futureoflife.org/ai-safety-index-summer-2025/",
"resourceId": "df46edd6fa2078d1",
"resourceTitle": "FLI AI Safety Index Summer 2025"
},
{
"text": "AI Safety Index 2025",
"url": "https://futureoflife.org/ai-safety-index-summer-2025/",
"resourceId": "df46edd6fa2078d1",
"resourceTitle": "FLI AI Safety Index Summer 2025"
}
],
"unconvertedLinkCount": 59,
"convertedLinkCount": 0,
"backlinkCount": 1,
"hallucinationRisk": {
"level": "medium",
"score": 40,
"factors": [
"no-citations",
"high-rigor"
]
},
"entityType": "policy",
"redundancy": {
"maxSimilarity": 19,
"similarPages": [
{
"id": "rsp",
"title": "Responsible Scaling Policies",
"path": "/knowledge-base/responses/rsp/",
"similarity": 19
},
{
"id": "model-auditing",
"title": "Third-Party Model Auditing",
"path": "/knowledge-base/responses/model-auditing/",
"similarity": 18
},
{
"id": "dangerous-cap-evals",
"title": "Dangerous Capability Evaluations",
"path": "/knowledge-base/responses/dangerous-cap-evals/",
"similarity": 17
},
{
"id": "evals",
"title": "Evals & Red-teaming",
"path": "/knowledge-base/responses/evals/",
"similarity": 17
},
{
"id": "intervention-effectiveness-matrix",
"title": "Intervention Effectiveness Matrix",
"path": "/knowledge-base/models/intervention-effectiveness-matrix/",
"similarity": 15
}
]
},
"changeHistory": [
{
"date": "2026-02-15",
"branch": "claude/extract-wiki-interventions-WpOs4",
"title": "Extract wiki proposals as structured data",
"summary": "Created two new data layers:\n1. **Interventions** (broad categories): Extended `Intervention` schema with risk coverage matrix, ITN prioritization, funding data. Created `data/interventions.yaml` with 14 broad intervention categories. `InterventionCard`/`InterventionList` components.\n2. **Proposals** (narrow, tactical): New `Proposal` data type for specific, speculative, actionable items extracted from wiki pages. Created `data/proposals.yaml` with 27 proposals across 6 domains (philanthropic, financial, governance, technical, biosecurity, field-building). Each has cost/EV estimates, honest concerns, feasibility, stance (collaborative/adversarial). `ProposalCard`/`ProposalList` components.\n\nPost-review fixes: Fixed 13 incorrect wikiPageId E-codes in interventions.yaml (used numeric IDs instead of entity slugs). Added Intervention + Proposal to schema validator. Extracted shared badge color maps from 4 components into `badge-styles.ts`. Removed unused `client:load` prop and `fundingShare` destructure.",
"pr": 141
}
],
"coverage": {
"passing": 9,
"total": 13,
"targets": {
"tables": 16,
"diagrams": 2,
"internalLinks": 33,
"externalLinks": 20,
"footnotes": 12,
"references": 12
},
"actuals": {
"tables": 31,
"diagrams": 3,
"internalLinks": 6,
"externalLinks": 73,
"footnotes": 0,
"references": 28,
"quotesWithQuotes": 0,
"quotesTotal": 0,
"accuracyChecked": 0,
"accuracyTotal": 0
},
"items": {
"llmSummary": "green",
"schedule": "green",
"entity": "green",
"editHistory": "green",
"overview": "green",
"tables": "green",
"diagrams": "green",
"internalLinks": "amber",
"externalLinks": "green",
"footnotes": "red",
"references": "green",
"quotes": "red",
"accuracy": "red"
},
"editHistoryCount": 1,
"ratingsString": "N:4.5 R:7 A:7.5 C:7.5"
},
"readerRank": 364,
"researchRank": 146,
"recommendedScore": 174.61
}
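The record above is assembled at build time from the three sources the Page Record label names. Below is a minimal TypeScript sketch of that merge, under stated assumptions: the file paths and the last-write-wins precedence (computed metrics over frontmatter over entity YAML) are hypothetical, and the `gray-matter` and `yaml` packages stand in for whatever loaders the build actually uses.

```ts
import { readFileSync } from "node:fs";
import matter from "gray-matter"; // frontmatter parser (assumed choice)
import { parse } from "yaml";     // YAML parser (assumed choice)

// Shape mirrors a few fields of the record above; the rest pass through.
interface PageRecord {
  id: string;
  path: string;
  title: string;
  [key: string]: unknown;
}

function buildPageRecord(
  mdxPath: string,                          // e.g. the page under knowledge-base/
  entityYamlPath: string,                   // entity registry entry (EID, entityType, ...)
  computedMetrics: Record<string, unknown>, // wordCount, link counts, coverage, ...
): PageRecord {
  // 1. Authored fields come from the MDX frontmatter (title, description, ratings).
  const { data: frontmatter } = matter(readFileSync(mdxPath, "utf8"));
  // 2. Registry fields come from the entity YAML (entityType, clusters, ...).
  const entity = parse(readFileSync(entityYamlPath, "utf8"));
  // 3. Computed metrics win on conflict (an assumption), since they are
  //    derived from the built page itself rather than hand-maintained.
  return { ...entity, ...frontmatter, ...computedMetrics } as PageRecord;
}
```

The precedence order here is an assumption; the source only says the three inputs are merged.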
External Links

{
"lesswrong": "https://www.lesswrong.com/tag/ai-evaluations"
}

Backlinks (1)
| id | title | type | relationship |
|---|---|---|---|
| alignment-policy-overview | Policy & Governance (Overview) | concept | — |
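The `coverage` block in the record above pairs per-item targets with actuals and rolls them up into a 9-of-13 passing count, where "passing" appears to mean a green status. The exact thresholds are not documented; the sketch below uses an assumed rule (at or above target is green, any nonzero shortfall is amber, nothing is red) that happens to reproduce every numeric item shown. Boolean items like `llmSummary` or `overview` would follow their own checks and are not covered here.

```ts
type Status = "green" | "amber" | "red";

// Assumed thresholds: >= target -> green, partial -> amber, zero -> red.
function coverageStatus(actual: number, target: number): Status {
  if (target === 0) return actual > 0 ? "green" : "red"; // e.g. quotes 0/0 -> red
  if (actual >= target) return "green";
  return actual > 0 ? "amber" : "red";
}

// Numeric items copied from the record above.
const targets = { tables: 16, diagrams: 2, internalLinks: 33,
                  externalLinks: 20, footnotes: 12, references: 12 };
const actuals = { tables: 31, diagrams: 3, internalLinks: 6,
                  externalLinks: 73, footnotes: 0, references: 28 };

for (const key of Object.keys(targets) as (keyof typeof targets)[]) {
  console.log(key, coverageStatus(actuals[key], targets[key]));
}
// -> tables green, diagrams green, internalLinks amber,
//    externalLinks green, footnotes red, references green
```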