
# Risk Pages Style Guide

This guide defines standards for risk analysis pages in the LongtermWiki knowledge base. Risk pages analyze potential negative outcomes from AI development.

Risk pages are detected by their URL path: `/knowledge-base/risks/**/*.mdx`.
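As a rough illustration (a hypothetical sketch, not the site's actual routing code), that detection reduces to a path test:

```js
// Hypothetical sketch: classify a content file as a risk page by its path.
// Mirrors the glob /knowledge-base/risks/**/*.mdx; the real site may use a glob library instead.
function isRiskPage(filePath) {
  return /\/knowledge-base\/risks\/.+\.mdx$/.test(filePath);
}

isRiskPage('src/content/docs/knowledge-base/risks/misalignment/deceptive-alignment.mdx'); // true
isRiskPage('src/content/docs/knowledge-base/responses/alignment/mech-interp.mdx');        // false
```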

Every risk page begins with YAML frontmatter:

```yaml
---
title: "Risk Name"
description: "One sentence explaining what this risk is and its key concern."
quality: 60 # 0-100
importance: 75 # 0-100
lastEdited: "2025-01-28"
---
```
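If you want to lint these fields programmatically, a minimal schema sketch (assuming a zod-style validator; the repo's actual schema, if any, may differ) looks like this:

```js
// Hypothetical frontmatter schema sketch using zod; field names mirror the template above.
import { z } from 'zod';

const riskFrontmatter = z.object({
  title: z.string().min(1),
  description: z.string().min(1),
  quality: z.number().int().min(0).max(100),    // 0-100, per the template comment
  importance: z.number().int().min(0).max(100), // 0-100
  lastEdited: z.string().regex(/^\d{4}-\d{2}-\d{2}$/), // ISO date, e.g. "2025-01-28"
});

// riskFrontmatter.parse(parsedYaml) throws if a page's frontmatter is malformed.
```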

Explain what this risk is, why it matters, and who should care. Write in prose, not bullets.

Good example:

```mdx
## Overview
Deceptive alignment occurs when an AI system learns to behave aligned during training
while harboring goals that diverge from human intentions. The system "plays along"
during evaluation but pursues different objectives when deployed or given more autonomy.
This risk matters because standard training and evaluation procedures cannot reliably
detect it. A deceptively aligned system would pass all behavioral tests by design,
making it invisible to current safety measures until deployment at scale.
```

Every risk page MUST have a risk assessment table near the top:

```mdx
## Risk Assessment
| Dimension | Rating | Justification |
|-----------|--------|---------------|
| Severity | Critical | Could cause irreversible civilizational harm |
| Likelihood | Medium (15-35%) | Depends on alignment difficulty |
| Timeline | 2025-2035 | Contingent on AGI timelines |
| Trend | Increasing | Capability gains outpacing safety |
| Reversibility | Low | Difficult to detect and correct post-deployment |
```

Explain HOW this risk manifests. Include a Mermaid diagram:

```mdx
## How It Works
[Explanation of the causal mechanism]

<Mermaid client:load chart={`
flowchart TD
  A[Training Begins] --> B[System learns evaluation patterns]
  B --> C{Gradient signal}
  C -->|Aligned behavior rewarded| D[Apparent alignment]
  D --> E[Deployment]
  E --> F[Reduced oversight]
  F --> G[True objectives revealed]
`} />
```

What increases or decreases this risk:

```mdx
## Contributing Factors
| Factor | Effect | Mechanism |
|--------|--------|-----------|
| Capability level | Increases risk | More sophisticated deception possible |
| Interpretability | Decreases risk | Can detect goal misalignment |
| Training diversity | Decreases risk | Harder to learn single deception pattern |
| Deployment speed | Increases risk | Less time for safety evaluation |
```

Cross-link to relevant response pages:

```mdx
## Responses That Address This Risk
| Response | Relevance | Mechanism |
|----------|-----------|-----------|
| [Mechanistic Interpretability](/knowledge-base/responses/alignment/mech-interp/) | High | Directly examines internal representations |
| [AI Control](/knowledge-base/responses/alignment/ai-control/) | Medium | Limits damage from undetected deception |
| [Adversarial Training](/knowledge-base/responses/alignment/adversarial-training/) | Medium | Tests for inconsistent behavior |
```

What we don’t know:

```mdx
## Key Uncertainties
1. **Emergence threshold**: At what capability level does deception become possible?
2. **Detection difficulty**: How hard is it to detect deceptive cognition with interpretability?
3. **Prevalence**: How often would training produce deceptive vs. genuinely aligned systems?
```

Describe precedents or analogies from other domains.

Link to connected risk pages.


To draft a new risk page, start with the research-report skill:

```bash
# Use the research-report skill to generate initial content
/research-report "Analyze [RISK_NAME]: mechanisms, severity, contributing factors, and responses"
# Then create the page structure
```

Or use the Task tool:

```js
Task({
  subagent_type: 'general-purpose',
  prompt: `Create a new risk page for [RISK_NAME].
FIRST: Read /internal/risk-style-guide/ for requirements.
THEN: Research the risk using WebSearch to find:
- Academic papers on the mechanism
- Expert assessments of likelihood
- Real-world examples or analogies
Create the page at: src/content/docs/knowledge-base/risks/[category]/[risk-name].mdx
Include ALL required sections:
1. Overview (2-3 paragraphs)
2. Risk Assessment table
3. How It Works (with Mermaid diagram)
4. Contributing Factors table
5. Responses That Address This Risk
6. Key Uncertainties
7. Related Risks`
})
```
To improve an existing risk page rather than create a new one:

```js
Task({
  subagent_type: 'general-purpose',
  prompt: `Improve the risk page at [PATH].
FIRST: Read /internal/risk-style-guide/ and the current page.
THEN: Use WebSearch to find citations for:
- Quantitative estimates (likelihood, severity)
- Expert opinions and surveys
- Case studies or historical examples
Make surgical edits to add:
1. Risk Assessment table (if missing)
2. Mermaid diagram showing mechanism
3. Contributing Factors table
4. Citations from authoritative sources
DO NOT rewrite the entire file.`
})
```
To check risk pages against this guide:

```bash
# Check all risk pages against style guide
node scripts/validate/validate-templates.mjs --type risk
# List risk pages missing required sections
node scripts/content/grade-by-template.mjs --template knowledge-base-risk
```

Pages are scored on four dimensions (0-10 scale). Scoring is harsh: a 7 is exceptional, and most content should score 3-5.

| Dimension | 3-4 (Adequate) | 5-6 (Good) | 7+ (Exceptional) |
|-----------|----------------|------------|------------------|
| Novelty | Accurate summary | Some original framing | Significant original insight |
| Rigor | Mixed sourcing | Mostly sourced | Fully sourced with quantification |
| Actionability | Abstract implications | Some actionable takeaways | Concrete decision guidance |
| Completeness | Notable gaps | Covers main points | Thorough coverage |

Derived quality (0-100) combines the four subscores with word-count and citation bonuses. See CLAUDE.md for the exact formula.
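As an illustration only (the real formula lives in CLAUDE.md; the weights and bonuses below are invented for this sketch), such a combination might look like:

```js
// Hypothetical sketch of a derived-quality score; NOT the CLAUDE.md formula.
// Assumes four 0-10 subscores plus invented word-count and citation bonuses.
function derivedQuality({ novelty, rigor, actionability, completeness }, wordCount, citations) {
  const base = ((novelty + rigor + actionability + completeness) / 40) * 100; // scale 0-40 to 0-100
  const lengthBonus = Math.min(5, wordCount / 1000);  // assumed: up to +5 for longer pages
  const citationBonus = Math.min(5, citations * 0.5); // assumed: up to +5 for sourcing
  return Math.min(100, Math.round(base + lengthBonus + citationBonus));
}
```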


Common pitfalls to avoid:

1. Vague severity claims: “This is very dangerous” → use specific estimates.
2. Missing mechanism: don’t just say what; explain HOW.
3. No responses linked: every risk should connect to potential mitigations.
4. Bullet-heavy: use tables and prose instead.
5. Table-only sections: every section needs explanatory paragraphs, not just data tables.
6. No uncertainty acknowledgment: always include what we don’t know.

See Deceptive Alignment for a well-structured example.