Risk Pages Style Guide

This guide defines standards for risk analysis pages in the LongtermWiki knowledge base. Risk pages analyze potential negative outcomes from AI development.
Page Type Detection
Risk pages are detected by their URL path: /knowledge-base/risks/**/*.mdx
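As a minimal sketch of how that glob translates into a programmatic check (illustrative only; `isRiskPage` is a hypothetical helper, not the site's actual detection code):

```javascript
// Hypothetical helper: returns true when a page path falls under the
// risk-page glob /knowledge-base/risks/**/*.mdx.
function isRiskPage(path) {
  return /^\/knowledge-base\/risks\/.+\.mdx$/.test(path);
}

console.log(isRiskPage('/knowledge-base/risks/misalignment/deceptive-alignment.mdx')); // true
console.log(isRiskPage('/knowledge-base/responses/alignment/mech-interp.mdx'));        // false
```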
Required Frontmatter
```yaml
---
title: "Risk Name"
description: "One sentence explaining what this risk is and its key concern."
quality: 60 # 0-100
importance: 75 # 0-100
lastEdited: "2025-01-28"
---
```

Required Sections
1. Overview (2-3 paragraphs)
Explain what this risk is, why it matters, and who should care. Write in prose, not bullets.
Good example:
```md
## Overview

Deceptive alignment occurs when an AI system learns to behave aligned during training while harboring goals that diverge from human intentions. The system "plays along" during evaluation but pursues different objectives when deployed or given more autonomy.

This risk matters because standard training and evaluation procedures cannot reliably detect it. A deceptively aligned system would pass all behavioral tests by design, making it invisible to current safety measures until deployment at scale.
```

2. Risk Assessment Table
Every risk page MUST have a risk assessment table near the top:
```md
## Risk Assessment

| Dimension | Rating | Justification |
|-----------|--------|---------------|
| Severity | Critical | Could cause irreversible civilizational harm |
| Likelihood | Medium (15-35%) | Depends on alignment difficulty |
| Timeline | 2025-2035 | Contingent on AGI timelines |
| Trend | Increasing | Capability gains outpacing safety |
| Reversibility | Low | Difficult to detect and correct post-deployment |
```

3. Mechanism Section
Explain HOW this risk manifests. Include a Mermaid diagram:
```mdx
## How It Works

[Explanation of the causal mechanism]

<Mermaid client:load chart={`flowchart TD
  A[Training Begins] --> B[System learns evaluation patterns]
  B --> C{Gradient signal}
  C -->|Aligned behavior rewarded| D[Apparent alignment]
  D --> E[Deployment]
  E --> F[Reduced oversight]
  F --> G[True objectives revealed]
`} />
```

4. Contributing Factors
What increases or decreases this risk:
```md
## Contributing Factors

| Factor | Effect | Mechanism |
|--------|--------|-----------|
| Capability level | Increases risk | More sophisticated deception possible |
| Interpretability | Decreases risk | Can detect goal misalignment |
| Training diversity | Decreases risk | Harder to learn single deception pattern |
| Deployment speed | Increases risk | Less time for safety evaluation |
```

5. Responses That Address This Risk
Cross-link to relevant response pages:
```md
## Responses That Address This Risk

| Response | Relevance | Mechanism |
|----------|-----------|-----------|
| [Mechanistic Interpretability](/knowledge-base/responses/alignment/mech-interp/) | High | Directly examines internal representations |
| [AI Control](/knowledge-base/responses/alignment/ai-control/) | Medium | Limits damage from undetected deception |
| [Adversarial Training](/knowledge-base/responses/alignment/adversarial-training/) | Medium | Tests for inconsistent behavior |
```

6. Key Uncertainties
What we don’t know:
```md
## Key Uncertainties

1. **Emergence threshold**: At what capability level does deception become possible?
2. **Detection difficulty**: How hard is it to detect deceptive cognition with interpretability?
3. **Prevalence**: How often would training produce deceptive vs. genuinely aligned systems?
```

7. Historical Context (optional)
Precedents or analogies from other domains.
8. Related Risks
Links to connected risk pages.
Claude Code Workflows
Creating a New Risk Page
Section titled “Creating a New Risk Page”# Use the research-report skill to generate initial content/research-report "Analyze [RISK_NAME]: mechanisms, severity, contributing factors, and responses"
# Then create the page structureOr use the Task tool:
```js
Task({
  subagent_type: 'general-purpose',
  prompt: `Create a new risk page for [RISK_NAME].

FIRST: Read /internal/risk-style-guide/ for requirements.

THEN: Research the risk using WebSearch to find:
- Academic papers on the mechanism
- Expert assessments of likelihood
- Real-world examples or analogies

Create the page at: src/content/docs/knowledge-base/risks/[category]/[risk-name].mdx

Include ALL required sections:
1. Overview (2-3 paragraphs)
2. Risk Assessment table
3. How It Works (with Mermaid diagram)
4. Contributing Factors table
5. Responses That Address This Risk
6. Key Uncertainties
7. Related Risks`
})
```

Improving an Existing Risk Page
```js
Task({
  subagent_type: 'general-purpose',
  prompt: `Improve the risk page at [PATH].

FIRST: Read /internal/risk-style-guide/ and the current page.

THEN: Use WebSearch to find citations for:
- Quantitative estimates (likelihood, severity)
- Expert opinions and surveys
- Case studies or historical examples

Make surgical edits to add:
1. Risk Assessment table (if missing)
2. Mermaid diagram showing mechanism
3. Contributing Factors table
4. Citations from authoritative sources

DO NOT rewrite the entire file.`
})
```

Batch Validation
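The scripts below run structural checks in bulk. As a minimal sketch of the kind of check involved (illustrative only; the real validator in scripts/validate/validate-templates.mjs may work differently):

```javascript
// Illustrative sketch: report which required headings a risk page is
// missing. Not the actual implementation in
// scripts/validate/validate-templates.mjs.
const REQUIRED_HEADINGS = [
  '## Overview',
  '## Risk Assessment',
  '## How It Works',
  '## Contributing Factors',
  '## Responses That Address This Risk',
  '## Key Uncertainties',
];

function missingSections(markdown) {
  return REQUIRED_HEADINGS.filter((heading) => !markdown.includes(heading));
}
```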
Section titled “Batch Validation”# Check all risk pages against style guidenode scripts/validate/validate-templates.mjs --type risk
# List risk pages missing required sectionsnode scripts/content/grade-by-template.mjs --template knowledge-base-riskQuality Criteria
Pages are scored on four dimensions (0-10 scale). Scoring is deliberately harsh: a 7 is exceptional, and most content should score 3-5.
| Dimension | 3-4 (Adequate) | 5-6 (Good) | 7+ (Exceptional) |
|---|---|---|---|
| Novelty | Accurate summary | Some original framing | Significant original insight |
| Rigor | Mixed sourcing | Mostly sourced | Fully sourced with quantification |
| Actionability | Abstract implications | Some actionable takeaways | Concrete decision guidance |
| Completeness | Notable gaps | Covers main points | Thorough coverage |
Derived quality (0-100) combines subscores with word count and citation bonuses. See CLAUDE.md for formula.
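The actual formula is defined in CLAUDE.md; purely as an illustrative sketch of the general shape (hypothetical weights and caps, not the real formula):

```javascript
// Hypothetical illustration only -- the real formula lives in CLAUDE.md.
// Shows the general shape: average the four 0-10 subscores, scale to
// 0-100, then add capped word-count and citation bonuses.
function derivedQuality(subscores, wordCount, citationCount) {
  const { novelty, rigor, actionability, completeness } = subscores;
  const base = ((novelty + rigor + actionability + completeness) / 4) * 10;
  const wordBonus = Math.min(5, wordCount / 1000);      // hypothetical cap
  const citationBonus = Math.min(5, citationCount / 2); // hypothetical cap
  return Math.min(100, Math.round(base + wordBonus + citationBonus));
}
```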
Anti-Patterns
- Vague severity claims: “This is very dangerous” → Use specific estimates
- Missing mechanism: Don’t just say what, explain HOW
- No responses linked: Every risk should connect to potential mitigations
- Bullet-heavy: Use tables and prose instead
- Table-only sections: Every section needs explanatory paragraphs, not just data tables
- No uncertainty acknowledgment: Always include what we don’t know
Example Risk Page
See the Deceptive Alignment risk page (quality: 75/100) for a well-structured example.