Summary
Internal documentation page explaining a tool for finding interesting data patterns in structured tables by flagging paradoxical rating combinations (e.g., safety approaches that advance capabilities, severe risks that are hard to detect). Provides criteria and examples for how the tool identifies potential insight-worthy content.
The Table Candidates tool scans structured data tables for rows with "paradoxical" or notable rating combinations that suggest insight-worthy content.
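At its core, the scan is a filter: each source table has a set of criteria, and any row matching at least one criterion becomes a candidate. Below is a minimal sketch of that loop in Python; the `Criterion`, `Candidate`, and `scan_table` names and the plain-dict row representation are illustrative assumptions, not the tool's actual internals.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Criterion:
    """Pairs a matching predicate with the reason a match is interesting."""
    name: str
    why_interesting: str
    matches: Callable[[dict], bool]

@dataclass
class Candidate:
    """A flagged row plus every criterion it satisfied."""
    row: dict
    matched: list[Criterion] = field(default_factory=list)

def scan_table(rows: list[dict], criteria: list[Criterion]) -> list[Candidate]:
    """Flag every row that satisfies at least one criterion."""
    candidates = []
    for row in rows:
        matched = [c for c in criteria if c.matches(row)]
        if matched:
            candidates.append(Candidate(row=row, matched=matched))
    return candidates
```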
Safety Approaches Criteria
Rows are flagged when they have:
| Criterion | Why It's Interesting |
|---|---|
| Capability-dominant differential progress | Safety research that primarily advances capabilities is surprising |
| Weak/no deception robustness | Reveals fundamental limitations of popular approaches |
| PRIORITIZE recommendation | Identifies underfunded high-value research |
| DEFUND/REDUCE recommendation | Challenges conventional wisdom on research priorities |
| Unclear/harmful net safety | Questions whether "safety" work actually makes things safer |
| Doesn't scale to superintelligence | Important limitation for long-term planning |
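Expressed against the sketch above, these criteria could look like the following. The rating field names and enum values (`differential_progress`, `"CAPABILITY-DOMINANT"`, and so on) are guesses for illustration, not the table's real schema.

```python
# Hypothetical field names and values; the real table schema may differ.
SAFETY_APPROACH_CRITERIA = [
    Criterion(
        name="Capability-dominant differential progress",
        why_interesting="Safety research that primarily advances capabilities is surprising",
        matches=lambda r: r.get("differential_progress") == "CAPABILITY-DOMINANT",
    ),
    Criterion(
        name="Weak/no deception robustness",
        why_interesting="Reveals fundamental limitations of popular approaches",
        matches=lambda r: r.get("deception_robustness") in ("WEAK", "NONE"),
    ),
    Criterion(
        name="DEFUND/REDUCE recommendation",
        why_interesting="Challenges conventional wisdom on research priorities",
        matches=lambda r: r.get("recommendation") in ("DEFUND", "REDUCE"),
    ),
    # The PRIORITIZE, net-safety, and scalability criteria follow the same pattern.
]
```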
Accident Risks Criteria
Rows are flagged when they have:
| Criterion | Why It's Interesting |
|---|---|
| Catastrophic/existential severity + hard to detect | Worst-case scenarios we can't easily monitor |
| Lab-demonstrated + severe | Empirical evidence of serious risks |
| Current timeline + severe | Not hypothetical; happening now |
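Unlike the safety-approach criteria, each accident-risk criterion combines two ratings per row. Continuing the same sketch, again with assumed field names and values:

```python
SEVERE = {"CATASTROPHIC", "EXISTENTIAL"}  # assumed severity values

ACCIDENT_RISK_CRITERIA = [
    Criterion(
        name="Catastrophic/existential severity + hard to detect",
        why_interesting="Worst-case scenarios we can't easily monitor",
        matches=lambda r: r.get("severity") in SEVERE and r.get("detectability") == "HARD",
    ),
    Criterion(
        name="Lab-demonstrated + severe",
        why_interesting="Empirical evidence of serious risks",
        matches=lambda r: r.get("evidence") == "LAB-DEMONSTRATED" and r.get("severity") in SEVERE,
    ),
    Criterion(
        name="Current timeline + severe",
        why_interesting="Not hypothetical; happening now",
        matches=lambda r: r.get("timeline") == "CURRENT" and r.get("severity") in SEVERE,
    ),
]
```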
Using the Tool
1. Browse candidates sorted by source table (Safety Approaches, Accident Risks).
2. Review the matched criteria to understand why each row was flagged.
3. Check the row's key ratings for full context.
4. Copy the insight template as a starting point.
5. Refine and verify the insight before adding it to insights.yaml (see the sketch after this list).
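As a rough sketch of that last step, assuming insights.yaml holds a top-level `insights` list (the real file's schema may differ), a refined candidate could be appended like this:

```python
import yaml  # requires PyYAML

def append_insight(path: str, candidate: Candidate, text: str) -> None:
    """Append a refined candidate to insights.yaml; the schema here is assumed."""
    try:
        with open(path) as f:
            data = yaml.safe_load(f) or {}
    except FileNotFoundError:
        data = {}
    data.setdefault("insights", []).append({
        "source_table": "safety-approaches",  # hypothetical field
        "matched_criteria": [c.name for c in candidate.matched],
        "text": text,  # the refined, verified insight
    })
    with open(path, "w") as f:
        yaml.safe_dump(data, f, sort_keys=False)
```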
Example Insights from Tables
From Safety Approaches table:
"RLHFCapabilityRLHFRLHF/Constitutional AI achieves 82-85% preference improvements and 40.8% adversarial attack reduction for current systems, but faces fundamental scalability limits: weak-to-strong supervision shows...Quality: 63/100 provides primarily capability uplift (DOMINANT) with limited safety benefit (LOW-MEDIUM), and fundamentally cannot scale to superhuman tasks where humans can't evaluate outputs."
From Accident Risks table:
"Deceptive alignmentRiskDeceptive AlignmentComprehensive analysis of deceptive alignment risk where AI systems appear aligned during training but pursue different goals when deployed. Expert probability estimates range 5-90%, with key empir...Quality: 75/100 represents an existential risk that is very difficult to detect. 78% alignment faking rate observed in AnthropicOrganizationAnthropicComprehensive reference page on Anthropic covering financials (\$380B valuation, \$19B ARR), safety research (Constitutional AI, mechanistic interpretability, model welfare), governance (LTBT struc...Quality: 74/100's 2024 study."