Longterm Wiki
Updated 2026-03-13
Summary

Internal documentation for a tool that finds notable patterns in structured data tables by flagging paradoxical rating combinations (e.g., safety approaches that mainly advance capabilities, or severe risks that are hard to detect). It lists the criteria the tool uses to flag rows and gives example insights drawn from flagged rows.


Table Candidates

Open Table Candidates Tool: browse table rows flagged as potential insight sources.

How It Works

The Table Candidates tool scans structured data tables for rows with "paradoxical" or notable rating combinations that suggest insight-worthy content.

Safety Approaches Criteria

Rows are flagged when they have:

| Criterion | Why It's Interesting |
|---|---|
| Capability-dominant differential progress | Safety research that primarily advances capabilities is surprising |
| Weak/no deception robustness | Reveals fundamental limitations of popular approaches |
| PRIORITIZE recommendation | Identifies underfunded high-value research |
| DEFUND/REDUCE recommendation | Challenges conventional wisdom on research priorities |
| Unclear/harmful net safety | Questions whether "safety" work actually makes things safer |
| Doesn't scale to superintelligence | Important limitation for long-term planning |
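A minimal sketch of how the Safety Approaches criteria could be expressed as one predicate per table row. The `SafetyApproachRow` type, its field names, and the rating strings (such as `"CAPABILITY-DOMINANT"`) are hypothetical; the wiki's real table schema and the tool's actual implementation may differ. A row's matched criteria are simply the predicates that return true.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical row shape: field names and rating labels are illustrative
# assumptions, not the wiki's actual Safety Approaches schema.
@dataclass
class SafetyApproachRow:
    name: str
    differential_progress: str  # e.g. "SAFETY-DOMINANT", "BALANCED", "CAPABILITY-DOMINANT"
    deception_robustness: str   # e.g. "STRONG", "PARTIAL", "WEAK", "NONE"
    recommendation: str         # e.g. "PRIORITIZE", "MAINTAIN", "REDUCE", "DEFUND"
    net_safety: str             # e.g. "POSITIVE", "UNCLEAR", "HARMFUL"
    scales_to_superintelligence: bool


# One predicate per criterion in the table above, keyed by the criterion name.
SAFETY_CRITERIA: dict[str, Callable[[SafetyApproachRow], bool]] = {
    "Capability-dominant differential progress":
        lambda r: r.differential_progress == "CAPABILITY-DOMINANT",
    "Weak/no deception robustness":
        lambda r: r.deception_robustness in {"WEAK", "NONE"},
    "PRIORITIZE recommendation":
        lambda r: r.recommendation == "PRIORITIZE",
    "DEFUND/REDUCE recommendation":
        lambda r: r.recommendation in {"DEFUND", "REDUCE"},
    "Unclear/harmful net safety":
        lambda r: r.net_safety in {"UNCLEAR", "HARMFUL"},
    "Doesn't scale to superintelligence":
        lambda r: not r.scales_to_superintelligence,
}


def matched_criteria(row: SafetyApproachRow) -> list[str]:
    """Return every criterion the row satisfies; an empty list means not flagged."""
    return [name for name, test in SAFETY_CRITERIA.items() if test(row)]


# Illustrative row with made-up ratings, not taken from the wiki's table.
example = SafetyApproachRow(
    name="Example approach",
    differential_progress="CAPABILITY-DOMINANT",
    deception_robustness="STRONG",
    recommendation="MAINTAIN",
    net_safety="POSITIVE",
    scales_to_superintelligence=False,
)
print(matched_criteria(example))
# ['Capability-dominant differential progress', "Doesn't scale to superintelligence"]
```

Keeping the criteria in a name-to-predicate mapping means the names shown as "matched criteria" in the tool and the logic that produces them cannot drift apart.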

Accident Risks Criteria

Rows are flagged when they have:

| Criterion | Why It's Interesting |
|---|---|
| Catastrophic/existential severity + hard to detect | Worst-case scenarios we can't easily monitor |
| Lab-demonstrated + severe | Empirical evidence of serious risks |
| Current timeline + severe | Not hypothetical: already happening |
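Each Accident Risks criterion is a conjunction: severity alone does not flag a row, only severity combined with poor detectability, lab evidence, or a current timeline. A minimal sketch under assumed field names and labels; `AccidentRiskRow`, the rating strings, and the choice to count `HIGH` as "severe" are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical rating labels; the real Accident Risks table may use different ones.
WORST_CASE = {"CATASTROPHIC", "EXISTENTIAL"}
SEVERE = {"HIGH"} | WORST_CASE  # assumption: "severe" includes HIGH
HARD_TO_DETECT = {"HARD", "VERY HARD"}


@dataclass
class AccidentRiskRow:
    name: str
    severity: str       # e.g. "MEDIUM", "HIGH", "CATASTROPHIC", "EXISTENTIAL"
    detectability: str  # e.g. "EASY", "MODERATE", "HARD", "VERY HARD"
    evidence: str       # e.g. "THEORETICAL", "LAB-DEMONSTRATED"
    timeline: str       # e.g. "CURRENT", "NEAR-TERM", "LONG-TERM"


def accident_risk_flags(row: AccidentRiskRow) -> list[str]:
    """Return the criteria a row matches; each criterion requires both of its conditions."""
    flags = []
    if row.severity in WORST_CASE and row.detectability in HARD_TO_DETECT:
        flags.append("Catastrophic/existential severity + hard to detect")
    if row.evidence == "LAB-DEMONSTRATED" and row.severity in SEVERE:
        flags.append("Lab-demonstrated + severe")
    if row.timeline == "CURRENT" and row.severity in SEVERE:
        flags.append("Current timeline + severe")
    return flags


# Illustrative row with made-up ratings, not taken from the wiki's table.
example_risk = AccidentRiskRow(
    name="Example risk",
    severity="EXISTENTIAL",
    detectability="VERY HARD",
    evidence="LAB-DEMONSTRATED",
    timeline="NEAR-TERM",
)
print(accident_risk_flags(example_risk))
# ['Catastrophic/existential severity + hard to detect', 'Lab-demonstrated + severe']
```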

Using the Tool

  1. Browse candidates sorted by source (Safety Approaches, Accident Risks)
  2. Review matched criteria to understand why each row was flagged
  3. Check the row's key ratings for full context
  4. Copy the insight template as a starting point
  5. Refine and verify the insight before adding it to insights.yaml
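Steps 2 to 4 can be sketched as a function that turns a flagged row into a draft entry. This assumes a hypothetical insights.yaml layout (`source`, `row`, `text`, `matched_criteria`, `key_ratings`); the real schema is not documented on this page, and `insight_template` is an illustrative name, not the tool's API. Strings are quoted with `json.dumps` because JSON string literals are also valid YAML double-quoted scalars, which keeps labels containing `+`, `/`, or apostrophes from being misparsed.

```python
import json


def insight_template(source: str, row_name: str, matched: list[str],
                     ratings: dict[str, str]) -> str:
    """Build a draft YAML list item for manual refinement (hypothetical field names)."""
    q = json.dumps  # quote every scalar so YAML reads it back as a plain string
    lines = [
        f"- source: {q(source)}",
        f"  row: {q(row_name)}",
        '  text: "TODO"  # draft: refine and verify before adding to insights.yaml',
        "  matched_criteria:",
        *[f"    - {q(c)}" for c in matched],
        "  key_ratings:",
        *[f"    {q(k)}: {q(v)}" for k, v in ratings.items()],
    ]
    return "\n".join(lines)


# Illustrative call with made-up values.
print(insight_template(
    "Accident Risks",
    "Example risk",
    ["Lab-demonstrated + severe"],
    {"severity": "HIGH", "evidence": "LAB-DEMONSTRATED"},
))
```

The `text` field is deliberately left as a placeholder: the template records why the row was flagged, but the insight itself still has to be written and checked by hand.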

Example Insights from Tables

From the Safety Approaches table:

"RLHF provides primarily capability uplift (DOMINANT) with limited safety benefit (LOW-MEDIUM), and fundamentally cannot scale to superhuman tasks where humans can't evaluate outputs."

From the Accident Risks table:

"Deceptive alignment represents an existential risk that is very difficult to detect. 78% alignment faking rate observed in Anthropic's 2024 study."
