Longterm Wiki
Updated 2026-03-13
Summary

This page documents LongtermWiki's content rating system. LLM-graded subscores (focus, novelty, rigor, completeness, objectivity, concreteness, and actionability, each on a 0-10 scale) are combined with automated metrics (word count, citations) to derive a quality score (0-100). A separate LLM-assessed importance rating (0-100) supports prioritization decisions.


Rating System

LongtermWiki uses a multi-dimensional rating system combining LLM-graded subscores with automated metrics to produce a derived quality score (0-100).

Quick Reference

# Grade a single page
ANTHROPIC_API_KEY=sk-... node scripts/content/grade-content.mjs --page scheming

# Preview grading all pages with a cost estimate (no API calls are made)
ANTHROPIC_API_KEY=sk-... node scripts/content/grade-content.mjs --dry-run

# Grade and apply to frontmatter
ANTHROPIC_API_KEY=sk-... node scripts/content/grade-content.mjs --apply

# Parallel processing (faster, higher API cost)
ANTHROPIC_API_KEY=sk-... node scripts/content/grade-content.mjs --parallel 5 --apply

Score Components

1. Importance (0-100)

How significant is this page for AI risk prioritization work?

| Range | Description | Expected Count |
|---|---|---|
| 90-100 | Essential for prioritization decisions | 5-10 pages |
| 70-89 | High value for practitioners | 30-50 pages |
| 50-69 | Useful context | 80-100 pages |
| 30-49 | Reference material | 60-80 pages |
| 0-29 | Peripheral | 30-50 pages |

Category adjustments applied to base assessment:

  • Responses/interventions: +10
  • Capabilities: +5
  • Core risks: +5
  • Risk factors: 0
  • Models/analysis: -5
  • Arguments/debates: -10
  • People/organizations: -15
  • Internal/infrastructure: -30
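
The adjustment step can be sketched in a few lines of Python. This is an illustrative sketch only: the dictionary keys are hypothetical labels for the categories listed above, and clamping the result to 0-100 is an assumption, since the source does not say how out-of-range values are handled or whether the grader applies the adjustment inside the LLM prompt or in code.

```python
# Adjustments copied from the list above; the keys are hypothetical labels.
CATEGORY_ADJUSTMENTS = {
    "responses": 10,
    "capabilities": 5,
    "core-risks": 5,
    "risk-factors": 0,
    "models": -5,
    "arguments": -10,
    "people-orgs": -15,
    "internal": -30,
}

# (lower bound, description) pairs from the importance table, highest first.
IMPORTANCE_BANDS = [
    (90, "Essential for prioritization decisions"),
    (70, "High value for practitioners"),
    (50, "Useful context"),
    (30, "Reference material"),
    (0, "Peripheral"),
]


def adjusted_importance(base: float, category: str) -> float:
    """Apply the category adjustment to a base importance assessment."""
    adjusted = base + CATEGORY_ADJUSTMENTS[category]
    # Assumption: keep the result inside the documented 0-100 range.
    return max(0, min(100, adjusted))


def importance_band(score: float) -> str:
    """Return the table description for an importance score in 0-100."""
    for lower_bound, description in IMPORTANCE_BANDS:
        if score >= lower_bound:
            return description
    raise ValueError(f"importance must be in 0-100, got {score}")
```

For example, a base assessment of 60 on a responses page becomes 70, which falls in the "High value for practitioners" band.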

2. Quality Subscores (0-10 each)

Scoring is harsh: a 7 is exceptional, 8+ is world-class. Most content should score 3-5.

Novelty

How original is the content beyond its sources?

| Score | Meaning |
|---|---|
| 9-10 | Groundbreaking original research (academic publication level) |
| 7-8 | Significant original synthesis not found elsewhere |
| 5-6 | Some original framing, modest value beyond sources |
| 3-4 | Accurate summary with minimal original perspective |
| 1-2 | Mostly restates common knowledge |

Rigor

How well-evidenced and precise are the claims?

| Score | Meaning |
|---|---|
| 9-10 | Every claim sourced to primary sources, quantified with uncertainty |
| 7-8 | Nearly all claims well-sourced and quantified |
| 5-6 | Most major claims sourced, some quantification |
| 3-4 | Mix of sourced and unsourced, vague claims common |
| 1-2 | Few sources, mostly assertions |

Actionability

How useful is this for making decisions?

| Score | Meaning |
|---|---|
| 9-10 | Specific decision procedures with quantified tradeoffs |
| 7-8 | Clear concrete recommendations with supporting analysis |
| 5-6 | Some actionable takeaways, general guidance |
| 3-4 | Mostly abstract, implications unclear |
| 1-2 | Purely descriptive, no practical application |

Completeness

How comprehensive is the coverage?

| Score | Meaning |
|---|---|
| 9-10 | Exhaustive authoritative reference (textbook-level) |
| 7-8 | Covers all major aspects thoroughly with depth |
| 5-6 | Covers main points, some gaps |
| 3-4 | Notable gaps, missing important aspects |
| 1-2 | Very incomplete, barely started |

Objectivity

Epistemic honesty, language neutrality, and analytical (not prescriptive) tone. See Common Writing Principles for full guidance.

| Score | Meaning |
|---|---|
| 9-10 | Every uncertain claim hedged with ranges; fully accessible to outsiders; presents tradeoffs without advocating |
| 7-8 | Nearly all estimates include ranges; no insider jargon; analytical throughout; honest counter-arguments |
| 5-6 | Mostly neutral language; some uncertainty acknowledged; mostly analytical |
| 3-4 | Uses insider jargon ("EA money"); presents estimates as facts ("True Cost"); one-sided framing |
| 1-2 | Heavy insider language; false certainty; reads as advocacy not analysis |

Concreteness

How specific vs. abstract?

| Score | Meaning |
|---|---|
| 9-10 | Specific numbers, examples, recommendations throughout |
| 7-8 | Mostly concrete with specific details |
| 5-6 | Mix of concrete and abstract |
| 3-4 | Mostly abstract, vague generalities |
| 1-2 | Almost entirely abstract hand-waving |

3. Automated Metrics

These are computed directly from content, not LLM-graded:

| Metric | What It Measures | How Computed |
|---|---|---|
| wordCount | Prose words (excluding tables) | Strip tables, code blocks, imports, components |
| citations | External sources | Count `<R id=...>` + markdown links `[](https://...)` |
| tables | Data tables | Count `\|---\|` patterns |
| diagrams | Visual elements | Count `<Mermaid>` + `![](...)` images |
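
As a rough illustration of how these counts could be derived from a page's source text, here is a minimal Python sketch. The function name `compute_metrics` and every regular expression are assumptions made for illustration; the grading script's actual patterns are not shown on this page and may differ.

```python
import re


def compute_metrics(mdx: str) -> dict:
    """Approximate the four automated metrics for one page (illustrative only)."""
    # citations: <R id=...> components plus markdown links to http(s) URLs
    # (the lookbehind skips image syntax, which starts with "!").
    citations = len(re.findall(r"<R\s+id=", mdx)) + len(
        re.findall(r"(?<!!)\[[^\]]*\]\(https?://[^)\s]+\)", mdx)
    )
    # tables: one separator row such as |---|---| per table.
    tables = len(
        re.findall(r"^[ \t]*\|(?:\s*:?-{3,}:?\s*\|)+[ \t]*$", mdx, flags=re.MULTILINE)
    )
    # diagrams: <Mermaid> components plus markdown images.
    diagrams = len(re.findall(r"<Mermaid\b", mdx)) + len(
        re.findall(r"!\[[^\]]*\]\([^)]*\)", mdx)
    )
    # wordCount: strip code blocks, table rows, imports, images, and
    # components, then count whitespace-separated tokens.
    prose = re.sub(r"```.*?```", " ", mdx, flags=re.DOTALL)
    prose = re.sub(r"^[ \t]*\|.*$", " ", prose, flags=re.MULTILINE)
    prose = re.sub(r"^import\s.*$", " ", prose, flags=re.MULTILINE)
    prose = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", prose)
    prose = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", prose)  # keep link text
    prose = re.sub(r"<[^>]+>", " ", prose)
    return {
        "wordCount": len(prose.split()),
        "citations": citations,
        "tables": tables,
        "diagrams": diagrams,
    }
```

Regex-based counting is approximate by design: it treats any line beginning with `|` as a table row, for instance, so unusual formatting can shift the counts slightly.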

4. Derived Quality Score (0-100)

quality = (avgSubscore × 8) + lengthBonus + evidenceBonus

Where:

  • avgSubscore = weighted average of all 7 dimensions (focus, novelty, rigor, completeness, objectivity, concreteness, actionability) on the 0-10 scale; multiplied by 8, it contributes 0-80. Weights vary by content type: analysis pages weight focus, novelty, objectivity, and concreteness higher; reference pages weight rigor and completeness higher.
  • lengthBonus = min(8, wordCount / 600) → contributes 0-8
  • evidenceBonus = min(7, citations × 0.35) → contributes 0-7

Subscores are the primary driver (~80% of score). Bonuses reward depth but can't compensate for weak content.
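
The formula can be checked with a short Python sketch. The function name and the default of equal weights are assumptions, since the per-content-type weights are not listed on this page; pass a `weights` mapping to experiment with other weightings.

```python
SUBSCORE_KEYS = (
    "focus", "novelty", "rigor", "completeness",
    "objectivity", "concreteness", "actionability",
)


def derived_quality(ratings, word_count, citations, weights=None):
    """quality = (avgSubscore x 8) + lengthBonus + evidenceBonus."""
    if weights is None:
        # Assumption: equal weights. The real weights vary by content type.
        weights = {key: 1.0 for key in SUBSCORE_KEYS}
    total_weight = sum(weights[key] for key in SUBSCORE_KEYS)
    avg_subscore = sum(ratings[key] * weights[key] for key in SUBSCORE_KEYS) / total_weight
    length_bonus = min(8.0, word_count / 600)    # contributes 0-8
    evidence_bonus = min(7.0, citations * 0.35)  # contributes 0-7
    return avg_subscore * 8 + length_bonus + evidence_bonus


# A page with every subscore at 5, 1,200 prose words, and 10 citations:
# 5 x 8 = 40, plus 1200 / 600 = 2, plus 10 x 0.35 = 3.5.
example = derived_quality({key: 5.0 for key in SUBSCORE_KEYS}, 1200, 10)
```

With the caps as written, a page with perfect subscores and both bonuses maxed out reaches 80 + 8 + 7 = 95.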

| Quality Range | Label | Meaning |
|---|---|---|
| 80-100 | Comprehensive | Fully developed, authoritative |
| 60-79 | Good | Solid content, minor gaps |
| 40-59 | Adequate | Useful but needs work |
| 20-39 | Draft | Early stage, significant gaps |
| 0-19 | Stub | Placeholder only |
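
The label lookup is a simple threshold check. A minimal Python sketch follows; the function name is hypothetical, and treating fractional scores such as 79.5 as belonging to the lower band is an assumption, since the table lists integer ranges only.

```python
# (lower bound, label) pairs from the table above, highest first.
QUALITY_BANDS = [
    (80, "Comprehensive"),
    (60, "Good"),
    (40, "Adequate"),
    (20, "Draft"),
    (0, "Stub"),
]


def quality_label(quality: float) -> str:
    """Map a derived quality score (0-100) to its label."""
    if not 0 <= quality <= 100:
        raise ValueError(f"quality must be in 0-100, got {quality}")
    for lower_bound, label in QUALITY_BANDS:
        if quality >= lower_bound:
            return label
    raise AssertionError("unreachable: the last band starts at 0")
```

For example, a score of 48 maps to "Adequate" and a score of 80 maps to "Comprehensive".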

Frontmatter Schema

After grading, pages have this frontmatter structure:

---
title: "Page Title"
description: "Executive summary with methodology AND conclusions"
quality: 65           # Derived 0-100
readerImportance: 75        # LLM-assessed 0-100
lastEdited: "2026-01-28"
ratings:
  focus: 6.0          # 0-10 scale
  novelty: 4.5
  rigor: 5.2
  completeness: 5.0
  objectivity: 5.5
  concreteness: 4.8
  actionability: 4.8
llmSummary: "This page analyzes X using Y methodology. It finds that Z with N% probability."
---
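
A quick sanity check of graded frontmatter could look like the Python sketch below. It assumes the frontmatter has already been parsed into a dictionary; the helper name `validate_frontmatter` is hypothetical and is not part of the grading script or the validation commands described on this page.

```python
RATING_KEYS = (
    "focus", "novelty", "rigor", "completeness",
    "objectivity", "concreteness", "actionability",
)


def _in_range(value, low, high) -> bool:
    """True when value is a number within [low, high]."""
    return isinstance(value, (int, float)) and low <= value <= high


def validate_frontmatter(frontmatter: dict) -> list[str]:
    """Return a list of range problems; an empty list means the fields look valid."""
    problems = []
    # quality and readerImportance are both on a 0-100 scale.
    for key in ("quality", "readerImportance"):
        if not _in_range(frontmatter.get(key), 0, 100):
            problems.append(f"{key} must be a number in 0-100")
    ratings = frontmatter.get("ratings")
    if not isinstance(ratings, dict):
        ratings = {}
    # Each of the seven subscores is on a 0-10 scale.
    for key in RATING_KEYS:
        if not _in_range(ratings.get(key), 0, 10):
            problems.append(f"ratings.{key} must be a number in 0-10")
    return problems
```

Applied to the example frontmatter above, this returns an empty list; a missing subscore or an out-of-range value produces one message per problem.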

Script Options

node scripts/content/grade-content.mjs [options]

Options:
  --page ID       Grade single page by ID or partial match
  --dry-run       Preview without API calls
  --limit N       Process only N pages
  --parallel N    Concurrent API requests (default: 1)
  --category X    Filter by category (models, risks, responses)
  --skip-graded   Skip pages with existing importance
  --output FILE   JSON output path (default: .claude/temp/grades-output.json)
  --apply         Write grades directly to frontmatter

Cost Estimates

| Scenario | Input Tokens | Output Tokens | Cost |
|---|---|---|---|
| Single page | ≈4K | ≈200 | ≈$0.05 |
| All 300 pages | ≈1.2M | ≈60K | ≈$15 |
| 10 pages parallel | ≈40K | ≈2K | ≈$0.50 |
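
Token usage scales linearly with page count, so the token columns can be reproduced with simple arithmetic. A minimal Python sketch, assuming roughly 4K input and 200 output tokens per page as in the single-page row; the function names are hypothetical, and the per-million-token prices are caller-supplied parameters rather than actual API pricing.

```python
INPUT_TOKENS_PER_PAGE = 4_000   # from the single-page row above
OUTPUT_TOKENS_PER_PAGE = 200    # from the single-page row above


def estimate_tokens(pages: int) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) for grading `pages` pages."""
    return pages * INPUT_TOKENS_PER_PAGE, pages * OUTPUT_TOKENS_PER_PAGE


def estimate_cost(pages: int, input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Estimate cost in dollars given prices per million tokens (caller-supplied)."""
    input_tokens, output_tokens = estimate_tokens(pages)
    return (
        input_tokens / 1_000_000 * input_price_per_mtok
        + output_tokens / 1_000_000 * output_price_per_mtok
    )
```

Running with `--parallel` changes how quickly requests are sent, not how many tokens each page consumes, so the same arithmetic applies to parallel runs.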

Validation

Pages are validated against quality criteria based on their type:

npm run crux -- validate templates    # Template structure
npm run crux -- validate unified --rules=placeholders  # Incomplete content

See Page Types for which pages are validated.


  • Common Writing Principles - Cross-cutting writing standards (objectivity dimension)
  • Models Style Guide - Requirements for analytical model pages
  • Risk Style Guide - Requirements for risk analysis pages
  • Response Style Guide - Requirements for intervention pages
  • Page Types - How page types affect validation