Capability Elicitation

capability-elicitationapproachPath: /knowledge-base/responses/capability-elicitation/

E443Entity ID (EID)

← Back to page6 backlinksQuality: 91Updated: 2026-01-30

Page Recorddatabase.json — merged from MDX frontmatter + Entity YAML + computed metrics at build time

{
  "id": "capability-elicitation",
  "wikiId": "E443",
  "path": "/knowledge-base/responses/capability-elicitation/",
  "filePath": "knowledge-base/responses/capability-elicitation.mdx",
  "title": "Capability Elicitation",
  "quality": 91,
  "readerImportance": 50,
  "researchImportance": 81,
  "tacticalValue": null,
  "contentFormat": "article",
  "causalLevel": null,
  "lastUpdated": "2026-01-30",
  "dateCreated": "2026-02-15",
  "summary": "Capability elicitation—systematically discovering what AI models can actually do through scaffolding, prompting, and fine-tuning—reveals 2-10x performance gaps versus naive testing. METR finds AI agent capability doubles every 7 months when properly elicited; UK AISI found cyber task performance improved 5x in one year; fine-tuning can remove safety with just 10-340 examples. However, sandbagging research shows capable models may intentionally hide capabilities during evaluation—Claude 3.5 Sonnet accuracy drops from 99% to 34% when incentivized to underperform. OpenAI-Apollo partnership achieved ~30x reduction in scheming through deliberative alignment training.",
  "description": "Systematic methods to discover what AI models can actually do, including hidden capabilities that may not appear in standard benchmarks, through scaffolding, fine-tuning, and specialized prompting techniques.",
  "ratings": {
    "novelty": 6.2,
    "rigor": 7.8,
    "completeness": 8,
    "actionability": 7
  },
  "category": "responses",
  "subcategory": "alignment-evaluation",
  "clusters": [
    "ai-safety",
    "governance"
  ],
  "metrics": {
    "wordCount": 3463,
    "tableCount": 18,
    "diagramCount": 4,
    "internalLinks": 5,
    "externalLinks": 72,
    "footnoteCount": 0,
    "bulletRatio": 0.17,
    "sectionCount": 37,
    "hasOverview": true,
    "structuralScore": 15
  },
  "suggestedQuality": 100,
  "updateFrequency": 21,
  "evergreen": true,
  "wordCount": 3463,
  "unconvertedLinks": [
    {
      "text": "Apollo Research",
      "url": "https://www.apolloresearch.ai/research/scheming-reasoning-evaluations",
      "resourceId": "91737bf431000298",
      "resourceTitle": "Frontier Models are Capable of In-Context Scheming"
    },
    {
      "text": "METR",
      "url": "https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/",
      "resourceId": "271fc5f73a8304b2",
      "resourceTitle": "Measuring AI Ability to Complete Long Tasks - METR"
    },
    {
      "text": "UK AISI Frontier AI Trends Report",
      "url": "https://www.aisi.gov.uk/frontier-ai-trends-report",
      "resourceId": "7042c7f8de04ccb1",
      "resourceTitle": "AISI Frontier AI Trends"
    },
    {
      "text": "UK AISI",
      "url": "https://www.aisi.gov.uk/blog/5-key-findings-from-our-first-frontier-ai-trends-report",
      "resourceId": "8a9de448c7130623",
      "resourceTitle": "nearly 5x more likely"
    },
    {
      "text": "METR o1/Sonnet Evaluation",
      "url": "https://metr.org/blog/2025-01-31-update-sonnet-o1-evals/",
      "resourceId": "89b92e6423256fc4",
      "resourceTitle": "METR Capability Evaluations Update: Claude Sonnet and OpenAI o1"
    },
    {
      "text": "Wei et al. (2022)",
      "url": "https://arxiv.org/abs/2201.11903",
      "resourceId": "7d42a191f4b30946",
      "resourceTitle": "Chain-of-thought analysis"
    },
    {
      "text": "METR RE-Bench",
      "url": "https://metr.org/blog/2024-11-22-evaluating-r-d-capabilities-of-llms/",
      "resourceId": "056e0ff33675b825",
      "resourceTitle": "RE-Bench: Evaluating frontier AI R&D capabilities"
    },
    {
      "text": "UK AISI",
      "url": "https://www.aisi.gov.uk/frontier-ai-trends-report",
      "resourceId": "7042c7f8de04ccb1",
      "resourceTitle": "AISI Frontier AI Trends"
    },
    {
      "text": "METR",
      "url": "https://metr.org/",
      "resourceId": "45370a5153534152",
      "resourceTitle": "METR: Model Evaluation and Threat Research"
    },
    {
      "text": "Apollo Research",
      "url": "https://www.apolloresearch.ai/",
      "resourceId": "329d8c2e2532be3d",
      "resourceTitle": "Apollo Research - AI Safety Evaluation Organization"
    },
    {
      "text": "Wei et al. 2022",
      "url": "https://arxiv.org/abs/2201.11903",
      "resourceId": "7d42a191f4b30946",
      "resourceTitle": "Chain-of-thought analysis"
    },
    {
      "text": "Elicitation effort doubles effective capability",
      "url": "https://metr.org/blog/2025-01-31-update-sonnet-o1-evals/",
      "resourceId": "89b92e6423256fc4",
      "resourceTitle": "METR Capability Evaluations Update: Claude Sonnet and OpenAI o1"
    },
    {
      "text": "Model-specific elicitation needed",
      "url": "https://evaluations.metr.org/deepseek-v3-report/",
      "resourceId": "399923cddbcace4a",
      "resourceTitle": "Details about METR's preliminary evaluation of DeepSeek-V3"
    },
    {
      "text": "Exponential growth since 2019",
      "url": "https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/",
      "resourceId": "271fc5f73a8304b2",
      "resourceTitle": "Measuring AI Ability to Complete Long Tasks - METR"
    },
    {
      "text": "RE-Bench benchmark",
      "url": "https://metr.org/blog/2024-11-22-evaluating-r-d-capabilities-of-llms/",
      "resourceId": "056e0ff33675b825",
      "resourceTitle": "RE-Bench: Evaluating frontier AI R&D capabilities"
    },
    {
      "text": "Frontier AI Trends Report",
      "url": "https://www.aisi.gov.uk/frontier-ai-trends-report",
      "resourceId": "7042c7f8de04ccb1",
      "resourceTitle": "AISI Frontier AI Trends"
    },
    {
      "text": "Tasks requiring 1-3 years experience",
      "url": "https://www.aisi.gov.uk/blog/5-key-findings-from-our-first-frontier-ai-trends-report",
      "resourceId": "8a9de448c7130623",
      "resourceTitle": "nearly 5x more likely"
    },
    {
      "text": "Responsible Scaling Policy",
      "url": "https://www.anthropic.com/responsible-scaling-policy",
      "resourceId": "afe1e125f3ba3f14",
      "resourceTitle": "Responsible Scaling Policy"
    },
    {
      "text": "notes",
      "url": "https://assets.anthropic.com/m/24a47b00f10301cd/original/Anthropic-Responsible-Scaling-Policy-2024-10-15.pdf",
      "resourceId": "135450f83343d9ae",
      "resourceTitle": "Anthropic Responsible Scaling Policy 2024 10 15"
    },
    {
      "text": "Guidelines for Capability Elicitation",
      "url": "https://evaluations.metr.org/elicitation-protocol/",
      "resourceId": "e544cb0c92c340b3",
      "resourceTitle": "Guidelines for capability elicitation | METR’s Autonomy Evaluation Resources"
    },
    {
      "text": "Apollo Research",
      "url": "https://www.apolloresearch.ai/research/scheming-reasoning-evaluations",
      "resourceId": "91737bf431000298",
      "resourceTitle": "Frontier Models are Capable of In-Context Scheming"
    },
    {
      "text": "Apollo (2024)",
      "url": "https://www.apolloresearch.ai/research/scheming-reasoning-evaluations",
      "resourceId": "91737bf431000298",
      "resourceTitle": "Frontier Models are Capable of In-Context Scheming"
    },
    {
      "text": "Apollo (2025)",
      "url": "https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming/",
      "resourceId": "80c6d6eca17dc925",
      "resourceTitle": "More capable models scheme at higher rates"
    },
    {
      "text": "Apollo (2025)",
      "url": "https://www.apolloresearch.ai/blog/claude-sonnet-37-often-knows-when-its-in-alignment-evaluations/",
      "resourceId": "f5ef9e486e36fbee",
      "resourceTitle": "Apollo Research found"
    },
    {
      "text": "OpenAI-Apollo (2025)",
      "url": "https://openai.com/index/detecting-and-reducing-scheming-in-ai-models/",
      "resourceId": "b3f335edccfc5333",
      "resourceTitle": "OpenAI Preparedness Framework"
    },
    {
      "text": "Apollo notes",
      "url": "https://www.apolloresearch.ai/blog/claude-sonnet-37-often-knows-when-its-in-alignment-evaluations/",
      "resourceId": "f5ef9e486e36fbee",
      "resourceTitle": "Apollo Research found"
    },
    {
      "text": "METR",
      "url": "https://metr.org/",
      "resourceId": "45370a5153534152",
      "resourceTitle": "METR: Model Evaluation and Threat Research"
    },
    {
      "text": "UK AISI",
      "url": "https://www.aisi.gov.uk/",
      "resourceId": "fdf68a8f30f57dee",
      "resourceTitle": "UK AI Safety Institute (AISI)"
    },
    {
      "text": "US AISI",
      "url": "https://www.nist.gov/aisi",
      "resourceId": "84e0da6d5092e27d",
      "resourceTitle": "Center for AI Standards and Innovation (CAISI)"
    },
    {
      "text": "Apollo Research",
      "url": "https://www.apolloresearch.ai/",
      "resourceId": "329d8c2e2532be3d",
      "resourceTitle": "Apollo Research - AI Safety Evaluation Organization"
    },
    {
      "text": "Anthropic",
      "url": "https://www.anthropic.com/",
      "resourceId": "ef96dc6d7727d718",
      "resourceTitle": "Anthropic"
    },
    {
      "text": "OpenAI",
      "url": "https://openai.com/",
      "resourceId": "04d39e8bd5d50dd5",
      "resourceTitle": "OpenAI Official Homepage"
    },
    {
      "text": "Measuring AI Ability to Complete Long Tasks",
      "url": "https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/",
      "resourceId": "271fc5f73a8304b2",
      "resourceTitle": "Measuring AI Ability to Complete Long Tasks - METR"
    },
    {
      "text": "RE-Bench: Evaluating frontier AI R&D capabilities",
      "url": "https://metr.org/blog/2024-11-22-evaluating-r-d-capabilities-of-llms/",
      "resourceId": "056e0ff33675b825",
      "resourceTitle": "RE-Bench: Evaluating frontier AI R&D capabilities"
    },
    {
      "text": "Update on evaluations of Claude 3.5 Sonnet and o1",
      "url": "https://metr.org/blog/2025-01-31-update-sonnet-o1-evals/",
      "resourceId": "89b92e6423256fc4",
      "resourceTitle": "METR Capability Evaluations Update: Claude Sonnet and OpenAI o1"
    },
    {
      "text": "Guidelines for Capability Elicitation",
      "url": "https://evaluations.metr.org/elicitation-protocol/",
      "resourceId": "e544cb0c92c340b3",
      "resourceTitle": "Guidelines for capability elicitation | METR’s Autonomy Evaluation Resources"
    },
    {
      "text": "Frontier AI Trends Report",
      "url": "https://www.aisi.gov.uk/frontier-ai-trends-report",
      "resourceId": "7042c7f8de04ccb1",
      "resourceTitle": "AISI Frontier AI Trends"
    },
    {
      "text": "5 Key Findings",
      "url": "https://www.aisi.gov.uk/blog/5-key-findings-from-our-first-frontier-ai-trends-report",
      "resourceId": "8a9de448c7130623",
      "resourceTitle": "nearly 5x more likely"
    },
    {
      "text": "Advanced AI Evaluations: May Update",
      "url": "https://www.aisi.gov.uk/blog/advanced-ai-evaluations-may-update",
      "resourceId": "4e56cdf6b04b126b",
      "resourceTitle": "UK AI Safety Institute renamed to AI Security Institute"
    },
    {
      "text": "Early Lessons from Evaluating Frontier AI Systems",
      "url": "https://www.aisi.gov.uk/blog/early-lessons-from-evaluating-frontier-ai-systems",
      "resourceId": "0fd3b1f5c81a37d8",
      "resourceTitle": "UK AI Security Institute's evaluations"
    },
    {
      "text": "2025 Year in Review",
      "url": "https://www.aisi.gov.uk/blog/our-2025-year-in-review",
      "resourceId": "3dec5f974c5da5ec",
      "resourceTitle": "Our 2025 Year in Review"
    },
    {
      "text": "Frontier Models are Capable of In-Context Scheming",
      "url": "https://www.apolloresearch.ai/research/scheming-reasoning-evaluations",
      "resourceId": "91737bf431000298",
      "resourceTitle": "Frontier Models are Capable of In-Context Scheming"
    },
    {
      "text": "More Capable Models Are Better At In-Context Scheming",
      "url": "https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming/",
      "resourceId": "80c6d6eca17dc925",
      "resourceTitle": "More capable models scheme at higher rates"
    },
    {
      "text": "Claude Sonnet 3.7 Evaluation Awareness",
      "url": "https://www.apolloresearch.ai/blog/claude-sonnet-37-often-knows-when-its-in-alignment-evaluations/",
      "resourceId": "f5ef9e486e36fbee",
      "resourceTitle": "Apollo Research found"
    },
    {
      "text": "OpenAI Partnership: Detecting and Reducing Scheming",
      "url": "https://openai.com/index/detecting-and-reducing-scheming-in-ai-models/",
      "resourceId": "b3f335edccfc5333",
      "resourceTitle": "OpenAI Preparedness Framework"
    },
    {
      "text": "Anthropic Responsible Scaling Policy",
      "url": "https://www.anthropic.com/responsible-scaling-policy",
      "resourceId": "afe1e125f3ba3f14",
      "resourceTitle": "Responsible Scaling Policy"
    },
    {
      "text": "Anthropic RSP PDF (October 2024)",
      "url": "https://assets.anthropic.com/m/24a47b00f10301cd/original/Anthropic-Responsible-Scaling-Policy-2024-10-15.pdf",
      "resourceId": "135450f83343d9ae",
      "resourceTitle": "Anthropic Responsible Scaling Policy 2024 10 15"
    },
    {
      "text": "OpenAI Detecting and Reducing Scheming",
      "url": "https://openai.com/index/detecting-and-reducing-scheming-in-ai-models/",
      "resourceId": "b3f335edccfc5333",
      "resourceTitle": "OpenAI Preparedness Framework"
    },
    {
      "text": "Chain-of-Thought Prompting Elicits Reasoning",
      "url": "https://arxiv.org/abs/2201.11903",
      "resourceId": "7d42a191f4b30946",
      "resourceTitle": "Chain-of-thought analysis"
    },
    {
      "text": "METR",
      "url": "https://metr.org/",
      "resourceId": "45370a5153534152",
      "resourceTitle": "METR: Model Evaluation and Threat Research"
    },
    {
      "text": "UK AI Security Institute",
      "url": "https://www.aisi.gov.uk/",
      "resourceId": "fdf68a8f30f57dee",
      "resourceTitle": "UK AI Safety Institute (AISI)"
    },
    {
      "text": "Apollo Research",
      "url": "https://www.apolloresearch.ai/",
      "resourceId": "329d8c2e2532be3d",
      "resourceTitle": "Apollo Research - AI Safety Evaluation Organization"
    },
    {
      "text": "Anthropic",
      "url": "https://www.anthropic.com/",
      "resourceId": "ef96dc6d7727d718",
      "resourceTitle": "Anthropic"
    },
    {
      "text": "OpenAI",
      "url": "https://openai.com/",
      "resourceId": "04d39e8bd5d50dd5",
      "resourceTitle": "OpenAI Official Homepage"
    }
  ],
  "unconvertedLinkCount": 54,
  "convertedLinkCount": 0,
  "backlinkCount": 6,
  "hallucinationRisk": {
    "level": "low",
    "score": 25,
    "factors": [
      "no-citations",
      "high-rigor",
      "conceptual-content",
      "high-quality"
    ]
  },
  "entityType": "approach",
  "redundancy": {
    "maxSimilarity": 23,
    "similarPages": [
      {
        "id": "dangerous-cap-evals",
        "title": "Dangerous Capability Evaluations",
        "path": "/knowledge-base/responses/dangerous-cap-evals/",
        "similarity": 23
      },
      {
        "id": "alignment-evals",
        "title": "Alignment Evaluations",
        "path": "/knowledge-base/responses/alignment-evals/",
        "similarity": 19
      },
      {
        "id": "model-auditing",
        "title": "Third-Party Model Auditing",
        "path": "/knowledge-base/responses/model-auditing/",
        "similarity": 19
      },
      {
        "id": "evals",
        "title": "Evals & Red-teaming",
        "path": "/knowledge-base/responses/evals/",
        "similarity": 18
      },
      {
        "id": "sleeper-agent-detection",
        "title": "Sleeper Agent Detection",
        "path": "/knowledge-base/responses/sleeper-agent-detection/",
        "similarity": 18
      }
    ]
  },
  "coverage": {
    "passing": 8,
    "total": 13,
    "targets": {
      "tables": 14,
      "diagrams": 1,
      "internalLinks": 28,
      "externalLinks": 17,
      "footnotes": 10,
      "references": 10
    },
    "actuals": {
      "tables": 18,
      "diagrams": 4,
      "internalLinks": 5,
      "externalLinks": 72,
      "footnotes": 0,
      "references": 23,
      "quotesWithQuotes": 0,
      "quotesTotal": 0,
      "accuracyChecked": 0,
      "accuracyTotal": 0
    },
    "items": {
      "summary": "green",
      "schedule": "green",
      "entity": "green",
      "editHistory": "red",
      "overview": "green",
      "tables": "green",
      "diagrams": "green",
      "internalLinks": "amber",
      "externalLinks": "green",
      "footnotes": "red",
      "references": "green",
      "quotes": "red",
      "accuracy": "red"
    },
    "ratingsString": "N:6.2 R:7.8 A:7 C:8"
  },
  "readerRank": 304,
  "researchRank": 81,
  "recommendedScore": 215.13
}

External Links

{
  "lesswrong": "https://www.lesswrong.com/tag/ai-evaluations"
}

Backlinks (6)

id	title	type	relationship
arc	Alignment Research Center (ARC)	organization	—
metr	METR	organization	—
palisade-research	Palisade Research	organization	—
redwood-research	Redwood Research	organization	—
dario-amodei	Dario Amodei	person	—
alignment-evaluation-overview	Evaluation & Detection (Overview)	concept	—