Longterm Wiki

Credibility Rating

High (4/5)

High quality. Established institution or organization with editorial oversight and accountability.

Rating inherited from publication venue: Our World in Data

Data Status

Full text fetched: Dec 28, 2025

Summary

A dataset tracking AI performance across domains such as language understanding, image recognition, and problem-solving. It provides a comparative framework for evaluating AI capabilities against human benchmarks.

Key Points

  • Tracks AI performance across 12 benchmarks from 1998 to 2023
  • Provides comparative metrics normalizing human and AI capabilities
  • Covers domains including language, image recognition, reasoning, and coding

Review

This source is a critical compilation of AI benchmark data, systematically tracking the progression of artificial intelligence capabilities across multiple domains. By normalizing human performance at zero and initial AI performance at -100, the dataset offers a nuanced view of technological advancement in areas such as language understanding, image recognition, mathematical reasoning, and code generation.

The research matters for AI safety because it provides empirical evidence of AI systems' evolving capabilities, highlighting both remarkable progress and persistent limitations. Benchmarks like BBH, MMLU, and HumanEval demonstrate AI's growing sophistication in complex reasoning, knowledge application, and problem-solving. However, the varied performance across domains also underscores the importance of comprehensive evaluation and the need for careful development of AI systems to ensure alignment with human values and capabilities.
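The normalization described above (human performance mapped to 0, the initial AI score mapped to -100) can be sketched as a simple linear rescaling. This is a minimal illustration under the assumption that the dataset applies a straight linear map; the function name and example values are hypothetical, not taken from the source.

```python
def normalize_score(raw: float, human: float, initial: float) -> float:
    """Linearly rescale a raw benchmark score so that the human
    baseline maps to 0 and the initial AI score maps to -100.
    (Assumed linear map; illustrative only.)"""
    return 100 * (raw - human) / (human - initial)

# Hypothetical benchmark where humans score 90 and the first AI system scored 50:
normalize_score(90, human=90, initial=50)  # human baseline -> 0.0
normalize_score(50, human=90, initial=50)  # initial AI score -> -100.0
normalize_score(70, human=90, initial=50)  # halfway -> -50.0
```

On this scale, a positive value indicates performance above the human benchmark, which is how surpassing human-level capability shows up in such charts.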
Resource ID: 653a55bdf7195c0c | Stable ID: NzI5NjYxN2