AI Capabilities Progress Has Sped Up (Epoch AI Analysis)
Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: Epoch AI
Key empirical evidence that AI capability progress accelerated materially in 2024, relevant for forecasting, safety timelines, and understanding the impact of reasoning models and RL-based training on frontier AI development.
Metadata
Summary
Epoch AI finds that frontier AI capabilities have accelerated significantly, with the rate of improvement on the Epoch Capabilities Index nearly doubling from ~8 points/year to ~15.5 points/year after April 2024. This acceleration coincides with the rise of reasoning models and increased focus on reinforcement learning at frontier labs, and is corroborated by a ~50% faster doubling rate in the METR Time Horizon benchmark since October 2024.
Key Points
- The Epoch Capabilities Index shows a ~90% acceleration in improvement rate starting April 2024, from 8.3 to 15.5 points/year.
- METR Time Horizons have been doubling ~50% faster since October 2024, corroborating the ECI acceleration finding.
- The acceleration coincides with the emergence of reasoning models and increased use of reinforcement learning at frontier labs.
- Analysis spans 149 models over four years (Dec 2021–Dec 2025) using a piecewise linear regression with an optimized breakpoint.
- Both benchmarks independently point to a structural shift in the pace of AI capability gains around mid-to-late 2024.
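The headline ratios follow from simple arithmetic on the figures quoted above. For the METR line, this assumes (as is usual for that metric) that the time horizon H grows exponentially with doubling time T_d, so a doubling rate 1/T_d that is 50% faster corresponds to a doubling time about two-thirds as long:

```latex
\frac{15.5\ \text{pts/yr}}{8.3\ \text{pts/yr}} \approx 1.87
\quad\Longrightarrow\quad \text{acceleration} \approx 87\% \ (\text{reported as} \sim 90\%)

H(t) = H_0 \, 2^{t/T_d}, \qquad
\frac{1}{T_d'} = 1.5 \cdot \frac{1}{T_d}
\quad\Longrightarrow\quad T_d' = \frac{T_d}{1.5} \approx 0.67\, T_d
```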
Cited by 3 pages
| Page | Type | Quality |
|---|---|---|
| Reasoning and Planning | Capability | 65.0 |
| Scientific Research Capabilities | Capability | 68.0 |
| Epoch AI | Organization | 51.0 |
Cached Content Preview
The best score on the Epoch Capabilities Index grew almost twice as fast over the last two years as it did over the two years before that, with the rate of improvement accelerating by 90% around April 2024. This is consistent with a similar acceleration in the METR Time Horizon benchmark, of 40% around October 2024. The acceleration roughly coincides with the rise of reasoning models and an increasing focus on reinforcement learning among frontier labs.
[Interactive figure with benchmark selector: Epoch Capabilities Index / METR Time Horizon]
To provide additional data, we include pre-2023 models, which are currently filtered out of the Benchmarking Hub because benchmark scores from that period are comparatively sparse. Our conclusions do not change substantially if this data is excluded.
Epoch's work is free to use, distribute, and reproduce provided the source and authors are credited under the Creative Commons BY license.
Using the Epoch Capabilities Index, we compare simple models of AI capability trends, finding that the best-fitting model features an acceleration in improvement around April 2024. The rate of frontier improvement nearly doubled, from about 8 points/year before the breakpoint to 15 points/year after. HCAST-based estimates of this inflection point yield similar results.
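The breakpoint model can be sketched as a continuous "hinge" regression, `y = a + b*t + c*max(0, t - t_break)`, whose slope is `b` before the breakpoint and `b + c` after it, so the two segments meet by construction. This is a minimal illustration, not Epoch AI's code: the grid search over candidate breakpoints, the helper names (`solve3`, `fit_hinge`, `best_breakpoint`), and the synthetic data are all assumptions made for the example.

```python
"""Continuous two-segment (hinge) linear fit with a grid-searched breakpoint.

Hypothetical sketch, not Epoch AI's implementation. Model:
    y = a + b*t + c*max(0, t - t_break)
Continuity at t_break holds by construction; slope is b before, b + c after.
"""
from typing import List, Sequence, Tuple


def solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x


def fit_hinge(ts: Sequence[float], ys: Sequence[float], t_break: float) -> Tuple[float, List[float]]:
    """Ordinary least squares for a fixed breakpoint; returns (SSE, [a, b, c])."""
    rows = [[1.0, t, max(0.0, t - t_break)] for t in ts]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    beta = solve3(xtx, xty)
    sse = sum((y - sum(b * x for b, x in zip(beta, r))) ** 2 for r, y in zip(rows, ys))
    return sse, beta


def best_breakpoint(ts, ys, candidates):
    """Try each candidate breakpoint and keep the one with the lowest SSE.

    Candidates must lie strictly inside the data's time range; a breakpoint at or
    beyond either end makes the hinge column all zeros or collinear with the
    intercept and slope columns, and the system becomes singular.
    """
    fits = [(fit_hinge(ts, ys, tb), tb) for tb in candidates]
    (sse, beta), tb = min(fits, key=lambda f: f[0][0])
    return tb, beta, sse


# Synthetic data (made up, not Epoch's): 8 points/yr, then 16 points/yr after t = 2.
ts = [0.25 * i for i in range(17)]  # years 0.0 .. 4.0
ys = [100 + 8 * t + 8 * max(0.0, t - 2.0) for t in ts]
tb, beta, sse = best_breakpoint(ts, ys, [0.5 + 0.25 * k for k in range(13)])
print(f"breakpoint={tb}, slope before={beta[1]:.2f}, slope after={beta[1] + beta[2]:.2f}")
```

Comparing the minimum SSE of this model against a single straight-line fit is one simple way to judge whether the breakpoint is warranted; the source does not say which model-comparison criterion Epoch used.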
Data
We use data from the Epoch Capabilities Index (ECI), which itself builds on the paper “A Rosetta Stone for AI Benchmarks”. The current implementation of ECI filters out all models released before 2023, and further excludes any model with fewer than four ECI-included benchmark scores. These requirements were relaxed to widen the scope of this retrospective analysis, allowing earlier models with sparser data to be included.
Specifically, models released before 2023 were permitted to have as few as three ECI-included benchmark scores, though every model estimated to be at the frontier had at least four. Even so, this left only two data points in the 2019-2020 range, so the start of the data window was extended back by only about a year to preserve continuity. The analysis spans 149 models over four years, from December 2021 to December 2025.
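The inclusion rules described above can be written as a small filter. This is a hypothetical sketch: the `ModelRecord` type, the `include` helper, the model names, and the exact window start of 1 December 2021 are assumptions for illustration (the text gives only the month).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ModelRecord:
    name: str      # hypothetical model identifier
    released: date
    n_scores: int  # number of ECI-included benchmark scores


ECI_START = date(2023, 1, 1)        # current ECI excludes earlier releases
EXTENDED_START = date(2021, 12, 1)  # assumed start of the extended window


def include(m: ModelRecord, extended: bool) -> bool:
    """Inclusion rule as described in the text; exact cut-off dates are assumptions."""
    if not extended:
        # Current ECI: 2023+ releases with at least four scores.
        return m.released >= ECI_START and m.n_scores >= 4
    if m.released < EXTENDED_START:
        return False
    # Retrospective analysis: pre-2023 models may have as few as three scores.
    min_scores = 3 if m.released < ECI_START else 4
    return m.n_scores >= min_scores


models = [
    ModelRecord("model-a", date(2022, 6, 1), 3),
    ModelRecord("model-b", date(2020, 5, 1), 5),
    ModelRecord("model-c", date(2024, 3, 1), 3),
    ModelRecord("model-d", date(2024, 3, 1), 4),
]
print([m.name for m in models if include(m, extended=True)])
```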
We found that re-fitting the ECI with the extended data changed model rankings very little: the largest shift was Claude 3 Haiku (from 97th to 102nd place), and the third-largest shift was a move of a single place. So while some fine-grained scores may have shifted, we are confident that the overall trends did not change.
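A ranking-stability check like the one described can be sketched by ranking each fit separately and comparing the ranks of models scored in both. The function names, the alphabetical tie-breaking, and the toy scores are hypothetical; the source does not say how ties or newly added models were handled.

```python
from typing import Dict, List, Tuple


def ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 = highest score; ties broken alphabetically by name."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: i + 1 for i, name in enumerate(ordered)}


def rank_shifts(before: Dict[str, float], after: Dict[str, float]) -> List[Tuple[str, int, int]]:
    """(name, old_rank, new_rank) for models present in both fits, largest move first."""
    old, new = ranks(before), ranks(after)
    common = sorted(old.keys() & new.keys())
    shifts = [(name, old[name], new[name]) for name in common]
    # sorted() is stable even with reverse=True, so equal moves keep name order.
    return sorted(shifts, key=lambda s: abs(s[2] - s[1]), reverse=True)


# Toy scores; "e" stands in for a model that only appears in the extended fit.
before = {"a": 10.0, "b": 9.0, "c": 8.0, "d": 7.0}
after = {"a": 10.0, "b": 7.5, "c": 8.0, "d": 7.6, "e": 5.0}
moves = rank_shifts(before, after)
print(moves[0])
```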
Analysis
Upon obtaining the newly fitted ECI scores, we keep a running maximum in order to determine the model with the highest ECI at each timepoint. This yields 17 frontier points.
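The running-maximum step can be sketched as follows. Only the idea of keeping models that set a new ECI record comes from the text; the tuple layout, the best-first ordering of same-day releases, the strict "must exceed" rule, and the scores are hypothetical.

```python
from datetime import date
from typing import List, Tuple

Entry = Tuple[str, date, float]  # (model name, release date, ECI score)


def frontier_points(models: List[Entry]) -> List[Entry]:
    """Models that raise the running maximum ECI when released.

    Same-day releases are visited best-first, and a model must strictly
    exceed the previous best to count (both are assumptions).
    """
    best = float("-inf")
    points: List[Entry] = []
    for name, released, score in sorted(models, key=lambda m: (m[1], -m[2])):
        if score > best:
            best = score
            points.append((name, released, score))
    return points


demo = [
    ("m1", date(2022, 1, 1), 100.0),
    ("m2", date(2022, 6, 1), 98.0),
    ("m3", date(2023, 1, 1), 105.0),
    ("m4", date(2023, 1, 1), 110.0),
    ("m5", date(2024, 1, 1), 120.0),
]
print([name for name, _, _ in frontier_points(demo)])
```

The resulting record-setting points are what a two-segment trend line would then be fitted to.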
Next, we fit a two-segment linear model to frontier ECI vs time, with a single breakpoint and a continuity constra
... (truncated, 7 KB total)
663417bdb09208a4 | Stable ID: MDg2NmQ2OW