Analysis of accelerating AI evaluation saturation, showing benchmarks intended to last years are being saturated in months (MMLU ~4 years, MMLU-Pro...