benchmark
GPQA Diamond
Metadata
| Source Table | benchmarks |
| Source ID | bdDmOTMoX8 |
| Description | Diamond subset of the Graduate-Level Google-Proof Q&A (GPQA) benchmark: extremely difficult multiple-choice questions in physics, chemistry, and biology that even domain experts struggle with. |
| Wiki ID | gpqa-diamond |
| Children | — |
| Created | Mar 14, 2026, 12:43 AM |
| Updated | Mar 24, 2026, 11:24 PM |
| Synced | Mar 24, 2026, 11:24 PM |
Record Data
| id | bdDmOTMoX8 |
| slug | gpqa-diamond |
| name | GPQA Diamond |
| category | reasoning |
| description | Diamond subset of the Graduate-Level Google-Proof Q&A (GPQA) benchmark: extremely difficult multiple-choice questions in physics, chemistry, and biology that even domain experts struggle with. |
| website | — |
| scoringMethod | accuracy |
| higherIsBetter | Yes |
| introducedDate | 2023-11 |
| maintainer | David Rein et al. |
| source | arxiv.org/abs/2311.12022 |
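The scoringMethod and higherIsBetter fields above can be illustrated with a minimal sketch of accuracy scoring. It assumes each answer is recorded as a single choice letter; the function names `accuracy` and `is_improvement` are hypothetical and are not part of this record or of any official GPQA tooling.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the fraction of questions whose predicted choice matches the key.

    Assumption: answers are choice letters such as "A".."D"; comparison
    ignores case and surrounding whitespace.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty question set")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


def is_improvement(new_score: float, old_score: float) -> bool:
    """higherIsBetter = Yes: a strictly larger accuracy is the better result."""
    return new_score > old_score


if __name__ == "__main__":
    # Toy data for illustration only; not real benchmark results.
    score = accuracy(["A", "b", "C", "D"], ["A", "B", "D", "D"])
    print(f"accuracy = {score:.2f}")
```

Because higherIsBetter is Yes, results would be ranked in descending order of this score.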