benchmark
IFEval
Metadata
| Source Table | benchmarks |
| Source ID | z0wtR8wrVv |
| Description | Instruction-Following Evaluation benchmark testing whether LLMs can follow explicit formatting constraints (e.g., 'write exactly 3 paragraphs', 'include these keywords'). |
| Wiki ID | ifeval |
| Children | — |
| Created | Mar 14, 2026, 12:43 AM |
| Updated | Mar 24, 2026, 11:24 PM |
| Synced | Mar 24, 2026, 11:24 PM |
Record Data
id | z0wtR8wrVv |
slug | ifeval |
name | IFEval |
category | general |
description | Instruction-Following Evaluation benchmark testing whether LLMs can follow explicit formatting constraints (e.g., 'write exactly 3 paragraphs', 'include these keywords'). |
website | — |
scoringMethod | accuracy |
higherIsBetter | Yes |
introducedDate | 2023-11 |
maintainer | Google Research |
source | arxiv.org/abs/2311.07911 |
Debug info
Thing ID: z0wtR8wrVv
Source Table: benchmarks
Source ID: z0wtR8wrVv
Wiki ID: ifeval