IFEval

benchmark

Metadata

Source Table	`benchmarks`
Source ID	`z0wtR8wrVv`
Description	Instruction-Following Evaluation benchmark testing whether LLMs can follow explicit formatting constraints (e.g., 'write exactly 3 paragraphs', 'include these keywords').
Wiki ID	ifeval
Children	—
Created	Mar 14, 2026, 12:43 AM
Updated	Mar 24, 2026, 11:24 PM
Synced	Mar 24, 2026, 11:24 PM

Record Data

`id`	z0wtR8wrVv
`slug`	ifeval
`name`	IFEval
`category`	general
`subCategory`	—
`description`	Instruction-Following Evaluation benchmark testing whether LLMs can follow explicit formatting constraints (e.g., 'write exactly 3 paragraphs', 'include these keywords').
`website`	—
`scoringMethod`	accuracy
`higherIsBetter`	Yes
`introducedDate`	2023-11
`maintainer`	Google Research
`source`	arxiv.org/abs/2311.07911

Debug info

Thing ID: z0wtR8wrVv

Source Table: benchmarks

Source ID: z0wtR8wrVv

Wiki ID: ifeval