benchmark
BrowseComp
Metadata
| Source Table | benchmarks |
| Source ID | 6A4fafVF2n |
| Description | A benchmark evaluating AI systems' ability to find hard-to-locate information on the web, testing browsing, search, and information synthesis capabilities across difficult queries. |
| Wiki ID | browsecomp |
| Children | — |
| Created | Mar 24, 2026, 11:23 PM |
| Updated | Mar 24, 2026, 11:24 PM |
| Synced | Mar 24, 2026, 11:24 PM |
Record Data
| id | 6A4fafVF2n |
| slug | browsecomp |
| name | BrowseComp |
| category | agentic |
| description | A benchmark evaluating AI systems' ability to find hard-to-locate information on the web, testing browsing, search, and information synthesis capabilities across difficult queries. |
| website | — |
| scoringMethod | accuracy |
| higherIsBetter | Yes |
| introducedDate | 2025-04 |
| maintainer | OpenAI |
| source | arxiv.org/abs/2504.12345 |