BrowseComp

Agentic

A benchmark that evaluates how well AI systems find hard-to-locate information on the web. It tests browsing, search, and information synthesis on difficult queries.


Models Tested

Scoring: accuracy

Introduced: 2025-04

Maintainer: OpenAI

No model scores recorded for this benchmark yet.
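
The scoring field above lists accuracy. As a minimal sketch, assuming each answer has already been graded as correct or incorrect (the grading procedure itself is not described here), accuracy is the fraction of questions graded correct. The function name `accuracy` and the example data are hypothetical, not part of the benchmark's tooling.

```python
from typing import Sequence


def accuracy(graded: Sequence[bool]) -> float:
    """Return the fraction of questions graded correct.

    `graded` holds one boolean per question (True = graded correct).
    How answers get graded is outside the scope of this sketch.
    """
    if not graded:
        raise ValueError("no graded answers")
    # True counts as 1 and False as 0, so sum() counts correct answers.
    return sum(graded) / len(graded)


# Hypothetical grading results for four questions (not real benchmark data).
example = [True, False, False, True]
print(accuracy(example))  # 0.5
```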