benchmark
OSWorld
Metadata
| Source Table | benchmarks |
| Source ID | Hpb8OjdhT9 |
| Description | A benchmark for multimodal agents on real-world computer tasks across operating systems, testing GUI interaction and task completion. |
| Source URL | os-world.github.io/ |
| Wiki ID | osworld |
| Children | — |
| Created | Mar 14, 2026, 12:43 AM |
| Updated | Mar 24, 2026, 11:24 PM |
| Synced | Mar 24, 2026, 11:24 PM |
Record Data
| id | Hpb8OjdhT9 |
| slug | osworld |
| name | OSWorld |
| category | agentic |
| description | A benchmark for multimodal agents on real-world computer tasks across operating systems, testing GUI interaction and task completion. |
| website | os-world.github.io/ |
| scoringMethod | percentage |
| higherIsBetter | Yes |
| introducedDate | 2024-04 |
| maintainer | CMU / HKU |
| source | arxiv.org/abs/2404.07972 |