Longterm Wiki
benchmark

OSWorld

Metadata

Source Table: benchmarks
Source ID: Hpb8OjdhT9
Description: A benchmark for multimodal agents on real-world computer tasks across operating systems, testing GUI interaction and task completion.
Source URL: os-world.github.io/
Wiki ID: osworld
Children
Created: Mar 14, 2026, 12:43 AM
Updated: Mar 24, 2026, 11:24 PM
Synced: Mar 24, 2026, 11:24 PM

Record Data

id: Hpb8OjdhT9
slug: osworld
name: OSWorld
category: agentic
description: A benchmark for multimodal agents on real-world computer tasks across operating systems, testing GUI interaction and task completion.
website: os-world.github.io/
scoringMethod: percentage
higherIsBetter: Yes
introducedDate: 2024-04
maintainer: CMU / HKU
source: arxiv.org/abs/2404.07972
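
The `scoringMethod` and `higherIsBetter` fields determine how two scores on this benchmark should be compared. As a minimal sketch, assuming the record is loaded into a Python dataclass: the `BenchmarkRecord` class, its snake_case field names, and the `better_score` helper are hypothetical illustrations, not part of the wiki's actual schema or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRecord:
    """Hypothetical in-memory form of a benchmark record (illustrative only)."""
    id: str
    slug: str
    name: str
    category: str
    scoring_method: str
    higher_is_better: bool
    introduced_date: str


# Values copied from the Record Data fields above.
OSWORLD = BenchmarkRecord(
    id="Hpb8OjdhT9",
    slug="osworld",
    name="OSWorld",
    category="agentic",
    scoring_method="percentage",
    higher_is_better=True,
    introduced_date="2024-04",
)


def better_score(record: BenchmarkRecord, a: float, b: float) -> float:
    """Return whichever score ranks higher under the record's direction."""
    if record.scoring_method == "percentage":
        # Assumption: percentage scores are expressed on a 0-100 scale.
        for score in (a, b):
            if not 0.0 <= score <= 100.0:
                raise ValueError(f"percentage score out of range: {score}")
    return max(a, b) if record.higher_is_better else min(a, b)
```

For example, `better_score(OSWORLD, 10.0, 20.0)` picks `20.0` because `higherIsBetter` is Yes; the two scores here are placeholders, not reported results.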