OSWorld
Agentic
A benchmark for multimodal agents on real-world computer tasks across operating systems, testing GUI interaction and task completion.
Models Tested: 0
Scoring: percentage
Introduced: 2024-04
No model scores recorded for this benchmark yet.