OSWorld
Computer-use benchmark spanning OS, browser, and productivity apps.
OSWorld evaluates agents on mixed desktop tasks — a closer proxy to office-worker workloads than browser-only benchmarks.