Benchmark

OSWorld

Computer-use benchmark spanning OS, browser, and productivity apps.

Measures: Cross-application desktop tasks
Current leader: Claude (Computer Use)

OSWorld evaluates agents on desktop tasks that mix the operating system, browser, and productivity apps, which makes it a closer proxy for office-worker workloads than browser-only benchmarks.