Benchmark

GAIA

General AI assistants benchmark.

Measures
Multi-step reasoning with tools and retrieval
Current leader
GPT-5

GAIA evaluates assistants on realistic information-seeking tasks. It serves as a strong proxy for research-agent quality.