Benchmark

GAIA

General AI assistants benchmark.

Measures

Multi-step reasoning with tools and retrieval

Current leader

GPT-5

GAIA evaluates assistants on realistic information-seeking tasks, which makes it a strong proxy for research-agent quality.