Term

Long-horizon task

A task spanning many steps over hours or days, requiring durable state and memory.
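One way to picture "durable state" is a checkpoint written after every step, so a task interrupted partway through can resume rather than restart. The sketch below is only an illustration under simple assumptions: `TaskState`, `save_state`, `load_state`, and `run_steps` are hypothetical names, and a single JSON file stands in for whatever store a real system would use.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field, asdict


@dataclass
class TaskState:
    """Hypothetical container for what a long-running task must remember."""
    step: int = 0
    notes: list = field(default_factory=list)


def save_state(state: TaskState, path: str) -> None:
    # Write to a temp file in the same directory, then atomically replace,
    # so a crash mid-write never leaves a half-written checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(asdict(state), f)
    os.replace(tmp_path, path)


def load_state(path: str) -> TaskState:
    # Start fresh if no checkpoint exists yet.
    if not os.path.exists(path):
        return TaskState()
    with open(path) as f:
        return TaskState(**json.load(f))


def run_steps(path: str, n_steps: int) -> TaskState:
    # Resume from the last checkpoint and persist after every step.
    state = load_state(path)
    for _ in range(n_steps):
        state.step += 1
        state.notes.append(f"finished step {state.step}")
        save_state(state, path)
    return state
```

Calling `run_steps` twice against the same path continues the step count instead of resetting it, which is the property the definition relies on.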

Long-horizon tasks are where most agent quality issues silently compound: a small error early on carries forward into every later step. Memory architecture and replay-based evaluation matter disproportionately here.
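A rough sketch of both ideas, under stated assumptions. `end_to_end_success` treats steps as independent with a fixed per-step success rate, which is a simplification, and `first_divergence` assumes "replay-based evaluation" means re-running a recorded trajectory against a policy and reporting the first step where the actions differ. Both function names and the trajectory format are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def end_to_end_success(per_step_success: float, n_steps: int) -> float:
    # Assumes independent steps: the whole task succeeds only if every step does.
    return per_step_success ** n_steps


def first_divergence(
    recorded: List[Tuple[str, str]],
    policy: Callable[[str], str],
) -> Optional[int]:
    # Replay each recorded (observation, expected_action) pair through the
    # policy and return the index of the first mismatch, or None if all match.
    for index, (observation, expected_action) in enumerate(recorded):
        if policy(observation) != expected_action:
            return index
    return None
```

Under the independence assumption, a 99% per-step success rate over 100 steps gives `end_to_end_success(0.99, 100)` of roughly 0.37, so a per-step rate that looks healthy can still fail most long runs. Locating the first divergent step in a replay shows where the drift began rather than only that the final result was wrong.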