Six failure modes in tool-using agents, and the patterns that fix them

An empirical taxonomy of agent tool-use failures, built from 4,000 traces from production deployments. Schema drift and silent partial failure dominate.

Apr 8, 2026 · R. Okafor, S. Kim

The paper derives a taxonomy of six tool-use failure modes from 4,000 anonymized production traces. Two modes account for 63% of incidents: schema drift, where a tool's definition changes silently between deploys, and silent partial failure, where a tool reports success but returns degraded data.
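To make schema drift concrete, here is a minimal sketch of a drift check. It assumes each tool's parameters are described as a JSON Schema object and that a known-good copy is pinned in the repository; the functions `property_types` and `schema_drift` and the example fields are hypothetical, not taken from the paper. Run as a CI step, any non-empty result would fail the build before the changed definition reaches the agent.

```python
from typing import Any


def property_types(schema: dict[str, Any]) -> dict[str, str]:
    """Map each top-level property of a JSON Schema object to its declared type."""
    return {
        name: str(spec.get("type", "any"))
        for name, spec in schema.get("properties", {}).items()
    }


def schema_drift(pinned: dict[str, Any], live: dict[str, Any]) -> list[str]:
    """List differences between the pinned (known-good) schema and the live one."""
    old, new = property_types(pinned), property_types(live)
    problems: list[str] = []
    for name in sorted(old.keys() - new.keys()):
        problems.append(f"removed: {name}")
    for name in sorted(new.keys() - old.keys()):
        problems.append(f"added: {name}")
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            problems.append(f"type changed: {name} {old[name]} -> {new[name]}")
    # Newly required fields break existing callers even when types match.
    newly_required = set(live.get("required", [])) - set(pinned.get("required", []))
    for name in sorted(newly_required):
        problems.append(f"newly required: {name}")
    return problems


# Hypothetical example: a deploy renamed total_cents to total and made it required.
PINNED = {
    "type": "object",
    "properties": {"order_id": {"type": "string"}, "total_cents": {"type": "integer"}},
    "required": ["order_id"],
}
LIVE = {
    "type": "object",
    "properties": {"order_id": {"type": "string"}, "total": {"type": "number"}},
    "required": ["order_id", "total"],
}

# schema_drift(PINNED, LIVE)
# -> ['removed: total_cents', 'added: total', 'newly required: total']
```

The rename and the new required field surface as explicit diffs in CI instead of reaching the model as an unannounced change.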

What changed. A clean failure taxonomy backed by measured frequencies rather than anecdotes.

Why it matters. Most agent post-mortems blame the model, but the data points elsewhere: most incidents originate in the tools, not the planner.

Builder takeaway. Wrap every external tool with a contract test that runs in CI. Add a result validator that asserts shape and freshness, not just the status code.
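The result-validator half of the takeaway might look like the following minimal sketch. It assumes a hypothetical `ToolResult` wrapper holding a status code and a decoded JSON body, and that each payload carries a timezone-aware ISO 8601 timestamp in a field named `as_of`; none of these names come from the paper. The idea is that a 2xx response still has to prove it has the expected fields, non-empty collections, and recent data before the agent acts on it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


@dataclass
class ToolResult:
    """Hypothetical wrapper around a raw tool response."""
    status_code: int
    body: dict[str, Any]


class ResultValidationError(Exception):
    """Raised when a tool reports success but its payload is unusable."""


def validate_result(
    result: ToolResult,
    required: dict[str, type],
    max_age: timedelta,
    timestamp_field: str = "as_of",  # assumed field name, not from the paper
    now: Optional[datetime] = None,
) -> dict[str, Any]:
    """Return the body only if it passes status, shape, and freshness checks."""
    # A non-2xx status is still a failure; it just is not the only one.
    if not 200 <= result.status_code < 300:
        raise ResultValidationError(f"non-success status {result.status_code}")
    body = result.body

    # Shape: each required field is present and has the expected Python type.
    for field, expected in required.items():
        if field not in body:
            raise ResultValidationError(f"missing field {field!r}")
        value = body[field]
        if not isinstance(value, expected):
            raise ResultValidationError(
                f"field {field!r}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
        # Treat an empty list or dict as degraded data, not a valid answer.
        if isinstance(value, (list, dict)) and len(value) == 0:
            raise ResultValidationError(f"field {field!r} is empty")

    # Freshness: parse the timestamp and reject anything older than max_age.
    raw = body.get(timestamp_field)
    if not isinstance(raw, str):
        raise ResultValidationError(f"missing timestamp {timestamp_field!r}")
    try:
        as_of = datetime.fromisoformat(raw)
    except ValueError as exc:
        raise ResultValidationError(f"unparseable timestamp {raw!r}") from exc
    if as_of.tzinfo is None:
        raise ResultValidationError("timestamp has no timezone")
    now = now or datetime.now(timezone.utc)
    if now - as_of > max_age:
        raise ResultValidationError(f"stale result: {now - as_of} old")
    return body
```

Treating an empty collection as a failure is a judgment call: for some tools an empty result is a legitimate answer, so that check may need to be configurable per field.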


Three things in agentic AI, every Tuesday.