DeepMind

Six failure modes in tool-using agents, and the patterns that fix them

An empirical taxonomy of agent tool-use failures across 4,000 traces from production deployments. Schema drift and silent partial-failure dominate.

A taxonomy of agent tool-use failures derived from 4,000 anonymized production traces. Two modes account for 63% of incidents: schema drift (tool definitions silently change between deploys) and silent partial-failure (tool returns success with degraded data).

What changed. A clean failure taxonomy with empirical frequencies, instead of anecdotes.

Why it matters. Most agent post-mortems blame the model. In these traces, two tool-side failure modes alone account for 63% of incidents, which points at the tools rather than the planner.

Builder takeaway. Wrap every external tool with a contract test that runs in CI. Add a result validator that asserts shape and freshness, not just the status code.
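One way the takeaway could look in practice is sketched below. It is a minimal illustration, not the method used in the study. The `get_quote`-style tool, its field names, the `PINNED_SCHEMA` snapshot, and the five-minute freshness window are all hypothetical. `schema_drift` is the kind of check a CI contract test might run against a tool's live definition, and `validate_result` is a runtime validator for shape and freshness.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical pinned contract for an illustrative quote-lookup tool.
# Field names and the freshness window are assumptions for this sketch.
PINNED_SCHEMA = {"symbol": str, "price": float, "as_of": str}
MAX_AGE = timedelta(minutes=5)


def schema_drift(pinned, live):
    """Compare two field->type maps and return a list of differences."""
    problems = []
    for name, expected in pinned.items():
        if name not in live:
            problems.append(f"missing field: {name}")
        elif live[name] is not expected:
            problems.append(
                f"type changed: {name} {expected.__name__} -> {live[name].__name__}"
            )
    for name in sorted(live.keys() - pinned.keys()):
        problems.append(f"unexpected field: {name}")
    return problems


def validate_result(result, now=None):
    """Check shape, then freshness, of one tool result; return a list of problems."""
    now = now or datetime.now(timezone.utc)
    problems = []
    for name, expected in PINNED_SCHEMA.items():
        if name not in result:
            problems.append(f"missing field: {name}")
        elif not isinstance(result[name], expected):
            problems.append(f"wrong type for {name}")
    if problems:
        # Shape is broken, so the freshness check would not be meaningful.
        return problems
    try:
        as_of = datetime.fromisoformat(result["as_of"])
    except ValueError:
        return ["as_of is not an ISO 8601 timestamp"]
    if as_of.tzinfo is None:
        return ["as_of has no timezone"]
    if now - as_of > MAX_AGE:
        problems.append("stale data")
    return problems
```

In CI, a contract test would fail the build whenever `schema_drift` returns a non-empty list. At runtime, the agent wrapper would treat a non-empty list from `validate_result` as a tool failure even when the call itself reported success.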

The Agent Brief

Three things in agentic AI, every Tuesday.

What changed, what matters, what builders should do next. No hype. No paid placement.