Decoupled planner-critic agents outperform monolithic planners on long tasks

Splitting planning and critique into specialized models with structured exchange yields a 14-point lift on multi-day research tasks.

Apr 4, 2026 · I. Tanaka, M. Eaton

In the decoupled architecture, a smaller planner generates a tree of candidate steps and a larger critic prunes it. This setup outperforms monolithic planners by 14 points on a multi-day research benchmark while reducing total token cost by 28%.
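To make the planner/critic split concrete, here is a minimal Python sketch of the generate-then-prune loop. It is an illustration under assumptions, not the paper's implementation: `Step`, `expand`, `prune`, `build_plan`, the score threshold, and the toy planner and critic are all hypothetical names standing in for real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One node in the planner's tree of candidate steps (hypothetical type)."""
    description: str
    children: List["Step"] = field(default_factory=list)


# Assumed interfaces: the planner proposes follow-up steps for a step,
# the critic scores a step between 0.0 and 1.0.
Planner = Callable[[str], List[str]]
Critic = Callable[[str], float]


def expand(node: Step, planner: Planner, depth: int) -> None:
    """Planner role: grow the candidate tree to a fixed depth."""
    if depth == 0:
        return
    for text in planner(node.description):
        child = Step(text)
        node.children.append(child)
        expand(child, planner, depth - 1)


def prune(node: Step, critic: Critic, threshold: float) -> None:
    """Critic role: drop every subtree whose root scores below the threshold."""
    kept = [c for c in node.children if critic(c.description) >= threshold]
    for child in kept:
        prune(child, critic, threshold)
    node.children = kept


def count(node: Step) -> int:
    """Count the nodes in a tree, including the root."""
    return 1 + sum(count(c) for c in node.children)


def build_plan(goal: str, planner: Planner, critic: Critic,
               depth: int = 2, threshold: float = 0.5) -> Step:
    """Run the planner first, then let the critic prune the result."""
    root = Step(goal)
    expand(root, planner, depth)
    prune(root, critic, threshold)
    return root


# Toy stand-ins for the two models, for demonstration only.
def toy_planner(step: str) -> List[str]:
    return [f"{step} / option {i}" for i in range(3)]


def toy_critic(step: str) -> float:
    return 0.0 if step.endswith("option 2") else 1.0


if __name__ == "__main__":
    plan = build_plan("survey prior work", toy_planner, toy_critic)
    print(count(plan), "steps survive pruning")
```

In a real system the two callables would wrap a small and a large model respectively. The sketch keeps them as plain functions so the control flow stays visible.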

What changed. The paper offers empirical evidence that role specialization (planner vs. critic) beats a single high-capacity model doing both jobs.

Why it matters. The result is a Pareto improvement on cost and quality: a higher benchmark score at lower token cost. Most teams default to “biggest model everywhere” and leave that gain on the table.

Builder takeaway. Try a small planner + frontier critic on your hardest workloads. Expect to spend a week tuning the exchange protocol before seeing the gain.
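As a starting point for that tuning, one option is to pin the exchange down as typed messages that both sides must parse. The sketch below is an assumption about what a structured exchange could look like, not the protocol from the paper: the `Proposal` and `Verdict` types and all their field names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Union


@dataclass
class Proposal:
    """Planner -> critic: one candidate step (hypothetical schema)."""
    step_id: str
    description: str


@dataclass
class Verdict:
    """Critic -> planner: keep or drop a step, with a reason (hypothetical schema)."""
    step_id: str
    keep: bool
    reason: str


Message = Union[Proposal, Verdict]
_TYPES = {"Proposal": Proposal, "Verdict": Verdict}


def encode(msg: Message) -> str:
    """Serialize a message to JSON, tagging it with its type name."""
    return json.dumps({"type": type(msg).__name__, **asdict(msg)})


def decode(raw: str) -> Message:
    """Parse a JSON message and reject unknown message types."""
    data = json.loads(raw)
    kind = data.pop("type", None)
    if kind not in _TYPES:
        raise ValueError(f"unknown message type: {kind!r}")
    # Missing or unexpected fields raise TypeError here.
    return _TYPES[kind](**data)


if __name__ == "__main__":
    wire = encode(Verdict(step_id="s3", keep=False, reason="duplicates step s1"))
    print(decode(wire))
```

Rejecting malformed messages at the boundary makes protocol changes easier to test, since a planner or critic that drifts from the schema fails loudly rather than silently.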

Three things in agentic AI, every Tuesday.