MIT CSAIL

Decoupled planner-critic agents outperform monolithic planners on long tasks

Splitting planning and critique into specialized models that communicate through a structured exchange yields a 14-point lift on multi-day research tasks.

A decoupled architecture, in which a smaller planner generates a tree of candidate steps and a larger critic prunes that tree, outperforms monolithic planners by 14 points on a multi-day research benchmark while cutting total token cost by 28%.
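The item does not describe the exchange format, so what follows is a minimal Python sketch under stated assumptions, not the researchers' method. The planner and critic are plain callables standing in for model calls (`propose`, `score`, and `plan_with_critic` are hypothetical names), and pruning is done beam-search style: at each depth the critic scores every candidate branch once and only the `beam_width` best survive. It shows the division of labor only; it does not reproduce the benchmark result or the token savings.

```python
from typing import Callable, List, Tuple

Plan = Tuple[str, ...]  # a partial plan: an ordered sequence of step descriptions


def plan_with_critic(
    propose: Callable[[Plan], List[str]],  # hypothetical stand-in for the small planner
    score: Callable[[Plan], float],        # hypothetical stand-in for the large critic
    depth: int = 3,
    beam_width: int = 2,
) -> Plan:
    """Grow a tree of candidate steps level by level; let the critic prune it.

    Assumption: pruning keeps the top `beam_width` branches (beam search).
    """
    frontier: List[Plan] = [()]  # start from the empty plan
    for _ in range(depth):
        # Planner expands every surviving branch into candidate child steps.
        candidates = [plan + (step,) for plan in frontier for step in propose(plan)]
        if not candidates:
            break  # planner has nothing more to offer; keep the current frontier
        # Critic scores each candidate exactly once, since it is the expensive model.
        scored = sorted(
            ((score(p), p) for p in candidates),
            key=lambda pair: pair[0],
            reverse=True,
        )
        frontier = [p for _, p in scored[:beam_width]]
    # The frontier is ordered best-first by the critic's scores.
    return frontier[0]


# Toy stubs in place of real model calls (purely illustrative).
def toy_propose(plan: Plan) -> List[str]:
    # Offer two options for the next step.
    return [f"step{len(plan)}-a", f"step{len(plan)}-b"]


def toy_score(plan: Plan) -> float:
    # Prefer plans containing more "-b" steps.
    return sum(s.endswith("-b") for s in plan)


print(plan_with_critic(toy_propose, toy_score, depth=3, beam_width=2))
# → ('step0-b', 'step1-b', 'step2-b')
```

Keeping the planner cheap and calling it broadly, while reserving the costly model for scoring, is one plausible reading of how the decoupled design could save tokens; the real split depends on tree width and critic prompt size.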

What changed. Empirical validation that role specialization (planner vs. critic) beats a single high-capacity model running both jobs.

Why it matters. This is a cost-quality Pareto improvement: higher scores at lower token cost, with no trade-off between the two. Most teams default to “biggest model everywhere” and leave value on the table.

Builder takeaway. Try a small planner + frontier critic on your hardest workloads. Expect to spend a week tuning the exchange protocol before seeing the gain.

The Agent Brief

Three things in agentic AI, every Tuesday.

What changed, what matters, what builders should do next. No hype. No paid placement.