Independent

ToolCUA: Enhancing Tool-Use Reliability in Open-Source Agents

New SOTA for comparable-scale models on OSWorld-MCP via improved tool comprehension and usage accuracy.

ToolCUA targets the Achilles' heel of agents: actually using tools correctly. By drilling comprehension-usage-action pipelines, it lifts open models by a relative 66% on OSWorld-MCP desktop tasks, to a new SOTA of 46.85% among comparable-scale models. The result exposes how generic finetuning leaves tool reliability on the table.

What changed. 66% relative lift to 46.85% on OSWorld-MCP via tool-specific training.
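
A quick sanity check on what "66% relative lift to 46.85%" implies. This is a back-of-envelope sketch, not a figure from the paper: it assumes the 66% is measured against a single baseline score and that both reported numbers are rounded. The function name `implied_baseline` is ours, for illustration only.

```python
def implied_baseline(final_score: float, relative_lift: float) -> float:
    """Back out the starting score from a final score and a relative lift.

    Assumes final = baseline * (1 + relative_lift).
    """
    return final_score / (1.0 + relative_lift)


final_score = 46.85   # reported OSWorld-MCP score, in percent
relative_lift = 0.66  # reported relative lift (66%), probably rounded

baseline = implied_baseline(final_score, relative_lift)
absolute_gain = final_score - baseline

print(f"implied baseline: {baseline:.1f}%")
print(f"absolute gain:    {absolute_gain:.1f} points")
```

Under those assumptions the starting point works out to roughly 28%, so the absolute gain is closer to 19 points than 66. Worth keeping both numbers in mind when comparing against other results.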

Why it matters. Tool failures kill 50%+ of agent runs; this fixes the basics.

Builder takeaway. Don’t just RLHF agents; decompose tool skills into CUA modules.

The Agent Brief

Three things in agentic AI, every Tuesday.

What changed, what matters, what builders should do next. No hype. No paid placement.