ToolCUA: Enhancing Tool-Use Reliability in Open-Source Agents

New SOTA for comparable-scale models on OSWorld-MCP via improved tool comprehension and usage accuracy.

May 14, 2026 · ToolCUA Authors

ToolCUA targets the Achilles heel of agents: using tools correctly. By drilling comprehension-usage-action pipelines, it lifts open models by 66% (relative) on OSWorld-MCP desktop tasks, reaching a new SOTA of 46.85% among comparable-scale models. The result shows how much tool reliability generic finetuning leaves on the table.

What changed. A 66% relative lift, to 46.85% on OSWorld-MCP, from tool-specific training.
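
To put the two headline numbers side by side, here is a minimal arithmetic sketch. It assumes the 66% figure is measured relative to the baseline score and that 46.85% is the final score. The blurb does not state the baseline, so the value below is only implied by those two numbers and is approximate if the 66% was rounded. The function name is hypothetical.

```python
def implied_baseline(new_score: float, relative_lift: float) -> float:
    """Back out the starting score from a final score and a relative lift.

    Assumes new_score = baseline * (1 + relative_lift).
    """
    return new_score / (1.0 + relative_lift)


new_score = 46.85      # OSWorld-MCP score quoted in the blurb, in percent
relative_lift = 0.66   # 66% relative lift quoted in the blurb

baseline = implied_baseline(new_score, relative_lift)
absolute_gain = new_score - baseline

print(f"implied baseline: {baseline:.2f}%")          # about 28.22%
print(f"absolute gain: {absolute_gain:.2f} points")  # about 18.63 points
```

Under these assumptions, the 66% relative lift corresponds to roughly 18.6 percentage points of absolute improvement.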

Why it matters. Tool failures kill 50%+ of agent runs, and this fixes the basics.

Builder takeaway. Don’t just RLHF agents; decompose tool skills into CUA modules.


Three things in agentic AI, every Tuesday.