Cost controls for agent workloads
Token budgets, fallback tiers, and the dashboards that catch runaway runs before they hurt.
Prereqs
- An agent running in production
- An observability stack that records token usage and cost per task
Agent workloads can swing from $0.02 to $20 per task on the same prompt. Build the controls now, not after the surprise invoice.
Step 1: Per-agent budgets
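Set a hard token cap per agent and per session. When a run hits its cap, terminate it cleanly: stop issuing new model calls and return a partial result or a clear error instead of failing mid-task.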
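A minimal sketch of a two-level cap, assuming each step's token cost can be estimated before the call is made. The names `TokenBudget`, `BudgetExceeded`, and `run_session` are hypothetical and not taken from any particular framework; a real agent would charge actual usage reported by the model API.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a charge would push usage past a hard cap."""


@dataclass
class TokenBudget:
    agent_cap: int         # max tokens across all sessions for this agent
    session_cap: int       # max tokens for a single session
    agent_used: int = 0
    session_used: int = 0

    def new_session(self) -> None:
        # Reset only the session counter; agent usage keeps accumulating.
        self.session_used = 0

    def charge(self, tokens: int) -> None:
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        # Check both caps before recording anything, so a rejected
        # charge leaves the counters unchanged.
        if self.session_used + tokens > self.session_cap:
            raise BudgetExceeded("session cap reached")
        if self.agent_used + tokens > self.agent_cap:
            raise BudgetExceeded("agent cap reached")
        self.session_used += tokens
        self.agent_used += tokens


def run_session(budget: TokenBudget, step_costs: list[int]) -> tuple[int, str]:
    """Run steps until done or out of budget; return (completed steps, status)."""
    budget.new_session()
    completed = 0
    for cost in step_costs:
        try:
            budget.charge(cost)
        except BudgetExceeded as exc:
            # Clean termination: report what finished instead of crashing.
            return completed, f"stopped: {exc}"
        completed += 1  # the real model call would happen here
    return completed, "ok"
```

Checking the cap before the call, rather than after, means a run stops one step short of the limit instead of one step over it.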
Step 2: Fallback tiers
Give every high-traffic route a small-model fallback. For routine queries, most users cannot tell the difference.
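One way to wire a fallback tier is sketched below. Everything here is an assumption for illustration: the `Route` type, the model names `large-model` and `small-model`, the word-count heuristic in `is_routine`, and the 0.8 budget-pressure threshold. A production system would likely replace the heuristic with a classifier or per-route rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    primary: str    # larger, more expensive model (hypothetical name)
    fallback: str   # small-model fallback (hypothetical name)


def is_routine(query: str) -> bool:
    # Placeholder heuristic: treat short queries as routine.
    return len(query.split()) <= 12


def choose_model(
    route: Route,
    query: str,
    budget_fraction_used: float,
    pressure_threshold: float = 0.8,
) -> str:
    """Pick the model tier for one request on a route."""
    # Under budget pressure, every request goes to the small model.
    if budget_fraction_used >= pressure_threshold:
        return route.fallback
    # Routine queries go to the small model even when budget is healthy.
    if is_routine(query):
        return route.fallback
    return route.primary
```

For example, with `Route("support", "large-model", "small-model")`, the query "reset my password" is routed to the fallback, while a 30-word query goes to the primary model until 80% of the budget is spent.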
Step 3: Anomaly alerts
Alert on spikes in p95 cost per task, not just on total spend. A looping agent shows up as a jump in the p95 before it moves the total enough to notice.
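A sketch of the p95 check, assuming per-task costs are available as plain lists for a baseline window and a recent window. The function names and the 3x ratio are illustrative choices, not recommended values; the percentile uses the nearest-rank method.

```python
def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    # ceil(0.95 * n) in integer arithmetic, as a 1-indexed rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def should_alert(
    baseline_costs: list[float],
    recent_costs: list[float],
    ratio: float = 3.0,
) -> bool:
    """Alert when the recent p95 cost per task exceeds ratio x the baseline p95."""
    return p95(recent_costs) > ratio * p95(baseline_costs)


# Baseline: almost every task costs about $0.02.
baseline = [0.02] * 95 + [0.05] * 5
# Recent window: one task in ten has started looping.
recent = [0.02] * 90 + [2.0] * 10
alert = should_alert(baseline, recent)
```

Comparing against a rolling baseline rather than a fixed dollar threshold keeps the alert meaningful as normal per-task cost drifts over time.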