Learn how to use Claude task budgets and xhigh effort to control reasoning spend in long-running agents without crushing quality. Try free.
Long-running agents fail in a boring way: not by being dumb, but by being expensive. They keep thinking, retrying, and verifying long after the answer stopped getting better.
Claude Opus 4.7 gives developers two separate levers for controlling reasoning spend: a new xhigh effort level between high and max, and task budgets in the Claude Platform API for guiding token spend across long-running tasks. Together, they let you tune how hard the model thinks and how much total budget a task should consume. [1]
That split matters. I think a lot of teams mash these into one mental model, but they solve different problems.
xhigh effort is a per-call quality-versus-cost setting. Anthropic positions it as the right starting point for coding and agentic work when high feels too light and max feels too expensive or slow. In Claude Code, Anthropic even raised the default to xhigh for plans, which is a strong hint about where they think the sweet spot is for serious work. [1]
Task budgets, on the other hand, are a run-level control. They're meant to guide spend over a longer execution, not just a single response. That's the more important idea for agents. A long-running agent doesn't fail because one step was costly. It fails because dozens of "reasonable" steps add up.
Long-running agents overspend because they make many local decisions that look sensible in isolation but are economically bad in sequence. Research shows that agents often retry unreliable actions, overspend early, or become too conservative after seeing budget pressure, which creates a gap between theoretical budget efficiency and what actually happens in practice. [2][3]
That matches what I keep seeing in production systems. The problem is not just "too much thinking." It's poor allocation of thinking.
A recent paper on budget-constrained agents found that prompt-only budget awareness is unreliable. Even when agents are explicitly told about tool costs and current spending, they still violate budgets or get trapped in repetitive exploration. The stronger result came from inference-time planning that actually enforced or calibrated decisions against remaining budget. [2]
Another paper, ZEBRA, makes the same point from a higher level: the question is no longer just whether the system stays under budget, but how it spends that budget across phases of a workflow. Their results showed that smarter allocation across phases recovered much more quality than asking the model to allocate budget directly. [3]
Here's the catch: more budget does not automatically fix behavior. ZebraArena found that agents can suffer from what the authors call budget anxiety. With tight budgets they get brittle, but with looser budgets they still don't reliably convert extra slack into better outcomes. [4]
You should use xhigh effort for the parts of an agent run where mistakes compound: planning, ambiguous decisions, debugging, and final verification. For routine actions like extraction, formatting, or obvious tool calls, lower effort is usually the better trade because it preserves speed and reduces waste. [1][5]
This is where most teams go wrong. They pick one effort level and spray it across the whole loop.
Research on adaptive reasoning effort selection backs up the intuition. ARES showed that static "always high" or "always low" strategies are weak because different steps need different cognitive spend. Their router cut reasoning token usage by up to 52.7% while keeping task performance close to fixed high-effort baselines. [5]
One especially useful finding: low effort at every step can wreck performance, but high effort at every step is also a blunt instrument. So if you're deciding between "cheap everywhere" and "smart everywhere," the right answer is neither.
A practical way to think about effort levels in Claude:
| Agent step type | Recommended effort | Why |
|---|---|---|
| Initial planning | xhigh | Sets direction and reduces downstream retries |
| Tool or file selection under ambiguity | xhigh | Early errors cascade |
| Routine retrieval or extraction | medium/high | Usually low-complexity work |
| Reformatting, summarizing, cleanup | low/medium | Little value from deep reasoning |
| Final answer verification | xhigh | Cheap compared to shipping a wrong result |
You should prompt Claude with explicit phase priorities and failure rules, not vague instructions to "be efficient." Good budget-aware prompts tell the model where to spend effort, when to stop exploring, and what tradeoffs to make if the budget tightens. That works better because it turns "cost awareness" into an operating policy instead of a slogan. [2][3]
Here's a weak prompt:
Solve this carefully, but keep costs low.
Here's a stronger version:
You are running under a limited reasoning budget.
Prioritize spend in this order:
1. Build a short plan before acting.
2. Spend more reasoning on ambiguous decisions, root-cause analysis, and final verification.
3. Use cheaper, shorter reasoning for routine retrieval, formatting, and obvious transformations.
4. Do not retry the same failing approach more than once unless new evidence appears.
5. If the budget appears tight, reduce depth on non-critical subtasks before reducing final verification quality.
Return:
- short plan
- execution
- final verification notes
That prompt won't magically enforce budget by itself. The research is pretty clear on that. But it does make task budgets and effort controls more useful because the model now has an internal policy for how to allocate effort. [2]
If you write these instructions often, this is also a nice place to use Rephrase's prompt tools and blog examples to tighten wording before it hits the model.
A good reasoning-spend workflow separates global budget from local effort decisions. You set a task-level ceiling, spend heavily on high-leverage phases, and watch for wasted loops rather than only watching token totals. That gives agents room to think deeply where it matters without turning every step into an expensive research project. [1][3][5]
Here's the workflow I'd actually recommend:
Start with high or xhigh as Anthropic suggests for coding and agentic tasks. Use xhigh when the task is long-horizon or failure is costly. [1]
Wrap the run in a task budget so the agent has a ceiling for the full job, not just one response. [1]
Define phase-based rules in the prompt: plan deeply, execute lightly, verify deeply.
Track where spend clusters. If cost is piling up in retries or exploration, your issue is usually orchestration, not model intelligence.
Downgrade routine steps before touching planning or final verification. That's the highest-return compromise.
This before-and-after is the core shift:
| Before | After |
|---|---|
| One effort level for the whole loop | Different effort by phase |
| "Please be efficient" | Explicit budget policy |
| Watch total tokens at the end | Watch retries and spend concentration during the run |
| Assume more budget means better results | Treat budget allocation as a design problem |
The big idea is simple: reasoning spend is a product decision, not just a model setting. Claude Opus 4.7 gives you better controls with xhigh effort and task budgets, but the win comes from how you combine them.
If you're building long-running agents, don't just ask Claude to think harder. Tell it where hard thinking is worth paying for.
Documentation & Research
Community Examples
xhigh is an effort level between high and max in Claude Opus 4.7. It gives you finer control over the tradeoff between reasoning depth, latency, and cost on harder tasks.
No. xhigh is best for steps where the agent is planning, debugging, or verifying critical work. For routine steps, lower effort often preserves quality while cutting cost and latency.