
How to Control Claude Agent Reasoning Spend

Learn how to use Claude task budgets and xhigh effort to control reasoning spend in long-running agents without crushing quality.

Ilia Ilinskii
Rephrase · May 21, 2026

Prompt tips · 8 min read

On this page

Key Takeaways
What are Claude task budgets and xhigh effort?
Why do long-running agents overspend on reasoning?
How should you use xhigh effort in Claude agents?
How do you prompt Claude to respect a reasoning budget?
What does a good reasoning-spend workflow look like?
References

Long-running agents fail in a boring way: not by being dumb, but by being expensive. They keep thinking, retrying, and verifying long after the answer stopped getting better.

Key Takeaways

Claude Opus 4.7 adds two practical controls for agent cost: xhigh effort and task budgets. [1]
Static effort settings are usually wasteful because not every step in an agent run needs deep reasoning. [2]
Prompting an agent to "be budget-conscious" is weak on its own; research shows explicit budget controls work better. [2]
The best pattern is simple: spend more on planning and verification, less on routine execution and lookup.
If you want fast prompt cleanup before sending instructions into Claude, tools like Rephrase can tighten that step automatically.

What are Claude task budgets and xhigh effort?

Claude Opus 4.7 gives developers two separate levers for controlling reasoning spend: a new xhigh effort level between high and max, and task budgets in the Claude Platform API for guiding token spend across long-running tasks. Together, they let you tune how hard the model thinks and how much total budget a task should consume. [1]

That split matters. I think a lot of teams mash these into one mental model, but they solve different problems.

xhigh effort is a per-call quality-versus-cost setting. Anthropic positions it as the right starting point for coding and agentic work when high feels too light and max feels too expensive or slow. In Claude Code, Anthropic even raised the default effort level to xhigh for all plans, which is a strong hint about where they think the sweet spot is for serious work. [1]

Task budgets, on the other hand, are a run-level control. They're meant to guide spend over a longer execution, not just a single response. That's the more important idea for agents. A long-running agent doesn't fail because one step was costly. It fails because dozens of "reasonable" steps add up.
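
To make the "steps add up" point concrete, here is a minimal sketch of a client-side ledger that accumulates spend across a run. The `RunLedger` class and every token figure are hypothetical illustrations; this is not the Claude task budget API, only a way to picture what a run-level ceiling is watching.

```python
from dataclasses import dataclass, field


@dataclass
class RunLedger:
    """Hypothetical client-side ledger for one agent run (not a Claude API object)."""

    ceiling_tokens: int
    spent_by_step: list[int] = field(default_factory=list)

    def charge(self, tokens: int) -> None:
        # Record the tokens that a single step consumed.
        self.spent_by_step.append(tokens)

    @property
    def spent(self) -> int:
        return sum(self.spent_by_step)

    @property
    def remaining(self) -> int:
        # Goes negative once the run has exceeded its ceiling.
        return self.ceiling_tokens - self.spent

    @property
    def over_budget(self) -> bool:
        return self.spent > self.ceiling_tokens


# Illustrative numbers only: 40 steps that each look cheap in isolation.
ledger = RunLedger(ceiling_tokens=200_000)
for _ in range(40):
    ledger.charge(6_000)

print(ledger.spent, ledger.remaining, ledger.over_budget)
```

No single 6,000-token step looks alarming here, yet the run as a whole ends up past its ceiling. That is the failure a per-call effort setting cannot see.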

Why do long-running agents overspend on reasoning?

Long-running agents overspend because they make many local decisions that look sensible in isolation but are economically bad in sequence. Research shows that agents often retry unreliable actions, overspend early, or become too conservative after seeing budget pressure, which creates a gap between theoretical budget efficiency and what actually happens in practice. [2][3]

That matches what I keep seeing in production systems. The problem is not just "too much thinking." It's poor allocation of thinking.

A recent paper on budget-constrained agents found that prompt-only budget awareness is unreliable. Even when agents are explicitly told about tool costs and current spending, they still violate budgets or get trapped in repetitive exploration. The stronger result came from inference-time planning that actually enforced or calibrated decisions against remaining budget. [2]

Another paper, ZEBRA, makes the same point from a higher level: the question is no longer just whether the system stays under budget, but how it spends that budget across phases of a workflow. Their results showed that smarter allocation across phases recovered much more quality than asking the model to allocate budget directly. [3]

Here's the catch: more budget does not automatically fix behavior. ZebraArena found that agents can suffer from what the authors call budget anxiety. With tight budgets they get brittle, but with looser budgets they still don't reliably convert extra slack into better outcomes. [4]

How should you use xhigh effort in Claude agents?

You should use xhigh effort for the parts of an agent run where mistakes compound: planning, ambiguous decisions, debugging, and final verification. For routine actions like extraction, formatting, or obvious tool calls, lower effort is usually the better trade because it preserves speed and reduces waste. [1][5]

This is where most teams go wrong. They pick one effort level and spray it across the whole loop.

Research on adaptive reasoning effort selection backs up the intuition. ARES showed that static "always high" or "always low" strategies are weak because different steps need different cognitive spend. Their router cut reasoning token usage by up to 52.7% while keeping task performance close to fixed high-effort baselines. [5]

One especially useful finding: low effort at every step can wreck performance, but high effort at every step is also a blunt instrument. So if you're deciding between "cheap everywhere" and "smart everywhere," the right answer is neither.

A practical way to think about effort levels in Claude:

Agent step type	Recommended effort	Why
Initial planning	xhigh	Sets direction and reduces downstream retries
Tool or file selection under ambiguity	xhigh	Early errors cascade
Routine retrieval or extraction	medium/high	Usually low-complexity work
Reformatting, summarizing, cleanup	low/medium	Little value from deep reasoning
Final answer verification	xhigh	Cheap compared to shipping a wrong result
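
The table above can be sketched as a small lookup in your orchestrator. The step-type keys and the `pick_effort` helper are hypothetical names, and where the table gives a range I assume the lower level; how the chosen label is passed to the model depends on your client and is not shown.

```python
# Effort labels come from the table above; the step-type keys are made-up names.
EFFORT_BY_STEP = {
    "planning": "xhigh",
    "ambiguous_selection": "xhigh",
    "retrieval": "medium",   # table says medium/high; lower bound assumed
    "cleanup": "low",        # table says low/medium; lower bound assumed
    "verification": "xhigh",
}


def pick_effort(step_type: str, default: str = "high") -> str:
    """Return the effort label for a step, falling back to a default for unknown types."""
    return EFFORT_BY_STEP.get(step_type, default)


for step in ["planning", "retrieval", "cleanup", "verification", "something_new"]:
    print(step, "->", pick_effort(step))
```

Falling back to high for unknown step types is a judgment call: it errs toward quality when the router has no information, at the price of some extra spend.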

How do you prompt Claude to respect a reasoning budget?

You should prompt Claude with explicit phase priorities and failure rules, not vague instructions to "be efficient." Good budget-aware prompts tell the model where to spend effort, when to stop exploring, and what tradeoffs to make if the budget tightens. That works better because it turns "cost awareness" into an operating policy instead of a slogan. [2][3]

Here's a weak prompt:

Solve this carefully, but keep costs low.

Here's a stronger version:

You are running under a limited reasoning budget.

Prioritize spend in this order:
1. Build a short plan before acting.
2. Spend more reasoning on ambiguous decisions, root-cause analysis, and final verification.
3. Use cheaper, shorter reasoning for routine retrieval, formatting, and obvious transformations.
4. Do not retry the same failing approach more than once unless new evidence appears.
5. If the budget appears tight, reduce depth on non-critical subtasks before reducing final verification quality.

Return:
- short plan
- execution
- final verification notes

That prompt won't magically enforce budget by itself. The research is pretty clear on that. But it does make task budgets and effort controls more useful because the model now has an internal policy for how to allocate effort. [2]
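
Since the prompt alone won't enforce anything, one option is to back rule 4 with a check in the orchestrator. This is a minimal sketch under my own assumptions: `RetryGuard` and its method names are hypothetical, and deciding what counts as "new evidence" is left to your agent loop.

```python
from collections import Counter


class RetryGuard:
    """Allows one retry of a failing approach unless new evidence resets the count."""

    def __init__(self, max_retries: int = 1) -> None:
        self.max_retries = max_retries
        self._failures = Counter()

    def record_failure(self, approach: str) -> None:
        self._failures[approach] += 1

    def record_new_evidence(self, approach: str) -> None:
        # New information justifies trying the approach again from a clean slate.
        self._failures.pop(approach, None)

    def may_attempt(self, approach: str) -> bool:
        # The first attempt plus max_retries retries are allowed.
        return self._failures[approach] <= self.max_retries


guard = RetryGuard()
guard.record_failure("grep_for_symbol")
first_retry_allowed = guard.may_attempt("grep_for_symbol")
guard.record_failure("grep_for_symbol")
second_retry_allowed = guard.may_attempt("grep_for_symbol")
guard.record_new_evidence("grep_for_symbol")
allowed_after_evidence = guard.may_attempt("grep_for_symbol")

print(first_retry_allowed, second_retry_allowed, allowed_after_evidence)
```

The prompt tells the model what the policy is; a guard like this is what stops a loop when the model ignores it.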

If you write these instructions often, this is also a nice place to use Rephrase's prompt tools and blog examples to tighten wording before it hits the model.

What does a good reasoning-spend workflow look like?

A good reasoning-spend workflow separates global budget from local effort decisions. You set a task-level ceiling, spend heavily on high-leverage phases, and watch for wasted loops rather than only watching token totals. That gives agents room to think deeply where it matters without turning every step into an expensive research project. [1][3][5]

Here's the workflow I'd actually recommend:

1. Start with high or xhigh as Anthropic suggests for coding and agentic tasks. Use xhigh when the task is long-horizon or failure is costly. [1]
2. Wrap the run in a task budget so the agent has a ceiling for the full job, not just one response. [1]
3. Define phase-based rules in the prompt: plan deeply, execute lightly, verify deeply.
4. Track where spend clusters. If cost is piling up in retries or exploration, your issue is usually orchestration, not model intelligence.
5. Downgrade routine steps before touching planning or final verification. That's the highest-return compromise.
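
The last rule in that workflow, downgrading routine steps first, can be sketched as a small function. The phase names, the 25% threshold, and `effort_under_pressure` itself are assumptions for illustration; only the effort labels come from the article.

```python
# Effort labels ordered from cheapest to most expensive.
EFFORT_LADDER = ["low", "medium", "high", "xhigh", "max"]

# Phases whose effort is never lowered (hypothetical phase names).
PROTECTED_PHASES = {"plan", "verify"}


def effort_under_pressure(
    phase: str,
    base_effort: str,
    remaining_fraction: float,
    tight_threshold: float = 0.25,
) -> str:
    """Drop unprotected phases one rung when the remaining budget is tight."""
    if phase in PROTECTED_PHASES or remaining_fraction >= tight_threshold:
        return base_effort
    rung = EFFORT_LADDER.index(base_effort)
    return EFFORT_LADDER[max(rung - 1, 0)]


print(effort_under_pressure("execute", "high", remaining_fraction=0.10))
print(effort_under_pressure("verify", "xhigh", remaining_fraction=0.10))
```

With 10% of the budget left, execution drops from high to medium while verification stays at xhigh, which is the trade the workflow recommends.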

This before-and-after is the core shift:

Before	After
One effort level for the whole loop	Different effort by phase
"Please be efficient"	Explicit budget policy
Watch total tokens at the end	Watch retries and spend concentration during the run
Assume more budget means better results	Treat budget allocation as a design problem
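
Watching spend concentration during the run can be as simple as tagging each step with a category and checking shares as you go. This is a rough sketch: the category names, the 40% limit, and the token figures are all hypothetical.

```python
from collections import defaultdict


def spend_share(events):
    """Given (category, tokens) pairs, return each category's share of total spend."""
    totals = defaultdict(int)
    for category, tokens in events:
        totals[category] += tokens
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: tokens / grand_total for category, tokens in totals.items()}


def flag_hotspots(shares, wasteful=("retry", "exploration"), limit=0.4):
    """Return the wasteful categories whose share of spend exceeds the limit."""
    return [category for category in wasteful if shares.get(category, 0.0) > limit]


# Illustrative run log: (category, tokens spent on that step).
events = [
    ("plan", 20_000),
    ("execute", 30_000),
    ("retry", 25_000),
    ("retry", 20_000),
    ("verify", 5_000),
]

shares = spend_share(events)
hotspots = flag_hotspots(shares)
print(shares)
print(hotspots)
```

In this made-up log, retries account for 45% of spend, which points at orchestration rather than at the model's effort level.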

The big idea is simple: reasoning spend is a product decision, not just a model setting. Claude Opus 4.7 gives you better controls with xhigh effort and task budgets, but the win comes from how you combine them.

If you're building long-running agents, don't just ask Claude to think harder. Tell it where hard thinking is worth paying for.

References

Documentation & Research

[1] Anthropic Releases Claude Opus 4.7: A Major Upgrade for Agentic Coding, High-Resolution Vision, and Long-Horizon Autonomous Tasks - MarkTechPost
[2] Budget-Constrained Agentic Large Language Models: Intention-Based Planning for Costly Tool Use - arXiv
[3] ZEBRA: Zero-shot Budgeted Resource Allocation for LLM Orchestration - arXiv
[4] ZEBRAARENA: A Diagnostic Simulation Environment for Studying Reasoning-Action Coupling in Tool-Augmented LLMs - arXiv
[5] Ares: Adaptive Reasoning Effort Selection for Efficient LLM Agents - arXiv


Frequently asked

What is Claude xhigh effort?

xhigh is an effort level between high and max in Claude Opus 4.7. It gives you finer control over the tradeoff between reasoning depth, latency, and cost on harder tasks.

Should I always use xhigh for long-running agents?

No. xhigh is best for steps where the agent is planning, debugging, or verifying critical work. For routine steps, lower effort often preserves quality while cutting cost and latency.