Long-running agents have a bad habit: they burn budget on the wrong parts of the job. The model thinks hard where it shouldn't, then arrives underpowered when the task actually gets difficult.
Claude Opus 4.7 gives developers two new knobs for controlling reasoning spend: xhigh effort for finer thinking control, and task budgets for guiding token allocation over longer agent runs. Anthropic positions them as production levers for managing the tradeoff between reasoning quality, latency, and cost in hard, multi-step workflows.[1]
That matters because agent cost rarely explodes in one giant call. It leaks through dozens or hundreds of slightly-too-expensive calls. If you run coding agents, research agents, or support agents overnight, this is where margins disappear.
The interesting bit is that xhigh sits between high and max. That sounds minor, but it fixes a real operational problem. "High" can be too shallow for planning-heavy steps. "Max" can be overkill. xhigh gives you a middle lane.
Task budgets solve a different issue. They let you guide overall spend across longer runs instead of treating every step like an isolated reasoning event.[1] In practice, that means your agent can be taught not to blow premium thinking on low-value subtasks.
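To make "guiding overall spend" concrete, here is a minimal sketch of a per-task token budget tracker. The class and method names are illustrative, not the Anthropic API; the point is that a long run carries one cumulative budget that every step draws from.

```python
# Sketch of a per-task token budget tracker (names are illustrative,
# not the Anthropic API). It caps total reasoning spend across a run
# and lets each step check affordability before it fires.

class TaskBudget:
    def __init__(self, total_tokens: int):
        self.total = total_tokens
        self.spent = 0

    def record(self, tokens_used: int) -> None:
        """Record the token cost of one completed step."""
        self.spent += tokens_used

    @property
    def remaining(self) -> int:
        return max(self.total - self.spent, 0)

    def can_afford(self, estimated_tokens: int) -> bool:
        """Check whether a step fits in the remaining budget."""
        return estimated_tokens <= self.remaining


budget = TaskBudget(total_tokens=200_000)
budget.record(12_000)   # an expensive planning step
budget.record(3_000)    # a cheap tool call
print(budget.remaining)           # 185000
print(budget.can_afford(50_000))  # True
```

With a tracker like this in the loop, "don't blow premium thinking on low-value subtasks" becomes a check the orchestrator can actually enforce rather than a hope.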
Reasoning spend gets out of control when agents apply expensive deliberation uniformly across a trajectory, even though long-running tasks contain a mix of cheap steps, hard steps, and dead ends. Research on long-horizon agent benchmarks shows that planning quality, not raw tool access, is often the real bottleneck.[2][3]
Here's what I noticed reading the benchmarks: the expensive part is not always "generation." It's the churn. Agents reread, over-plan, re-check, and keep burning tokens while making weak strategic choices.
In EnterpriseOps-Gym, performance drops as task horizon increases, and more thinking budget helps in many domains but not all. Some workflows improve sharply with higher thinking budgets, while others plateau early or even peak at medium effort.[2] That is the key lesson. More reasoning is not a universal fix.
In YC-Bench, long-term success depends heavily on maintaining useful memory and avoiding repeated bad decisions. The winning models don't just "think more." They preserve strategy, blacklist bad patterns, and stay coherent over time.[3]
So if your agent keeps getting expensive, the issue may not be that it needs more compute. It may be that it lacks budget discipline.
Use lower effort for routine steps, high or xhigh for decision-heavy steps, and reserve the most expensive reasoning only for moments when better planning changes the outcome. Anthropic recommends starting with high or xhigh for coding and agentic use cases, not blindly pushing every call to maximum.[1]
I'd treat effort like a routing policy, not a model preference. Cheap steps stay cheap. Hard steps earn extra thinking. Unknown steps start moderate, then escalate only if they fail.
Here's a simple comparison:
| Step type | Recommended effort | Why |
|---|---|---|
| Retrieval, classification, extraction | low / medium | Little value from extended reasoning |
| Simple tool call formatting | low | Mostly deterministic |
| Multi-step planning | high / xhigh | Better reasoning changes downstream success |
| Debugging ambiguous failures | xhigh | Worth spending for diagnosis |
| Final verification on critical tasks | xhigh / max | Expensive, but sometimes justified |
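The routing table above can be sketched as a small policy function. The step-type labels and the escalate-on-failure rule are my assumptions; how the chosen level is actually passed to the model depends on your SDK version, so that part is left out.

```python
# Minimal effort-routing policy mirroring the table above.
# Step-type labels are illustrative; unknown steps start moderate
# and escalate one level per prior failed attempt.

EFFORT_POLICY = {
    "retrieval": "low",
    "classification": "medium",
    "tool_formatting": "low",
    "planning": "xhigh",
    "debugging": "xhigh",
    "final_verification": "max",
}

LEVELS = ["low", "medium", "high", "xhigh", "max"]

def choose_effort(step_type: str, prior_failures: int = 0) -> str:
    """Pick the cheapest effort level for a step, escalating on failure."""
    base = EFFORT_POLICY.get(step_type, "medium")  # unknown steps start moderate
    idx = min(LEVELS.index(base) + prior_failures, len(LEVELS) - 1)
    return LEVELS[idx]

print(choose_effort("retrieval"))                     # low
print(choose_effort("summarize"))                     # medium (unknown step)
print(choose_effort("summarize", prior_failures=2))   # xhigh
```

The design choice worth copying is the escalation path: steps earn expensive reasoning by failing cheaply first, rather than getting it by default.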
This routing approach matches what the research suggests. In EnterpriseOps-Gym, giving weaker executors better plans produced bigger gains than simply throwing more complexity at the agent architecture.[2]
The best way to control reasoning spend is to make budget policy explicit in the prompt: tell the agent when to stay cheap, when to escalate effort, and what conditions justify deeper thinking. If you don't specify this, the agent has to invent its own spend policy.
Here's a weak prompt:
> Solve this task thoroughly and use deep reasoning when needed.
That sounds fine. It's also vague enough to invite overthinking.
Here's a better version:
> You are a cost-aware long-running agent.
>
> Budget policy:
> - Use minimal reasoning for retrieval, extraction, formatting, and straightforward tool calls.
> - Use high effort for planning, ambiguity resolution, and dependency ordering.
> - Escalate to xhigh only when a mistake would create expensive downstream rework or when prior attempts failed.
> - Before escalating, briefly state why the higher effort is justified.
> - Prefer concise intermediate reasoning and avoid re-evaluating settled decisions unless new evidence appears.
>
> Execution policy:
> 1. Classify the current step as routine, planning-heavy, or failure-recovery.
> 2. Choose the cheapest effort level that can reliably complete it.
> 3. Preserve key decisions in a compact running memory.
> 4. If blocked, propose one high-value next action instead of exploring multiple branches.
That prompt does two things. First, it narrows the spend policy. Second, it prevents the classic agent move of reopening settled questions for no reason.
If you want more examples like this, the Rephrase blog covers practical prompting patterns for real workflows.
A budget-aware agent workflow separates planning from execution, spends more on the plan, and keeps a compact memory so the model does not repeatedly pay to rediscover the same facts. This approach lines up with benchmark findings on planning bottlenecks and memory-driven success.[2][3]
I like a three-phase pattern: plan expensively up front, execute each step cheaply, and escalate effort only during recovery when a step fails.
The "before → after" shift looks like this:
| Before | After |
|---|---|
| Every agent turn uses the same high-effort policy | Planning and recovery use high/xhigh; execution uses low/medium |
| No memory discipline | Agent stores compact decisions and constraints |
| Re-checks settled issues repeatedly | Reopens decisions only with new evidence |
| Spends heavily at the start and drifts | Preserves budget for hard branches later |
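The "after" column can be sketched as a single loop. Everything here is hypothetical scaffolding: `call_model` stands in for whatever client function you use and is not a real SDK call, and the effort strings follow the routing convention discussed above.

```python
# Hypothetical three-phase loop: plan expensively once, execute cheaply,
# escalate only on recovery. `call_model` is a stand-in, not a real SDK call.

def run_task(task, call_model, memory=None):
    memory = memory if memory is not None else []

    # Phase 1: plan with high effort, once.
    plan = call_model(task, effort="xhigh", context=memory)
    memory.append(("plan", plan))

    results = []
    for step in plan:
        # Phase 2: execute each step cheaply.
        result = call_model(step, effort="low", context=memory)
        if result is None:
            # Phase 3: recover with escalated effort only when blocked.
            result = call_model(step, effort="xhigh", context=memory)
        memory.append((step, result))  # compact memory: pay for facts once
        results.append(result)
    return results


# Toy stand-in model to show the control flow.
def fake_model(task, effort, context):
    if task == "build report":
        return ["fetch", "summarize"]
    if task == "fetch" and effort == "low":
        return None  # simulate a blocked cheap attempt
    return f"{task}:{effort}"

print(run_task("build report", fake_model))  # ['fetch:xhigh', 'summarize:low']
```

Note where the spend lands: one xhigh call for the plan, one more only for the step that actually failed, and low effort everywhere else.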
This is also where prompt rewriting tools are genuinely useful. If you write "be thorough but efficient," that's not enough. A tool like Rephrase can turn that fuzzy instruction into a specific budget policy with escalation rules.
When task budgets are enabled, watch for two failure modes: under-spending on critical planning and over-spending on repeated low-value loops. The goal is not just to reduce tokens. It's to spend them where they produce better outcomes per dollar.
I'd monitor four things in production: total tokens per completed task, tokens spent before first useful action, number of retries per branch, and success rate after escalation. Those tell you whether the budget policy is helping or just suppressing thought.
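The four signals above are cheap to compute from per-step logs. The log schema here is made up; adapt the field names to whatever your tracing setup records.

```python
# Sketch of computing the four monitoring signals from per-step logs.
# The log schema is illustrative, not from any particular tracing tool.

def budget_metrics(steps):
    """steps: dicts with 'tokens', 'useful', 'retry', 'escalated', 'success'."""
    total_tokens = sum(s["tokens"] for s in steps)

    # Tokens spent before the first useful action.
    tokens_before_first_useful = 0
    for s in steps:
        if s["useful"]:
            break
        tokens_before_first_useful += s["tokens"]

    retries = sum(1 for s in steps if s["retry"])

    # Success rate after escalation, or None if nothing escalated.
    escalated = [s for s in steps if s["escalated"]]
    success_after_escalation = (
        sum(1 for s in escalated if s["success"]) / len(escalated)
        if escalated else None
    )
    return {
        "total_tokens": total_tokens,
        "tokens_before_first_useful": tokens_before_first_useful,
        "retries": retries,
        "success_after_escalation": success_after_escalation,
    }


log = [
    {"tokens": 500,  "useful": False, "retry": False, "escalated": False, "success": True},
    {"tokens": 2000, "useful": True,  "retry": False, "escalated": False, "success": True},
    {"tokens": 4000, "useful": True,  "retry": True,  "escalated": True,  "success": True},
    {"tokens": 4000, "useful": True,  "retry": True,  "escalated": True,  "success": False},
]
print(budget_metrics(log))
```

A low success rate after escalation is the tell that your policy is suppressing thought rather than routing it.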
One more thing: benchmarks on long-running agents keep showing that memory quality matters. In YC-Bench, scratchpad usage was one of the strongest predictors of success because it prevented the model from relearning the same painful lesson over and over.[3] That means a budget policy without memory is incomplete.
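A compact memory with a blacklist is simple to sketch. This is my illustration of the idea, not code from YC-Bench: settle a decision once, refuse to reopen it, and never retry an approach that already failed.

```python
# Illustrative compact scratchpad: store decisions once and blacklist
# failed approaches so the agent doesn't pay to relearn the same lesson.

class Scratchpad:
    def __init__(self):
        self.decisions = {}     # settled choices, keyed by topic
        self.blacklist = set()  # approaches that already failed

    def settle(self, topic: str, choice: str) -> None:
        """Record a decision; ignore attempts to reopen settled topics."""
        self.decisions.setdefault(topic, choice)

    def mark_failed(self, approach: str) -> None:
        self.blacklist.add(approach)

    def should_try(self, approach: str) -> bool:
        return approach not in self.blacklist


pad = Scratchpad()
pad.settle("db", "sqlite")
pad.settle("db", "postgres")   # ignored: "db" is already settled
pad.mark_failed("regex-parse")
print(pad.decisions["db"])            # sqlite
print(pad.should_try("regex-parse"))  # False
```

Feeding a summary of `decisions` and `blacklist` back into each step's context is what keeps the budget policy from leaking through repetition.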
Try this the next time your Claude agent gets expensive: don't just lower the budget. Rewrite the spend policy. Tell the agent exactly what deserves xhigh effort and what absolutely does not. That's the difference between "cheaper" and "actually efficient."
Documentation & Research

Community Examples

- End to end Task Orchestration with Claude AI (Free Plan) - r/PromptEngineering (link)
**What is xhigh effort?**
xhigh is a reasoning effort level between high and max. It gives you a finer tradeoff between quality, latency, and cost on hard tasks, especially in coding and agent workflows.
**Should I use xhigh for every call?**
No. xhigh is best for difficult steps where better planning changes the outcome. For routine retrieval, formatting, or simple tool calls, lower effort is usually cheaper and fast enough.