Why Agents Need Reasoning Reuse

Learn why reasoning reuse beats bigger context for multi-turn agents, and how preserve_thinking reduces drift, cost, and repeated mistakes.

Ilia Ilinskii
Rephrase · May 23, 2026

Prompt engineering · 7 min read

Most multi-turn agents do not fail because they lack a giant context window. They fail because they keep forgetting what already mattered.

Key Takeaways

Bigger context helps storage, but not necessarily attention, relevance, or consistency.
Research on long-horizon agents shows that active context management beats raw transcript accumulation.[1][2]
Reasoning reuse matters because agents need to preserve conclusions, not just tokens.
Decoupling working context from detailed history reduces drift, confusion, and wasted steps.[1][3]
If you build or prompt agents, design for reusable reasoning state before you buy more context.

Why isn't bigger context enough for multi-turn agents?

Bigger context is not enough because multi-turn agents do not just need more room. They need a stable, task-aligned working state that can survive long interactions without drowning in stale details, misplaced assumptions, or irrelevant tokens.[1][2]

Here's the core mistake I keep seeing: people assume a 200k- or 1M-token window automatically solves agent memory. It doesn't. A bigger window gives you storage capacity. It does not guarantee retrieval quality, attention quality, or clean execution. The SoK on Agentic RAG says this directly: long-context models still need structured context selection because performance degrades depending on where relevant information appears in long inputs.[2]

That point matters more than it sounds. In a real agent loop, every turn adds more observations, tool outputs, plans, guesses, and partial conclusions. If you just keep appending that stream, the agent eventually reasons over clutter. ARC calls this context rot: the agent's internal state becomes less coherent and less aligned as the task stretches on.[1]

So the real bottleneck is not token capacity. It's state quality.
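
To make the append-only problem concrete, here is a rough back-of-the-envelope sketch. The per-turn and live-state token figures are made-up assumptions for illustration, not measurements from any of the cited papers. It only shows how the newest observation shrinks as a share of what the model reads when every prior turn is replayed.

```python
# Hypothetical numbers for illustration only; measure your own agent.
TOKENS_PER_TURN = 1_500   # assumed average size of one turn's observations and tool output
LIVE_STATE_CAP = 800      # assumed budget for a distilled working state


def replayed_tokens(turn: int) -> int:
    """Tokens read at a given turn (1-based) if every turn so far is replayed."""
    return turn * TOKENS_PER_TURN


def newest_turn_share(turn: int) -> float:
    """Fraction of the replayed context that comes from the current turn."""
    return TOKENS_PER_TURN / replayed_tokens(turn)


def managed_tokens() -> int:
    """Tokens read when only a capped live state plus the current turn is kept."""
    return LIVE_STATE_CAP + TOKENS_PER_TURN


for turn in (1, 10, 40):
    print(
        f"turn {turn}: replay={replayed_tokens(turn)} tokens, "
        f"newest share={newest_turn_share(turn):.1%}, "
        f"managed={managed_tokens()} tokens"
    )
```

Token counts say nothing about quality on their own, but they show why a larger window only postpones the clutter: the replayed context keeps growing while the part that matters right now stays the same size.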

What does preserve_thinking actually mean?

In this article, preserve_thinking means carrying forward the useful reasoning state from earlier turns in a compact, reusable form instead of forcing the model to reconstruct it from raw history every time.

I'd define it as preserving the conclusions, priorities, and decision context that still matter now. Not every chain-of-thought token. Not every dead-end search. Not every discarded hypothesis. Just the parts that should keep shaping the next move.

That distinction lines up with recent long-horizon agent work. ARC separates action execution from context management and maintains an interaction memory plus a checklist that can be revised over time.[1] UI-Copilot goes even more concrete: it keeps only concise progress summaries in the live dialogue while storing detailed reasoning externally for retrieval on demand.[3]

That is basically the preserve_thinking idea in architecture form. The agent does not need to reread every old thought. It needs access to the right distilled thought.

This is also where reasoning reuse becomes more interesting than plain memory. Memory says, "store what happened." Reasoning reuse says, "store what was learned and make it usable again."

Why does reasoning reuse beat transcript replay?

Reasoning reuse beats transcript replay because replay preserves volume, while reuse preserves signal. Multi-turn agents need the second one far more.

Transcript replay looks safe. In theory, if you keep the full trace, nothing is lost. In practice, everything important gets buried. ARC shows that raw accumulation leads to attention dilution, while passive summarization alone still lets early mistakes persist.[1] UI-Copilot reports similar failure modes in GUI agents: memory degradation, progress confusion, and math hallucination when too much reasoning is mixed into the active context.[3]

What works better is a split model:

Approach	What it keeps live	Main failure mode	Better use case
Raw transcript replay	Everything	Attention dilution, drift	Short tasks
Bigger context only	More of everything	Relevance collapse	Broad document intake
Summary plus retrieval	Progress + on-demand details	Summary quality risk	Long multi-turn tasks
Reasoning reuse	Distilled conclusions and strategies	Requires memory design	Persistent agents

This is why I think preserve_thinking is underrated. It's not about preserving every thought. It's about preserving the right cognitive residue.

A good agent should be able to say: "We already established X. Y was a dead end. Z is still unresolved. Continue from there."

That is much closer to how competent humans work too.
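
Here is a minimal sketch of what that "cognitive residue" could look like as data. The `ReasoningState` class and its field names are hypothetical, chosen only to mirror the established / dead end / unresolved split above. They do not come from ARC, UI-Copilot, or any framework.

```python
from dataclasses import dataclass, field


@dataclass
class ReasoningState:
    """Distilled residue of earlier turns (hypothetical structure for illustration)."""
    established: list[str] = field(default_factory=list)  # conclusions still trusted
    dead_ends: list[str] = field(default_factory=list)    # approaches already ruled out
    unresolved: list[str] = field(default_factory=list)   # questions still open

    def handoff(self) -> str:
        """Render the state as a short note the next turn can start from."""
        def part(label: str, items: list[str]) -> str:
            return f"{label}: " + ("; ".join(items) if items else "none")

        return "\n".join([
            part("Already established", self.established),
            part("Dead ends", self.dead_ends),
            part("Still unresolved", self.unresolved),
            "Continue from there.",
        ])


# Example values are invented for the demo.
state = ReasoningState(
    established=["the failing test is test_login_timeout"],
    dead_ends=["raising the retry count"],
    unresolved=["why the session token expires early"],
)
print(state.handoff())
```

The handoff note is a few lines long regardless of how many turns produced it, which is the whole point: the next turn starts from conclusions, not from the trace that led to them.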

How should multi-turn agents preserve reasoning state?

Multi-turn agents should preserve reasoning state through compact summaries, explicit task checklists, and retrieval of prior insights, with the ability to revise those artifacts when later evidence proves them wrong.[1][2][3]

Notice the last part: revise. This is the catch.

If you only compress history, you may compress mistakes too. ARC's main contribution is showing that context management should be active and reflection-driven, not just passive summarization.[1] The system updates memory every turn, checks for degradation, and can reorganize the working context when it detects misalignment. That's a lot closer to preserve_thinking than "stuff old messages into a longer prompt."
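
As a rough illustration of "revise, don't just compress", the sketch below overwrites a claim when later evidence contradicts it and keeps the overturned version as a warning. This is a deliberate simplification: a plain key comparison stands in for the model-driven reflection ARC describes, and `WorkingContext` and `observe` are names I made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingContext:
    """Hypothetical live state: keyed claims plus a record of what was overturned."""
    facts: dict[str, str] = field(default_factory=dict)   # key -> currently trusted claim
    invalidated: list[str] = field(default_factory=list)  # overturned claims, kept as warnings

    def observe(self, key: str, claim: str) -> bool:
        """Record new evidence. Returns True if it overturned an earlier claim."""
        previous = self.facts.get(key)
        self.facts[key] = claim
        if previous is not None and previous != claim:
            # Revise instead of append: the stale claim leaves the trusted set
            # but stays visible so the agent does not re-derive it later.
            self.invalidated.append(f"{key}: {previous}")
            return True
        return False


ctx = WorkingContext()
ctx.observe("root_cause", "cache misconfiguration")
overturned = ctx.observe("root_cause", "expired TLS certificate")
print(ctx.facts)
print(ctx.invalidated)
```

A passive summarizer would have carried "cache misconfiguration" forward indefinitely. Here the early mistake is demoted the moment it is contradicted, which is the behavior the live state needs.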

UI-Copilot reaches a similar result from another angle. It uses a multi-turn summary for active progress tracking while detailed observations are stored separately and retrieved only when needed.[3] That reduces overload and keeps the execution context lighter.

If you're designing prompts or agent scaffolds, I'd turn that into a simple operating rule:

Keep a short live state: goal, completed steps, open questions, current hypothesis.
Store detailed traces outside the live loop.
Retrieve only the traces relevant to the current subproblem.
Periodically revise the live state instead of only appending to it.

Tools like Rephrase can help you phrase these instructions clearly when you're building prompts for agent frameworks, especially if you want the model to maintain a structured running state instead of dumping verbose thoughts every turn. And if you want more prompting workflows like this, the Rephrase blog is a good rabbit hole.
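
The first three rules in the list above can be sketched in a few lines. Everything here is an assumption for illustration: `LiveState` and `TraceStore` are hypothetical names, the store is a plain in-memory list, and naive word overlap stands in for whatever retrieval method (embeddings, search index) a real agent would use.

```python
from dataclasses import dataclass, field


@dataclass
class LiveState:
    """The short state that stays in the prompt every turn."""
    goal: str
    completed: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    hypothesis: str = ""


@dataclass
class TraceStore:
    """Detailed traces kept outside the live loop (hypothetical in-memory store)."""
    traces: list[tuple[str, str]] = field(default_factory=list)  # (topic, full detail)

    def add(self, topic: str, detail: str) -> None:
        self.traces.append((topic, detail))

    def retrieve(self, subproblem: str, limit: int = 2) -> list[str]:
        """Return details whose topic shares at least one word with the subproblem."""
        wanted = set(subproblem.lower().split())
        hits = [detail for topic, detail in self.traces
                if wanted & set(topic.lower().split())]
        return hits[:limit]


store = TraceStore()
store.add("database migration", "Full log of the migration dry run")
store.add("auth tokens", "Step-by-step notes on token refresh behaviour")

live = LiveState(goal="ship the billing fix", completed=["reproduced the bug"])
relevant = store.retrieve("debug auth tokens")
print(live)
print(relevant)
```

Only `live` and `relevant` would go into the next prompt. The migration log stays in the store until a subproblem actually asks for it.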

What does reasoning reuse look like in prompts?

Reasoning reuse in prompts looks like telling the model to maintain and update reusable internal artifacts, not just continue a chat transcript.

Here's a simple before-and-after.

Before	After
"Continue the task from the previous messages."	"Before acting, update a running state with: current goal, confirmed facts, failed attempts, open questions, and next best action. Reuse prior confirmed conclusions unless contradicted by new evidence."

And here's a stronger pattern for agent prompts:

Maintain a compact working memory across turns.

At each turn:
1. Update confirmed facts.
2. Mark invalidated assumptions.
3. Keep a short checklist of remaining subgoals.
4. Reuse prior conclusions instead of re-deriving them.
5. Retrieve detailed prior reasoning only if needed to resolve the current step.

Do not copy the full transcript into the active reasoning state.
Prefer concise, revisable summaries over raw history.

That instruction does two things. First, it reduces pointless recomputation. Second, it reduces anchor drift, where the model keeps reinterpreting the problem from scratch.

I've noticed this is especially useful for research agents, coding agents, and ops assistants that touch multiple tools over many turns. Bigger context lets them carry more. Reasoning reuse lets them stay coherent.
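
If you assemble prompts in code, the working-memory pattern from this section might be wired up roughly like this. It is a sketch under stated assumptions: `build_turn_prompt` and its parameters are hypothetical, the rules text is an abbreviated version of the pattern above, and no model call is made. It only builds the string you would send.

```python
# Abbreviated version of the working-memory instructions from this section.
WORKING_MEMORY_RULES = """Maintain a compact working memory across turns.
Reuse prior conclusions instead of re-deriving them.
Do not copy the full transcript into the active reasoning state."""


def build_turn_prompt(confirmed: list[str], invalidated: list[str],
                      subgoals: list[str], user_message: str) -> str:
    """Assemble one turn's prompt from the distilled state instead of the transcript."""
    def section(title: str, items: list[str]) -> str:
        body = "\n".join(f"- {item}" for item in items) if items else "- (none)"
        return f"{title}:\n{body}"

    return "\n\n".join([
        WORKING_MEMORY_RULES,
        section("Confirmed facts", confirmed),
        section("Invalidated assumptions", invalidated),
        section("Remaining subgoals", subgoals),
        f"Current request:\n{user_message}",
    ])


# Example values are invented for the demo.
prompt = build_turn_prompt(
    confirmed=["API returns 429 above 50 requests per minute"],
    invalidated=["the outage was caused by DNS"],
    subgoals=["add backoff to the client"],
    user_message="What should we change first?",
)
print(prompt)
```

The prompt size now depends on how much the agent has concluded, not on how many turns it took to get there.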

Why is this more important than bigger context for agent builders?

Reasoning reuse matters more because context length is a capacity upgrade, while reasoning reuse is an architecture upgrade. One gives you more room. The other changes how the room is organized.

The research trend is pretty clear. ARC improves long-horizon performance by actively managing context, not by simply expanding it.[1] The Agentic RAG survey frames memory, pruning, and retrieval as core design choices even in long-context settings.[2] UI-Copilot shows that decoupling progress tracking from detailed reasoning reduces confusion in long tasks.[3]

So my take is simple: if your agent fails after 20 turns, don't assume the fix is 10x more context. The fix is often better preservation of the reasoning state it already produced.

That's the feature preserve_thinking points toward. And yes, it matters more than bigger context for any agent expected to work across sessions, tools, and evolving subtasks.

If you're prompting these systems manually, start there. If you're doing it all day, automate the cleanup and rewriting step with something like Rephrase so your prompts consistently ask for reusable state instead of bloated history.

References

Documentation & Research

1. ARC: Active and Reflection-driven Context Management for Long-Horizon Information Seeking Agents - arXiv cs.AI
2. SoK: Agentic Retrieval-Augmented Generation (RAG): Taxonomy, Architectures, Evaluation, and Research Directions - arXiv cs.AI
3. UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization - arXiv cs.LG

Community Examples

4. Google Cloud AI Research Introduces ReasoningBank: A Memory Framework that Distills Reasoning Strategies from Agent Successes and Failures - MarkTechPost

Frequently asked

What is reasoning reuse in AI agents?

Reasoning reuse means an agent can preserve useful intermediate conclusions, plans, and lessons from prior turns instead of regenerating them from scratch. Done well, it improves consistency, lowers cost, and reduces repeated mistakes.

What does preserve_thinking mean in practice?

In practice, preserve_thinking means keeping a compact, task-relevant representation of the agent's prior reasoning state available across turns. That can include summaries, checklists, retrieved memories, or reusable reasoning strategies rather than full raw transcripts.