Prompt Tips•Feb 18, 2026•9 min

Tree of Thought Prompting: A Step-by-Step Guide (with real prompts you can copy)

A practical, developer-friendly walkthrough of Tree-of-Thought prompting: how to branch, score, backtrack, and ship better reasoning.


Tree-of-Thought prompting is what you reach for when "think step by step" stops being enough.

The catch is that most people treat it like a vibe. "Generate a few solutions and pick the best." That's not Tree-of-Thoughts. That's just sampling.

The original idea is closer to classical search: you generate multiple candidate "thoughts" at each step, evaluate them, keep the best branches, and backtrack when you hit a dead end. You're explicitly trading tokens for exploration, because some problems need lookahead, not just longer narration.

The good news: you can do it today with nothing more than careful prompting and a tiny bit of orchestration logic. The even better news: you can steal structure from recent research that shows why discrete steps and backtracking help models find and fix mistakes, not just explain them after the fact [2].


What "Tree of Thoughts" actually means (and what it doesn't)

Tree-of-Thoughts (ToT) generalizes Chain-of-Thought from a single linear trace into a branching search process. Instead of committing to one reasoning path, you explore several, score them, and expand the most promising ones, like BFS/DFS/beam search but with natural-language "thoughts" as nodes [1]. Surveys still describe it in exactly those terms: multi-path exploration, selection, and deliberate reasoning via progress assessment [3].

Two implications matter in practice.

First, you need states and transitions. A ToT prompt that says "think of three solutions" but doesn't define what a "step" is will collapse into a blob of text.

Second, you need an evaluation signal. That can be self-critique, a rubric, unit tests, constraints, a verifier model, or even "does this branch still satisfy the requirements?" Without scoring, you're not searching; you're just generating.

Here's the mental model I use: ToT is "generate → score → prune → expand → backtrack." If you don't have at least "score → prune," you don't really have ToT.


Step-by-step: implementing ToT as a prompting pattern

I'll describe this as if you're building a small reasoning loop in an app, but you can also run the same flow manually in a chat by copy-pasting.

Step 1: Define what a "thought" is (make steps discrete)

If you let the model free-write, you can't reliably branch or backtrack. That's why I like the "one thought at a time" approach: it forces crisp boundaries.

A recent paper on self-correction shows that when models generate reasoning as discrete, semantically coherent steps, they can localize errors more precisely and successfully backtrack to a clean prefix before continuing [2]. That's basically ToT mechanics applied to debugging: structure creates good "branch points."

So we start by telling the model what a thought looks like and how to end it (a delimiter).

Step 2: Branch deliberately (candidate generation)

At each step, you ask for k candidate next thoughts. This is your branching factor. In practice, k=3 to 5 is plenty. More than that and you pay a lot of tokens for low marginal diversity unless you also push diversity (different strategies, assumptions, or decompositions).

Step 3: Score each candidate using a rubric

ToT's core is the "value function" idea: evaluate progress and pick the branch worth expanding [1]. In pure prompting, I treat scoring as a mini-judge prompt: "Given goal + constraints + current partial solution, rate this next step."

Be careful here. Another line of research warns that chain-of-thought text can be unfaithful: plausible rationalizations, encoded steps, or internalized reasoning can make the "reasoning trace" look good while not being causally related to the answer [4]. Translation: don't score on eloquence. Score on constraint satisfaction and checkability.
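One way to keep yourself honest here: wherever constraints are mechanically checkable, score them in code instead of (or alongside) a judge prompt. A minimal sketch, where the constraint set and the candidate fields (`total`, `items`, `unit`) are made up for illustration:

```python
# Score a candidate partial solution by constraint satisfaction, not fluency.
# Constraints are plain callables; a candidate is a dict of proposed values.
def score_candidate(candidate, constraints):
    satisfied = sum(1 for check in constraints if check(candidate))
    return satisfied / len(constraints)

# Hypothetical constraints for a budgeting-style task.
constraints = [
    lambda s: s.get("total", 0) <= 100,                            # budget cap
    lambda s: s.get("items", 0) >= 3,                              # minimum item count
    lambda s: s.get("total", 0) == s.get("unit", 0) * s.get("items", 0),  # arithmetic checks out
]

good = {"total": 90, "items": 3, "unit": 30}   # satisfies all three
bad = {"total": 120, "items": 2, "unit": 60}   # only the arithmetic holds
```

A judge prompt can still break ties between candidates that pass the same checks; the point is that the checkable part of the rubric never rewards eloquence.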

Step 4: Prune and expand (beam search works well)

Keep the top b branches (beam width). Expand each of them one step. Repeat until you hit a termination condition.

Typical termination conditions: you reached a final answer, you hit max depth, or every branch looks stuck.
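The prune-and-expand loop is plain beam search. In the sketch below, `propose` and `evaluate` are deterministic toy stand-ins for LLM calls (the toy task: pick steps of 1, 2, or 3 that sum exactly to a target), so the control flow runs as-is:

```python
def propose(state):
    # In a real system: prompt the model for k candidate next thoughts.
    return [state + [step] for step in (1, 2, 3)]

def evaluate(state, target):
    # In a real system: a judge prompt or verifier. Overshooting is a dead end.
    total = sum(state)
    return -1.0 if total > target else total / target

def tot_search(target=7, beam_width=2, max_depth=10):
    frontier = [[]]  # start from the empty state
    for _ in range(max_depth):
        candidates = [c for state in frontier for c in propose(state)]
        scored = [(evaluate(c, target), c) for c in candidates]
        scored = [(v, c) for v, c in scored if v >= 0]  # prune dead ends
        for value, candidate in scored:
            if value == 1.0:
                return candidate               # termination: solved
        scored.sort(key=lambda vc: vc[0], reverse=True)
        frontier = [c for _, c in scored[:beam_width]]  # keep top-b branches
        if not frontier:
            return None                        # termination: every branch stuck
    return None                                # termination: max depth
```

Swapping the two stubs for model calls (and adding caching/budget limits) gets you most of the way to a real orchestrator.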

Step 5: Backtrack when stuck

Backtracking is the difference between "multi-sample" and "search." When a branch violates a constraint or fails verification, you don't just ask for a new answer; you roll back to the last good step and try a different continuation.

This mirrors what the structured self-correction framework does: verify → localize first error → backtrack to the last correct step → resample a new continuation [2]. That loop is extremely ToT-flavored, and it's a good blueprint when you want "search" to produce not just an answer, but a correct one.
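That verify → localize → backtrack → resample loop is straightforward to sketch. Below, `verify` and `resample` are toy stand-ins for a real verifier and an LLM call, using a trivially checkable invariant (steps must be strictly increasing) so the loop is runnable:

```python
def first_bad_step(steps, verify):
    """Return the index of the first step that fails verification, or None."""
    for i, step in enumerate(steps):
        if not verify(steps[:i], step):
            return i
    return None

def backtrack_and_resample(steps, verify, resample, max_retries=5):
    for _ in range(max_retries):
        bad = first_bad_step(steps, verify)
        if bad is None:
            return steps                     # every step verifies: done
        prefix = steps[:bad]                 # roll back to the clean prefix
        steps = prefix + resample(prefix)    # resample a new continuation
    return None                              # give up after max_retries

# Toy invariant: each step must exceed the previous one.
verify = lambda prefix, step: (not prefix) or step > prefix[-1]
# Toy "resampler": continue counting up from the prefix.
resample = lambda prefix: list(range(prefix[-1] + 1 if prefix else 0, 5))

fixed = backtrack_and_resample([1, 2, 2, 3], verify, resample)  # -> [1, 2, 3, 4]
```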


Practical prompts you can copy

Below are two templates: a "pure ToT" search template and a "ToT + backtracking" template inspired by structured self-correction [2].

Template 1: Manual ToT (single-message, no code)

Use this when you're doing it in a chat and you're okay with the model managing the tree internally.

You are solving a hard problem. Use Tree-of-Thought search.

Rules:
- Work in steps. Each step: propose multiple candidate next thoughts, score them, pick one, and continue.
- If you detect a contradiction or low confidence, backtrack to the last good step and try a different branch.
- Keep the final answer separate from the search.

Problem:
{paste problem}

Output format:
Step 1:
Candidates:
A) ...
B) ...
C) ...
Scores (0-10) with brief justification:
A: ...
B: ...
C: ...
Chosen: {A|B|C}

Step 2:
...

Final Answer:
{final}

This is "prompt-only" ToT. It works surprisingly often, but it's fragile: the model might skip scoring, or it might not truly backtrack.

Template 2: ToT with explicit step delimiter (better for orchestration)

This borrows the "end each thought with a delimiter" trick that makes steps parseable and backtrackable [2].

You are solving a problem with deliberate search.

Instructions:
1) Generate exactly {k} candidate next thoughts for the current state.
2) Each thought must be a single coherent step and end with </thought>.
3) After generating candidates, score each candidate against the rubric.
4) Output only the chosen next thought (verbatim) as CHOSEN_THOUGHT.
5) Do not produce a final answer unless explicitly asked.

Rubric (score 0-10):
- Correctness pressure: does it maintain invariants and constraints?
- Progress: does it move toward a solution (not restating)?
- Verifiability: can we check it quickly?
- Risk: does it introduce assumptions?

Current state:
{paste the question + the current partial solution steps}

Now generate candidates and choose.

In an app, you run this in a loop. Store each </thought> step. If a verifier fails, truncate the list to a previous step and resume from there.
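A minimal sketch of that bookkeeping, assuming the `</thought>` delimiter from the template (`verify` here is a stand-in for whatever per-step check you run):

```python
def parse_thoughts(transcript):
    # Split the model output on the step delimiter; drop empty chunks.
    return [t.strip() for t in transcript.split("</thought>") if t.strip()]

def truncate_to_good_prefix(thoughts, verify):
    # Keep steps until the first one that fails verification.
    good = []
    for thought in thoughts:
        if not verify(thought):
            break
        good.append(thought)
    return good

transcript = (
    "Let x = 2</thought>\n"
    "Then x + 1 = 3</thought>\n"
    "So the answer is 5</thought>"
)
thoughts = parse_thoughts(transcript)
# Toy verifier: flag the step containing the arithmetic slip.
good_prefix = truncate_to_good_prefix(thoughts, lambda t: "5" not in t)
```

Feeding `good_prefix` back as the "current state" in Template 2 resumes the search from the last verified step.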

A small real-world tweak: "recursive CoT" as a cheap ToT-lite

People in the wild often approximate ToT by forcing 3 alternative reasoning paths and comparing them (a kind of self-critique ensemble) [5]. It's not full search (no multi-step branching), but it's a decent "budget ToT" for tasks like tricky bug triage or ambiguous product decisions.

If you try it, treat it as one expansion layer of a tree. Useful, but don't confuse it with backtracking search.
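If you do run it this way, the cheapest comparison is agreement between the paths' final answers, a self-consistency-style vote rather than a judged comparison. The hardcoded answers below stand in for three separate model calls:

```python
from collections import Counter

def pick_by_agreement(final_answers):
    # Majority vote over final answers; agreement fraction doubles as a
    # crude confidence signal (low agreement = escalate to real ToT).
    answer, votes = Counter(final_answers).most_common(1)[0]
    return answer, votes / len(final_answers)

answer, agreement = pick_by_agreement(["42", "42", "41"])
```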


When ToT is worth it (and when it's not)

ToT shines when the problem has genuine branching: planning, puzzles, multi-constraint specs, architecture tradeoffs, or anything where early choices can trap you.

It's overkill when the task is straightforward extraction, summarization, or a well-specified transformation. In those cases, ToT often just burns tokens.

Also, don't fall into the "more reasoning text = better" trap. Work on chain-of-thought pathologies argues that visible reasoning can be misleading, sometimes even decoupled from the actual computation [4]. That's why evaluation and verification matter: ToT isn't "write more," it's "explore more, then check."


Closing thought: treat ToT like a search product, not a prompt trick

If you want ToT to be reliable, you need to think like you're building a search system. Define node structure. Define a scoring function. Define pruning. Define stop conditions. And ideally, define a verifier.

Do that, and ToT stops being a magical incantation and becomes a predictable way to buy accuracy with compute, exactly what it was meant to be [1].


References

  1. Tree of Thoughts: Deliberate Problem Solving with Large Language Models - NeurIPS / arXiv (Yao et al., 2023) - http://arxiv.org/abs/2305.10601
  2. Structure Enables Effective Self-Localization of Errors in LLMs - arXiv - http://arxiv.org/abs/2602.02416v1
  3. From Instruction to Output: The Role of Prompting in Modern NLG - arXiv - https://arxiv.org/abs/2602.11179
  4. Diagnosing Pathological Chain-of-Thought in Reasoning Models - arXiv - https://arxiv.org/abs/2602.13904

Community Examples
5. Stop using "Think Step by Step"-Use 'Recursive Chain of Thought' instead. - r/PromptEngineering - https://www.reddit.com/r/PromptEngineering/comments/1qwv6su/stop_using_think_step_by_stepuse_recursive_chain/
6. PromptViz - Visualize & edit system prompts as interactive flowcharts - r/PromptEngineering - https://www.reddit.com/r/PromptEngineering/comments/1qt8nx4/promptviz_visualize_edit_system_prompts_as/

Ilia Ilinskii

Founder of Rephrase-it. Building tools to help humans communicate with AI.

