Prompt Tips · Feb 13, 2026 · 10 min read

How to Research With AI (Without Getting Burned by Hallucinations)

A practical workflow for using LLMs as research assistants: plan, search, verify, synthesize, and keep an evidence trail you can trust.


Research with AI is weird right now.

On a good day, it feels like you hired a fast junior analyst who never sleeps. On a bad day, it feels like that analyst just invented three papers, misread two charts, and confidently shipped you a "summary" that collapses the moment you click a link.

The trick is to stop treating the model like an answer machine and start treating it like a process machine. Your job isn't to "ask better questions" in the abstract. Your job is to design a research loop where the model has to show its work, and where you can audit every important claim.

That idea shows up again and again in recent agent research: strong systems don't just parallelize a bunch of searches and stitch them together. They keep a centralized context, iterate, reflect, and refine the plan based on what they learned, because research is inherently sequential, not a one-shot prompt lottery [3]. And when agents are evaluated on realistic web tasks, the big failures aren't "can it browse?" but "did it actually find decisive evidence, or did it hallucinate an investigative story?" [4].

So here's how I actually recommend researching with AI in 2026: build a tight loop around planning, evidence retrieval, verification, and synthesis.


Step 1: Ask for a research plan, not a report

Most people start with "Write me a report about X." That's backwards. A good researcher starts by deciding what would count as a good answer, which subquestions matter, and what sources are authoritative.

METIS, a stage-aware "research mentor" system, formalizes this nicely. It routes behavior based on where you are in the research/writing process (idea, plan, draft, final), and it pushes the user toward concrete next steps rather than vague inspiration [2]. You don't need METIS to benefit from the pattern. You just need to force a stage and a plan.

A simple prompt move that works well is: ask for a plan with explicit deliverables and a stopping condition. Make the model commit to what it will search for, how it will judge relevance, and what it will do if sources disagree.

Use something like this:

You are my research assistant. We are in the "research plan" stage.

Topic: {your topic}
Goal: {decision I'm trying to make / question I need to answer}
Constraints: {time, domain, audience}

Before writing any conclusions:
1) Propose a 6-10 step research plan with subquestions.
2) For each step, list what evidence would confirm/deny it.
3) Define "done": what would make the answer reliable enough to ship?
4) Tell me what you expect to be contentious or easy to hallucinate.
Only then ask me 3 clarifying questions.

Here's what I noticed: forcing the plan first dramatically reduces "confident nonsense," because it nudges the model into procedural mode instead of performative mode.


Step 2: Run research as a loop (plan → search → reflect → update)

If you've used "deep research" style tools, the best ones don't feel like one long prompt. They feel like cycles.

The Deep Researcher "Reflect Evolve" paper lays this out as a sequential refinement loop: create a plan, generate a search query, retrieve evidence, then reflect on whether the plan needs to change given what you found, while maintaining a global research context so you don't repeat yourself [3]. That "global context" concept is the whole game. It prevents the silo problem where each sub-agent confidently repeats the same shallow facts.

You can approximate this manually in a chat by explicitly structuring turns:

  1. Plan (what to look for).
  2. Search round (what it found, with links).
  3. Reflection (what's missing, what's contradictory).
  4. Next search round.

If you're building product, this also maps cleanly to an agent architecture: a planner, a retriever, and a critic/reflection step that decides what to do next [3].
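To make that architecture concrete, here's a minimal Python sketch of the loop. The planner, retriever, and critic here are stubs standing in for LLM and search calls, and all the function names and the fake search index are my own illustration, not the paper's implementation. The point is the single shared context dict that every round reads and writes.

```python
# Minimal sketch of a plan -> search -> reflect loop with one shared context.
# plan/retrieve/reflect stand in for LLM and search calls; names are illustrative.

def plan(context):
    # Pick the next subquestion that is neither answered nor flagged as a gap.
    open_questions = [q for q in context["subquestions"]
                      if q not in context["answered"] and q not in context["gaps"]]
    return open_questions[0] if open_questions else None

def retrieve(question):
    # Stand-in for a real search/browse step; returns (url, quote) pairs.
    fake_index = {
        "What does the tool cost?": [("https://example.com/pricing", "Pro plan is $20/seat.")],
        "Does it support SSO?": [("https://example.com/docs/sso", "SAML SSO on Enterprise only.")],
    }
    return fake_index.get(question, [])

def reflect(context, question, evidence):
    # Critic step: a subquestion only counts as answered if we found evidence.
    if evidence:
        context["evidence_log"].extend(
            {"question": question, "url": url, "quote": quote} for url, quote in evidence)
        context["answered"].add(question)
    else:
        context["gaps"].append(question)  # don't invent an answer; record the gap

def research_loop(topic, subquestions, max_rounds=5):
    context = {"topic": topic, "subquestions": subquestions,
               "answered": set(), "evidence_log": [], "gaps": []}
    for _ in range(max_rounds):
        question = plan(context)
        if question is None:
            break  # stopping condition: every subquestion resolved or flagged
        reflect(context, question, retrieve(question))
    return context
```

Because the critic writes gaps and evidence back into the same context the planner reads, the loop never re-asks a settled question and never quietly drops an unanswered one, which is exactly the silo problem the global context is meant to prevent.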


Step 3: Treat web research as hostile territory (verify, don't vibe)

PATHWAYS is one of the most useful wake-up calls I've read recently. It evaluates web agents on tasks where the decisive information is not on the first page. Agents often navigate to roughly the right area, but fail to extract the hidden context, then "fill in the blank" with invented reasoning. The paper names this directly: investigative hallucination, where the agent claims it checked evidence it never accessed [4].

This is exactly what happens when you ask an LLM to "research online" and you don't require an evidence trail.

So add hard constraints:

Ask for quotes. Ask for URLs. Ask for "what page or section did you use." Ask for contradictions. And if you're doing anything important (pricing, compliance, medical, security), make the model separate "directly supported by sources" from "inference."

A prompt I reuse:

For every key claim, include:
- Source URL
- A short quote (1-2 sentences) that supports the claim
- Your interpretation of the quote
If you can't quote support, mark the claim as "unverified" and don't use it in conclusions.

It feels strict. That's the point.
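The rule is also easy to make mechanical. Here's a small sketch (the `Claim` class and its field names are my own, not from any of the cited systems) that refuses to let an unsupported claim into the conclusions:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    url: str = ""    # where the support lives
    quote: str = ""  # 1-2 sentence excerpt that backs the claim

    @property
    def verified(self) -> bool:
        # A claim counts as supported only if it carries both a URL and a quote.
        return bool(self.url and self.quote)

def usable_in_conclusions(claims):
    # Partition claims: verified ones ship, the rest get flagged, never silently used.
    verified = [c for c in claims if c.verified]
    unverified = [c for c in claims if not c.verified]
    return verified, unverified
```

Whether the gate lives in code or just in your prompt, the design choice is the same: "unverified" is an explicit state the model has to declare, not a gap it's allowed to paper over.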


Step 4: Build a citation firewall (assume the model will fabricate)

If you've ever asked for "related work," you've seen the failure: a blended list of real and fake citations that look plausible enough to slip into a draft.

A recent community thread describes exactly this pain: real-looking titles, real-looking authors, nonexistent papers, leaving the user stuck doing slow manual DOI checks [5]. Tier-2 source, sure, but the workflow lesson is real: you need a verification pass that is separate from the writing pass.

My rule: the model is not allowed to invent citations. It can only suggest leads, and every lead must be resolved to a real DOI/arXiv entry/publisher page before it becomes a "source."

You can enforce this with a "citation validator" prompt:

You are a citation verifier.

Input: a list of citations (title, authors, year, venue) from my draft.
Task:
1) For each item, determine if it is real.
2) If real: return canonical link(s) (DOI, arXiv, publisher).
3) If not real or ambiguous: mark as "NOT VERIFIED" and suggest the closest real match.
Output a table. Do not guess.

Yes, you still have to click links. But now you're auditing a shortlist, not spelunking through fiction.
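You can take some of the clicking out of the loop, too: DOIs, at least, are checkable programmatically. This sketch queries the public Crossref REST API (`api.crossref.org/works/{doi}`), which returns 404 for a DOI that doesn't resolve to a real work. The helper names are my own, and this only validates that the DOI exists; you'd still spot-check that the returned title matches what the model claimed.

```python
import json
import urllib.request
from urllib.error import HTTPError

def normalize_doi(raw: str) -> str:
    # Accept a bare DOI, a "doi:" form, or a doi.org URL; return the bare DOI.
    raw = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if raw.lower().startswith(prefix):
            return raw[len(prefix):]
    return raw

def check_doi(raw: str) -> dict:
    # Resolve a DOI against the Crossref works endpoint; 404 means "no such work".
    doi = normalize_doi(raw)
    url = f"https://api.crossref.org/works/{doi}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            work = json.load(resp)["message"]
            return {"doi": doi, "status": "VERIFIED",
                    "title": (work.get("title") or ["(untitled)"])[0]}
    except HTTPError as err:
        if err.code == 404:
            return {"doi": doi, "status": "NOT VERIFIED", "title": None}
        raise  # other HTTP errors (rate limits, outages) should surface, not pass
```

Run `check_doi` over the validator's output table and you've turned "is this paper real?" into a yes/no lookup, leaving only the title-matching judgment call for you.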


Step 5: Synthesize like a product decision, not like a school report

Once you have evidence, don't ask for "a summary." Ask for a decision memo.

Good synthesis has tension. It calls out tradeoffs, uncertainty, and what would change your mind. METIS explicitly bakes in stage-aware behavior and "methodology checks," because later-stage work needs sharper critique and grounded claims [2]. You can mimic this by forcing a structure: claims, evidence, counterevidence, confidence, next test.

Try:

Write a decision memo for {audience}.

Include:
- 5-8 key claims
- For each: supporting evidence (with links/quotes), counterevidence, and confidence (low/med/high)
- What we'd do next if we had 1 day vs 2 weeks
- "What would change my mind" section
No filler. If evidence is weak, say so.

This turns AI from "content generator" into "thinking partner with receipts."


Practical example: a full research loop prompt

Here's a compact workflow prompt you can paste into a tool that can browse, or use manually with your own searching:

We're researching: "Should we adopt {tool/approach} for {use case} in Q2?"

Rules:
- Don't write a report yet.
- Maintain a running evidence log with URLs and quotes.
- Separate facts from inferences.
- If sources conflict, show both.

Step 1 (Plan):
Draft a research plan with subquestions, and tell me what would count as strong evidence.

Step 2 (Round 1 search):
Pick the first 3 subquestions, propose 3-5 search queries each, and tell me what sources you expect (docs, papers, benchmarks, case studies).

Wait for my "go", then we execute Round 1.

Step 3 (Reflect):
After Round 1, list gaps, contradictions, and the next best subquestions.

Step 4 (Synthesize):
Only when I say "synthesize", write a decision memo with citations.

What works well here is the pacing. You're not letting the model sprint to a conclusion before you've even agreed on what "good research" means.


Closing thought

AI makes research faster, but it also makes it easier to launder uncertainty into confident prose. The fix isn't a magic prompt. It's a loop: plan, retrieve, verify, reflect, then synthesize, with a paper trail.

If you try one change this week, make it this: require quotes and links for every non-trivial claim. You'll feel the model slow down. That's not a bug. That's the cost of being right.


References

Documentation & Research

  1. Inside OpenAI's in-house data agent - OpenAI Blog - https://openai.com/index/inside-our-in-house-data-agent
  2. METIS: Mentoring Engine for Thoughtful Inquiry & Solutions - arXiv - https://arxiv.org/abs/2601.13075
  3. Deep Researcher with Sequential Plan Reflection and Candidates Crossover (Deep Researcher Reflect Evolve) - arXiv - https://arxiv.org/abs/2601.20843
  4. PATHWAYS: Evaluating Investigation and Context Discovery in AI Web Agents - arXiv - https://arxiv.org/abs/2602.05354

Community Examples

  5. How do you deal with mixed real and AI-generated citations in a draft? - r/PromptEngineering - https://www.reddit.com/r/PromptEngineering/comments/1qumjge/how_do_you_deal_with_mixed_real_and_aigenerated/
  6. How do you study good AI conversations? - r/PromptEngineering - https://www.reddit.com/r/PromptEngineering/comments/1qp7get/how_do_you_study_good_ai_conversations/

Ilia Ilinskii

Founder of Rephrase-it. Building tools to help humans communicate with AI.
