Learn how to write prompts for 1M-token context windows without overwhelming the model. Keep long-context prompts sharp, useful, and cheap.
A 1M-token context window sounds like permission to paste your whole universe into the prompt. It isn't. The bigger the window gets, the more discipline you need.
A 1M-token prompt still fails when it mixes critical instructions with irrelevant material, because the model has limited effective attention even inside a large window. Research and technical guides both show that context size is not the same thing as context usability, especially when important information sits in the middle or gets diluted by noise [1][2][3].
Here's the first mindset shift I'd make: stop thinking "How much can I fit?" and start thinking "What must remain salient?"
That sounds obvious, but it changes everything. In a short prompt, sloppiness is expensive. In a giant prompt, sloppiness is fatal. The model can technically ingest your sprawling spec, twenty PDFs, a codebase dump, meeting notes, and a few random Slack threads. But if the task only needs six constraints, three examples, and a narrow slice of the reference docs, stuffing the rest in creates distraction, latency, and cost without adding signal [1][2].
The newer context-engineering literature makes this point clearly: output quality often depends less on clever phrasing and more on assembling the right informational payload in the right structure [2].
Long-context prompts work best when they separate instructions, context, and output criteria into distinct layers. This reduces ambiguity, preserves important constraints, and makes it easier for the model to identify what governs the task versus what merely informs it [2][3].
I like to think in three layers.
First, the control layer: what the model must do, how it should decide, what it should avoid, and what format it must return. This is your operating system.
Second, the navigation layer: a compact map of the context. Think summaries, section labels, file names, document index, chronology, or a list of sources by priority. This tells the model where useful information lives.
Third, the payload layer: the raw material itself. Documents, transcripts, code, specs, tickets, or notes.
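As a minimal sketch of the three layers, here is one way to assemble them into a single prompt. The header labels, helper name, and example content are all illustrative, not a fixed convention:

```python
def build_prompt(control: str, navigation: str, payload: str) -> str:
    """Join the three layers under labeled headers so the model can tell
    rules (control) from the map (navigation) from raw material (payload)."""
    return "\n\n".join([
        "## INSTRUCTIONS (control layer)\n" + control,
        "## SOURCE MAP (navigation layer)\n" + navigation,
        "## SOURCE MATERIAL (payload layer)\n" + payload,
    ])

prompt = build_prompt(
    control="Plan a v1 implementation. Priority: API spec > security > product requirements.",
    navigation="[1] api_spec.md (authoritative)\n[2] security.md\n[3] prd.md\n[4] notes.txt (low priority)",
    payload="<pasted contents of the four sources>",
)
```

The point is not the helper function; it's that each layer stays visually and structurally distinct instead of melting into one blob.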
When people get poor results with big windows, they usually flatten all three into one giant blob.
Here's a before-and-after example.
| Version | Prompt |
|---|---|
| Before | "Here are 12 docs, our meeting notes, the product spec, and some code. Review everything and tell me what to build." |
| After | "You are planning a v1 implementation. Follow the priority order: (1) API spec, (2) security constraints, (3) product requirements, (4) meeting notes. Ignore duplicate ideas unless repeated in the top two sources. First, extract non-negotiable constraints. Second, propose architecture options. Third, recommend one plan in a table with trade-offs." |
Same task. Very different odds of success.
If you do this often, tools like Rephrase are useful because they can turn rough instructions into cleaner task structure fast, especially when you're moving between ChatGPT, Claude, Gemini, or coding assistants.
Important instructions belong at the beginning and, when possible, repeated near the end or near the active task boundary. Long-context research shows models tend to favor information at the edges over content buried in the middle, so placement is a practical reliability tool, not just formatting polish [1][3].
This is where the "lost in the middle" finding becomes practical. If your key rule sits somewhere around token 420,000 between two giant pasted docs, don't be shocked when the model ignores it [1].
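One way to sketch the fix is a "sandwich": state the critical rule once at the top and repeat a one-line reminder right before the task, so it never lives only in the middle of the window. The rule text and variables below are illustrative:

```python
CRITICAL_RULE = "Use only the EU deployment docs; ignore the US runbooks."

def sandwich(rule: str, documents: str, task: str) -> str:
    """Place the rule at both edges of the prompt, around the bulky payload."""
    return (
        f"RULE: {rule}\n\n"
        f"{documents}\n\n"
        f"Reminder of the rule above: {rule}\n"
        f"TASK: {task}"
    )

prompt = sandwich(CRITICAL_RULE,
                  "<hundreds of thousands of tokens of docs>",
                  "Draft the migration checklist.")
```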
What works better is edge placement plus light repetition. I'm not talking about spammy all-caps warnings. I mean:

- Stating the critical rules at the top, before any pasted material.
- Repeating a one-line reminder of the most important rule just before the task or question.
- Keeping the repetition short, so it reinforces the rule without cluttering the prompt.
For long multi-turn workflows, re-anchoring matters even more. One useful community pattern is asking the model to restate its current constraints or state before continuing. That advice is anecdotal, not foundational, but it lines up with how long conversations drift in practice [4].
A simple pattern looks like this:
```
Task: Propose a migration plan using only the sources below.

Non-negotiable constraints:
- No downtime
- EU data residency only
- PostgreSQL only, no Redis
- Output as a phased table

Before answering, restate the constraints you are following in 4 bullets.
```
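If you automate this pattern, a cheap guard is to check that the reply actually restates each non-negotiable before you trust the rest of the answer. The keyword list below is illustrative; real checks would be tuned to your constraints:

```python
# Keywords that must appear in the model's restated constraints.
CONSTRAINT_KEYWORDS = ["downtime", "eu data residency", "postgresql"]

def restates_constraints(reply: str) -> bool:
    """Return True if every constraint keyword appears in the reply."""
    text = reply.lower()
    return all(keyword in text for keyword in CONSTRAINT_KEYWORDS)

good = "Constraints: no downtime, EU data residency, PostgreSQL only. Plan: ..."
bad = "Here is a plan using Redis for caching."
```

A keyword check is crude, but it catches the most common failure: the model silently dropping a constraint before it starts planning.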
That extra step is cheap. The payoff is often huge.
You should include enough context to make the task solvable, but not so much that the model must wade through irrelevant detail. Bigger windows help, yet research on context quality and memory systems shows excess context can raise cost, create drift, and trigger information loss through compression or poor retrieval [2][5].
This is the part most teams get wrong. They assume retrieval and filtering are old problems now because frontier models accept massive inputs. But capacity is not the same thing as memory quality [5].
One recent paper found that in-context memory can work surprisingly well within the window for structured facts, yet still breaks in production when systems rely on compaction, summarization, or sprawling persistent prompts. Their conclusion is basically this: storing everything in context is brittle, and goal drift becomes a real issue over time [5].
My practical rule is simple: paste less, point better.
Instead of dumping 50 files, include:

- A one-line summary of each source, with a priority label.
- Only the sections the task actually needs, quoted or excerpted.
- Pointers (file names, section headings) to anything the model might need but doesn't need pasted in full.
That's context engineering. And it scales better than context stuffing.
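A compact source index is one way to "point better." The sketch below builds a priority-ordered map the model can navigate; the file names, priorities, and summaries are made up for illustration:

```python
# A hypothetical inventory of sources, instead of 50 pasted files.
sources = [
    {"id": "S1", "file": "api_spec.md",  "priority": 1, "summary": "REST endpoints, auth flow"},
    {"id": "S2", "file": "security.md",  "priority": 2, "summary": "data residency, encryption rules"},
    {"id": "S3", "file": "notes_q3.txt", "priority": 4, "summary": "brainstorm, mostly superseded"},
]

def render_index(sources: list[dict]) -> str:
    """Render a priority-ordered, one-line-per-source map for the prompt."""
    ordered = sorted(sources, key=lambda s: s["priority"])
    return "\n".join(
        f"[{s['id']}] {s['file']} (priority {s['priority']}): {s['summary']}"
        for s in ordered
    )

index = render_index(sources)
```

You then paste the index plus only the high-priority excerpts, and tell the model it may ask for anything else by ID.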
If you want more patterns like this, the Rephrase blog has plenty of prompt workflows worth borrowing.
To reduce drift, convert vague constraints into explicit checks, use structured outputs when possible, and periodically refresh the model's active state. Long generations and long chats both dilute earlier instructions, so prompts need checkpoints, not just a strong opening [2][4][5].
Here's what I've noticed: giant prompts often fail less from misunderstanding and more from gradual drift. The model starts well, then slowly defaults to generic behavior.
Three fixes help.
First, use positive constraints instead of fuzzy negatives. "Use a board-ready tone with bulletless prose and concrete trade-offs" works better than "don't be vague."
Second, use structured outputs when you can. If the result must be a table, schema, or defined sections, say so. Don't leave format compliance to vibes.
Third, for long-running work, ask for interim state. Not chain-of-thought. Just concise status tracking. For example: assumptions, open questions, constraints followed, sources used.
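Interim state is easiest to enforce when it's machine-checkable. Here is a minimal sketch that validates a model-emitted checkpoint object; the field names are illustrative, not a standard schema:

```python
import json

# Fields every checkpoint must carry. Names are illustrative.
REQUIRED_FIELDS = {"assumptions", "open_questions", "constraints_followed", "sources_used"}

def parse_checkpoint(raw: str) -> dict:
    """Parse a model-emitted status object and fail loudly if fields drift away."""
    status = json.loads(raw)
    missing = REQUIRED_FIELDS - status.keys()
    if missing:
        raise ValueError(f"checkpoint missing fields: {sorted(missing)}")
    return status

example = ('{"assumptions": ["v1 only"], "open_questions": [], '
           '"constraints_followed": ["no downtime"], "sources_used": ["S1"]}')
```

Rejecting a malformed checkpoint early is the cheap version of catching drift before it contaminates the next hundred thousand tokens.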
That's also why I like keeping a reusable long-context scaffold handy. Or just using Rephrase to rewrite a rough brief into a cleaner prompt before I send it.
The big idea is simple: 1M-token windows don't remove the need for prompt engineering. They raise the bar for it.
If your prompt feels like a storage unit, shrink it into a system: rules first, map second, payload third. That's how you give the model more to work with without giving it more ways to get lost.
Documentation & Research
Community Examples

5. Prompt Drift is not a bug, it's the physics of Attention Attrition. Here is how to fix it. - r/PromptEngineering
Start by separating instructions from reference material, then prioritize only the context the model actually needs. Large windows increase capacity, but prompt structure still matters more than raw volume.
"Lost in the middle" describes a long-context failure mode where models pay more attention to information near the beginning and end than to content buried in the center. That means prompt placement matters as much as prompt length.