Midjourney prompts don't usually fail because you "didn't use enough adjectives." They fail because you didn't anchor the parts that need to stay stable while you iterate.
That's the whole game in v7: you're constantly doing little experiments. You change lighting, camera, composition, wardrobe, background, color grading, mood. But you want the subject to remain the same. Or you want the look to remain the same. Or both.
This is where --cref and --sref are supposed to shine: one anchors identity, one anchors style. The catch is that most people treat them like magic spells and then wonder why the results drift anyway.
So instead of giving you a "prompt cheat sheet," I'm going to give you a reusable prompt syntax that behaves more like a product spec: it's modular, you can swap parts without breaking everything, and you can reason about why outputs change.
Along the way, I'll lean on research that isn't about Midjourney specifically, but about why reference-guided systems are more stable than purely descriptive instructions. That stability is exactly what we're chasing with cref/sref. In other words, Midjourney v7 is pushing you toward the same principle that keeps showing up in LLM alignment and evaluation research: references reduce ambiguity and improve consistency when ground truth is fuzzy. That's a core finding in reference-guided evaluation work [1].
The mental model: "describe less, reference more"
When you prompt Midjourney like it's a thesaurus contest, you're forcing the model to infer a ton of latent structure. What face? What outfit silhouette? What lens language? What era? What rendering pipeline vibe?
References are different. A reference collapses ambiguity. Instead of hoping the model interprets "cyberpunk cinematic portrait" the same way each time, you pin it to something concrete and let your text prompt focus on intent.
That principle shows up in research on reference-guided judging: adding high-quality references improves reliability in domains where "correctness" isn't verifiable in a strict way (taste, writing quality, alignment) [1]. Images are the same kind of domain. There isn't a single correct output. There's just "more like this."
So if you want prompts that "stick," stop trying to encode identity and style purely in text. Use references for the sticky bits. Use text for the variables.
A prompt syntax that survives iteration
Here's the syntax I keep coming back to. It's not a Midjourney "rule"; it's a structure that makes iteration predictable.
You'll write the prompt in five conceptual blocks:
- identity anchor (cref)
- style anchor (sref)
- subject + shot (what's happening, what camera moment)
- constraints (what must be true; what must not happen)
- parameters (aspect ratio, quality, etc.)
In practice, that becomes a single line, but in your head it stays modular.
Here's a template you can paste into Midjourney and fill in:
[SUBJECT + SHOT]: <who/what> doing <action> in <setting>, <camera framing>, <lens>, <lighting>, <mood>, <key materials/colors>
[CONSTRAINTS]: keep <fixed traits>; avoid <failure modes>; background <simple/complex>; palette <x>; era <x>
--cref <character_reference_image_url> --sref <style_reference_image_url> --ar <w:h> --stylize <value>
The reason this works is simple: you're separating "anchors" from "dials." If you change the shot, you're not also accidentally changing the character. If you change the background, you're not also accidentally changing the style.
This is also very close to how modern agent prompts are written in research systems: they isolate stable structure and then iterate on the variable part. A lot of agent papers are basically about this kind of decomposition: make the state explicit, keep constraints explicit, and avoid drift across iterations [2].
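If you like thinking in code, the five-block syntax can be sketched as a tiny prompt assembler. This is purely illustrative (Midjourney has no API here; the function names, URLs, and defaults are my own assumptions), but it makes the "anchors vs. dials" separation concrete: the output is just a string you paste into Midjourney.

```python
def build_prompt(subject_shot, constraints, cref=None, sref=None,
                 ar="4:5", stylize=150):
    """Assemble a Midjourney prompt from modular blocks.

    Anchors (cref/sref) and parameters stay stable across runs;
    subject_shot and constraints are the dials you iterate on.
    """
    text_blocks = [subject_shot, constraints]
    params = []
    if cref:
        params.append(f"--cref {cref}")   # identity anchor
    if sref:
        params.append(f"--sref {sref}")   # style anchor
    params.append(f"--ar {ar}")
    params.append(f"--stylize {stylize}")
    return ", ".join(b for b in text_blocks if b) + " " + " ".join(params)

# Example run: only the text blocks change between iterations.
prompt = build_prompt(
    subject_shot="portrait of a courier in a rain-soaked alley, "
                 "35mm lens, low-key lighting",
    constraints="keep same face and haircut; avoid extra accessories",
    cref="https://example.com/char_ref.png",
    sref="https://example.com/style_ref.png",
)
```

The point of the sketch: if the anchors live in function defaults (or a config), you physically can't drift them by accident while editing the shot description.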
How I think about --cref in v7 (and why it drifts)
--cref is your identity lock. Not a perfect lock. More like a "pull toward this identity."
Drift still happens because your text prompt can fight the reference. If your cref is a person in soft window light and you prompt "harsh top-lit noir, extreme shadows, sweaty face, bruised cheek," you are pushing the face into a different identity manifold. Midjourney will comply with the intent even if it means bending identity.
So the trick is not "use cref." The trick is "don't ask for things that require identity changes."
Here's what tends to break identity consistency even with cref:
- You over-specify facial descriptors ("square jaw, freckles, hooked nose") that conflict with the reference. Midjourney will average the two.
- You change age or era too hard ("as a 70-year-old") and the model "resets" the face.
- You change camera distance dramatically (close-up vs full-body) without compensating with constraints (hair, clothing, silhouette).
- You change rendering modes (hyperreal vs illustration) without a stable sref.
So with cref, I keep the text prompt's "human descriptors" light. I describe the shot, not the identity. Identity lives in cref plus a tiny set of invariants (hair style, signature accessory, outfit silhouette).
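One way to enforce that "tiny set of invariants" is to keep them in a single place and render them into the constraints block every time, so no run silently drops one. A minimal sketch (the invariants and helper are hypothetical examples, not anything Midjourney-specific):

```python
# Identity invariants live in ONE list; every prompt repeats the same set.
INVARIANTS = [
    "same face as reference",
    "short dark hair",
    "silver hoop earring",           # signature accessory
    "fitted field jacket silhouette",
]

def constraints_block(invariants, avoid=("extra accessories",)):
    """Render invariants + failure modes into a constraints string."""
    keep = "keep " + ", ".join(invariants)
    avoid_clauses = "; ".join(f"avoid {a}" for a in avoid)
    return f"{keep}; {avoid_clauses}"

block = constraints_block(INVARIANTS)
```

Now changing the shot never means retyping (and accidentally mutating) the identity constraints.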
How I think about --sref in v7 (style is a system, not adjectives)
--sref is a style prior. It's not just "colors." It often encodes a whole pipeline: contrast curve, grain/noise, line weight, texture language, even composition habits.
That matters because lots of "style prompts" are really multiple styles glued together. You'll see prompts like "Studio Ghibli, cyberpunk, vaporwave, Leica portrait, Unreal Engine." That's not a style. That's a fight.
Instead, treat sref as the single source of truth for the look, and use text for minor nudges: "more negative space," "warmer shadows," "less bloom."
This is the same reference-guided idea again: if you want stable outputs in a subjective domain, give the system a reference rather than an argument [1].
Practical prompts you can actually reuse
I'm going to show three patterns: character-first, style-first, and "locked anchors, rotating shot library."
1) Character-first (product shots, brand mascots, consistent protagonists)
Full-body character turnaround sheet, neutral studio background, 3 views (front, side, back), clean softbox lighting, sharp focus, realistic fabric detail, simple color palette
Keep same face, same haircut, same outfit silhouette; avoid extra accessories; avoid dramatic shadows
--cref https://example.com/char_ref.png --sref https://example.com/style_ref.png --ar 3:2 --stylize 100
This prompt is boring on purpose. When you're trying to establish a stable character, boring is a feature. You can always get fancy after you have a reliable base.
2) Style-first (campaign visuals where the "look" matters more than identity)
Wide establishing shot of a rain-soaked street market at night, neon reflections, layered depth, shallow haze, cinematic composition, 35mm lens look, subtle motion blur feel
Maintain the exact color grading and texture language; avoid cartoon outlines; avoid overly clean CGI
--sref https://example.com/style_ref.png --ar 16:9 --stylize 250
Notice: no cref. If identity doesn't matter, don't introduce a second anchor.
3) Locked anchors, rotating shot library (the "prompt syntax that sticks")
This is where v7 becomes fun. You define a stable pair of references and then you cycle through a list of shot intents.
Portrait of the same character, [SHOT_SLOT], [LIGHT_SLOT], [ENV_SLOT], cinematic realism, crisp skin texture, natural color
Keep same identity and hairstyle; avoid changing age; avoid changing ethnicity; keep clothing consistent unless specified
--cref https://example.com/char_ref.png --sref https://example.com/style_ref.png --ar 4:5 --stylize 150
Then you run it multiple times, swapping only the slots:
SHOT_SLOT: "tight close-up, eyes to camera"
LIGHT_SLOT: "golden hour rim light"
ENV_SLOT: "subway platform, soft bokeh signage"
Next run:
SHOT_SLOT: "three-quarter portrait, looking off-frame"
LIGHT_SLOT: "soft overcast, low contrast"
ENV_SLOT: "minimalist interior, white walls"
This is the exact "sticking" behavior you want: identity and look stay anchored, while your scene direction changes.
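The rotation above is trivially scriptable. Here's a hedged sketch (template text and URLs are illustrative): the anchors live in one fixed template, and each run fills the three slots from a shot library.

```python
# Locked anchors in the template; only the slots rotate between runs.
TEMPLATE = (
    "Portrait of the same character, {shot}, {light}, {env}, "
    "cinematic realism, crisp skin texture, natural color; "
    "keep same identity and hairstyle; avoid changing age "
    "--cref https://example.com/char_ref.png "
    "--sref https://example.com/style_ref.png --ar 4:5 --stylize 150"
)

SHOT_LIBRARY = [
    {"shot": "tight close-up, eyes to camera",
     "light": "golden hour rim light",
     "env": "subway platform, soft bokeh signage"},
    {"shot": "three-quarter portrait, looking off-frame",
     "light": "soft overcast, low contrast",
     "env": "minimalist interior, white walls"},
]

prompts = [TEMPLATE.format(**slots) for slots in SHOT_LIBRARY]
for p in prompts:
    print(p)  # paste each into Midjourney as a separate run
```

Every generated prompt carries identical anchors and constraints; only the three slot values differ, which is exactly the controlled experiment you want.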
What community practice gets right (and what it misses)
In prompt-engineering communities, the best advice is usually about iteration loops, not about magical phrasing. People talk about using clarification loops to force specificity, and they're right: you get better results when you systematically close ambiguity instead of guessing [3].
The gap is that most of that advice is text-only. Midjourney v7 gives you a better lever than more text: references. If you're doing a loop, your first question should often be, "What do I want to anchor with references so I can stop re-describing it every time?"
Closing thought: treat your prompt like an API contract
If you want Midjourney v7 prompts that actually stick, think like a developer. Your prompt is an interface. Your cref/sref are dependencies. Your shot and constraints are the request payload. Your parameters are config.
Build a stable contract, then iterate on small, named parts. That's how you stop "prompt drift" from eating your time.
References
Documentation & Research
- References Improve LLM Alignment in Non-Verifiable Domains - arXiv cs.CL - https://arxiv.org/abs/2602.16802
- RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents - arXiv - http://arxiv.org/abs/2602.02486v1
Community Examples
- Clarification prompt pattern with MCQ options + copy-paste answer template - r/PromptEngineering - https://www.reddit.com/r/PromptEngineering/comments/1r6w76y/clarification_prompt_pattern_with_mcq_options/