A practical way to prompt Nano Banana for image generation and editing, without cargo-culting camera specs or fighting the model.
Nano Banana prompts have a weird reputation.
People talk about it like it's either magic ("it just understands") or impossible ("it ignores my reference image and does whatever"). What's interesting is that both are true, depending on whether your prompt gives the model something it can control.
Here's my working mental model: Nano Banana is less impressed by adjectives and more responsive to constraints and visual anchors, the stuff that narrows the space of valid images. That lines up with research showing that providing the right kind of context can act as a control signal, and that increasing "inference budget" (or just iterating properly) changes outcomes dramatically in visual tasks [1]. It also lines up with a broader prompt-engineering lesson: you get reliability when you specify goals, context, and an output contract, then test and iterate instead of praying [2].
So in this post, I'm going to show you how I prompt Nano Banana in a way that's consistent, debuggable, and easy to reuse.
Nano Banana is the nickname people use for Google's Gemini image models; the "Pro" variant is Gemini 3 Pro Image. In a recent visual reasoning paper, the authors explicitly refer to "Gemini 3 pro image (nano banana pro)" and evaluate it in an image-editing setting for a tangram-like task, where geometric consistency matters a lot [1]. That matters because it hints at what Nano Banana is good at: taking visual context and transforming it while trying to preserve constraints.
The catch is that image models don't "follow instructions" the way text models do. They follow signals. Your job is to make those signals unambiguous.
In other words: prompt it like you're writing a spec, not a poem.
I use a consistent structure. Not because the model "needs JSON" (it doesn't), but because I need prompts I can debug.
I split prompts into four parts: Task, Subject lock, Transform, and Constraints.
Nano Banana does better when the first sentence is basically a verb.
Generate, edit, restore, re-style, extend, remove, replace. Pick one.
This is the equivalent of "goal specification" in prompt design: tell the model what success is before you drown it in details [3].
A lot of "Nano Banana changed my person into a new person" complaints come from missing identity constraints.
There's a Reddit post where someone tries to restore an old photo of their mom and the model "creates a new person" [4]. That's not the model being evil. That's the model doing what generative models do when identity isn't constrained: it hallucinates a plausible face under the umbrella of "improve clarity."
So I explicitly lock what can't change: identity features, pose, composition, logos, text, whatever matters.
Instead of "make it better," I specify deltas: remove creases, correct color cast, increase sharpness, reduce noise, preserve grain, keep lighting direction.
This is also where you should prefer measurable-ish or concrete terms over vibes. A popular community write-up on Nano Banana prompting makes the same point: quantified parameters and professional terminology tend to outperform vague adjectives [5]. I'd treat the examples there as inspiration, not gospel, but the principle is solid.
Negative constraints are not optional. They're often the difference between "nice" and "why did you add text and extra fingers."
Also, constraints make prompts more reproducible under paraphrase and minor perturbations, one of the core ideas in robust prompt evaluation [3].
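If you want the four-part structure to live in code rather than in a notes file, here is a minimal sketch of one way to do it. The `ImagePrompt` class and its `render` method are hypothetical names made up for illustration, not part of any Nano Banana or Gemini API; the output is just plain text you would paste or send as the prompt.

```python
from dataclasses import dataclass, field


@dataclass
class ImagePrompt:
    """Hypothetical container for the Task / Subject lock / Transform / Constraints split."""

    task: str  # one sentence that starts with a verb
    subject_lock: list[str] = field(default_factory=list)
    transform: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the sections as plain text, skipping any empty section."""
        lines = [f"Task: {self.task}"]
        sections = [
            ("Subject lock", self.subject_lock),
            ("Transform", self.transform),
            ("Constraints", self.constraints),
        ]
        for title, items in sections:
            if not items:
                continue
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


prompt = ImagePrompt(
    task="Restore and enhance this photo.",
    subject_lock=["Preserve the same person/identity."],
    transform=["Remove creases, scratches, dust, and stains."],
    constraints=["No text, no watermark."],
)
print(prompt.render())
```

Keeping each section as its own list is the point: when an output goes wrong, you can edit one list and leave the rest untouched.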
These are written for chat-style usage where you upload an image and prompt Nano Banana to edit it.
Task: Restore and enhance this photo.
Subject lock:
- Preserve the same person/identity. Do not change facial structure, age, ethnicity, hairstyle, or expression.
- Preserve the original pose, framing, and camera angle.
Transform:
- Remove creases, scratches, dust, and stains.
- Improve clarity and focus gently (no over-sharpening).
- Reduce noise while keeping natural film grain.
- Correct faded colors to look natural for the era.
Constraints:
- Do not invent new background elements.
- Do not add or remove jewelry, moles, scars, or distinctive marks.
- No beauty retouching, no "AI face".
- No text, no watermark.
If you're restoring a family photo, that "Subject lock" section is the whole game.
Task: Re-style this image into a new art direction.
Subject lock:
- Keep the exact composition and layout.
- Keep the same subject shapes and proportions.
New style:
- Art direction: [e.g., 90s editorial flash photography / Swiss poster design / watercolor children's book]
- Color palette: [3-6 colors]
- Lighting: [soft daylight / hard flash / tungsten]
- Texture: [paper grain / film grain / clean vector]
Constraints:
- No extra objects.
- No text or logos.
- No distorted faces or warped hands.
Notice what's missing: a 20-line list of camera specs. You can add those, but they're seasoning.
Task: Create a clean product photo using this item as reference.
Subject lock:
- Preserve the product design exactly (shape, buttons, logo placement, proportions).
Scene:
- Background: seamless [white/black/gradient]
- Surface: [acrylic / matte stone / wood]
- Lighting: soft studio, controlled reflections
Constraints:
- Do not redesign the product.
- No warped geometry.
- No extra text, no watermark, no fake brand marks.
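The templates above use square brackets for slots you are meant to fill in. If you reuse them from a script, a small check for leftover brackets can catch a half-filled prompt before it is sent. This is a sketch under that assumption; `unfilled_placeholders` is a hypothetical helper, and it treats any bracketed text as an unfilled slot.

```python
import re

# Matches bracketed template slots such as "[3-6 colors]" or "[white/black/gradient]".
PLACEHOLDER = re.compile(r"\[[^\[\]]+\]")


def unfilled_placeholders(prompt: str) -> list[str]:
    """Return every bracketed slot still present in the prompt text."""
    return PLACEHOLDER.findall(prompt)


draft = """Task: Create a clean product photo using this item as reference.
Scene:
- Background: seamless [white/black/gradient]
- Surface: matte stone
- Lighting: soft studio, controlled reflections"""

leftover = unfilled_placeholders(draft)
if leftover:
    print("Fill these before sending:", leftover)
```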
One big reason prompting feels random is that people do one shot, then rewrite the whole prompt from scratch.
I prefer tight iteration loops: keep the prompt mostly stable, change one variable, and use the model output as evidence.
This is basically "test prompts over representative inputs, measure tail risk, and make prompts degrade gracefully" translated into normal-person workflow [3].
When Nano Banana misses, I ask: did it fail at subject lock, transform, or constraints?
Then I patch that section only.
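The "patch one section only" habit can be sketched in a few lines. The helper names here (`patch_section`, `changed_sections`) are hypothetical, and the prompt is represented as a plain dict of section name to bullet list; the idea is only to make it hard to change two things at once by accident.

```python
def patch_section(
    sections: dict[str, list[str]], name: str, new_items: list[str]
) -> dict[str, list[str]]:
    """Return a copy of the prompt with exactly one section replaced."""
    if name not in sections:
        raise KeyError(f"unknown section: {name}")
    patched = {key: list(items) for key, items in sections.items()}
    patched[name] = list(new_items)
    return patched


def changed_sections(
    before: dict[str, list[str]], after: dict[str, list[str]]
) -> list[str]:
    """List the sections whose bullets differ between two prompt versions."""
    return [name for name in before if before[name] != after.get(name)]


v1 = {
    "Subject lock": ["Preserve the same person."],
    "Transform": ["Remove creases and scratches."],
    "Constraints": ["No text, no watermark."],
}

# The output changed the face, so only the subject lock gets tightened.
v2 = patch_section(
    v1,
    "Subject lock",
    [
        "Preserve the same person.",
        "Do not change facial structure, age, or expression.",
    ],
)

print(changed_sections(v1, v2))
```

If the list of changed sections ever has more than one entry, the next output no longer tells you which change mattered.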
Yes, adding photography terms can help. The community post that analyzed viral Nano Banana prompts claims specifics like lens length and film stock outperform vague adjectives [5]. I've seen that pattern too.
But here's the thing: if you're using Nano Banana for editing or identity-sensitive work, camera jargon won't save you. Constraints will.
So I treat camera jargon as optional flavor that comes after the hard requirements are nailed down.
Nano Banana is strongest when you give it a stable target and tight bounds. Visual context is a control signal [1]. Output reliability comes from clear goals and contracts, plus iteration [3].
If you try one thing this week, try this: take your current Nano Banana prompt and add a Subject lock section with three "must not change" bullets. For most image editing workflows, that single change will feel like you upgraded models.
Documentation & Research
Community Examples