Learn how reference image compositing in Photoshop preserves identity, reduces masking work, and speeds edits. See the workflow inside.
Manual masking used to be the tax you paid for believable composites. In 2026, the smarter workflow is different: keep the subject fixed, let AI handle the local edit, and only step in when the image actually needs craft.
Reference image compositing in Photoshop is an AI-assisted editing workflow where the model uses an existing image as the source of truth and modifies only the intended content. In practice, that means you preserve face, pose, texture, and scene structure while editing backgrounds, objects, lighting, or styling in a controlled way [1][3].
Here's the big shift I've noticed. The old approach started with extraction: pen tool, Select Subject, hair cleanup, spill cleanup, edge cleanup, then finally compositing. The 2026 approach starts with preservation. You don't ask Photoshop to remake the whole scene. You ask it to keep the person, then alter a bounded part of the image.
That sounds subtle, but it changes everything.
A recent CVPR 2026 paper on reference-based inpainting makes the same point from the research side: global editing tends to lose fine-grained detail, while reference-guided inpainting is better at preserving structure, texture, patterns, and local consistency [1]. That is exactly why identity-safe composites now feel more reliable than the "generate me a new image" era.
This shift killed a lot of manual masking because modern compositing workflows rely less on coarse masks and more on localized, detail-preserving edits. Research on image tampering and pixel-level evaluation shows that broad masks are often a bad proxy for what actually changed in an image, especially around boundaries, relighting, and subtle texture shifts [2].
That research matters more than it seems. The PIXAR benchmark paper argues that classic mask-based thinking misses the real edit footprint, because meaningful changes often extend beyond the obvious object silhouette [2]. In plain English: a believable composite is not just "cut subject, paste subject." It includes micro changes in lighting, seams, shadows, edges, and neighboring pixels.
That's why AI-assisted local editing feels stronger than brute-force masking. It's not simply replacing a person against a new background. It's reconciling the boundary conditions too.
Manual masking still matters when you need pixel-perfect production. But for headshots, product swaps, background cleanup, social assets, and ad variations, the old "draw every edge yourself" workflow is no longer the default. It's the fallback.
You preserve identity by being painfully explicit about what must not change. The best prompts define the locked attributes first, then the requested edit, then the realism constraints. Community examples around Photoshop-connected workflows consistently show that "keep my face, pose, and proportions unchanged" works far better than generic transformation prompts [3].
This is where most people mess up. They describe what they want added, but not what must stay fixed.
A weak prompt sounds like this:

> Put me in a nicer background and make it look professional.

A much stronger prompt sounds like this:

> Replace the background with a clean modern office. Preserve my exact face, pose, body proportions, hairstyle, skin tone, and expression. Do not change my identity or clothing fit. Match lighting direction and depth of field so the result looks like an original photo.
The difference is not style. It's constraint.
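If it helps to see the structure, here is a minimal sketch of that constraint-first pattern as a prompt builder. This is purely illustrative: `build_edit_prompt` is a hypothetical helper, not a Photoshop or Rephrase API, and the attribute lists are examples you would adapt per shot.

```python
# Hypothetical sketch: assemble a constraint-first edit prompt.
# Locked attributes and realism constraints are stated explicitly,
# mirroring the "edit + preserve + match" structure above.

def build_edit_prompt(locked, edit, realism):
    """Return a prompt that requests one edit and locks everything else."""
    preserve = "Preserve my exact " + ", ".join(locked) + "."
    match = "Match " + ", ".join(realism) + " so the result looks like an original photo."
    return " ".join([edit, preserve, match])

prompt = build_edit_prompt(
    locked=["face", "pose", "body proportions", "hairstyle", "skin tone"],
    edit="Replace the background with a clean modern office.",
    realism=["lighting direction", "depth of field"],
)
print(prompt)
```

The design choice worth copying is the ordering: the edit is one bounded request, and the preservation clause enumerates locked attributes by name rather than relying on the model to infer them.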
That same idea shows up in the HiFi-Inpaint paper. Their system improves results by reinforcing fine-grained detail and local fidelity, especially in masked regions where the model could otherwise hallucinate texture or branding elements [1]. Even though that paper focuses on human-product images, the lesson transfers neatly to Photoshop compositing: if you care about identity, detail, and realism, you need bounded edits and explicit preservation instructions.
If you write prompts often, this is exactly the kind of step I'd automate with Rephrase, because it turns "change my background" into a constraint-rich image prompt without making you think like a prompt engineer every time.
The identity-preserving workflow is a simple sequence: lock the subject, change one major element, refine local realism, then only open manual tools if the result still breaks. The fastest editors now use AI for the structural first pass and Photoshop craft for the final 10 to 20 percent [1][3].
Here's the workflow I'd use.
1. Start with the original image and decide what is sacred. Usually that means face, pose, body proportions, wardrobe silhouette, and camera perspective.
2. Make one structural change first. Replace the background, remove the extra object, or insert the product. Don't stack five requests in the first pass.
3. Run a refinement pass for realism. Ask for shadow consistency, edge cleanup, depth-of-field matching, color balance, and texture continuity.
4. Inspect the trouble spots. Hair, hands, logos, glasses edges, and thin objects still expose weak composites fastest.
5. Only then move to manual Photoshop cleanup if needed. Clone, dodge and burn, local mask cleanup, or layer-based compositing still has a place.
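The steps above can be sketched as a pass sequence. Everything here is a stand-in: `run_edit` represents whatever AI editing step you use (it is not a real API), and the prompts are example text, not canonical wording.

```python
# Hypothetical sketch of the identity-first workflow: one bounded AI pass
# per structural change, then a realism pass, then an inspection checklist.

PASSES = [
    ("structural", "Replace the background with a clean modern office. "
                   "Preserve my exact face, pose, proportions, and outfit."),
    ("refinement", "Match shadow direction, depth of field, color balance, "
                   "and edge texture to the new background. Change nothing else."),
]

# Regions that expose weak composites fastest; check these before
# deciding whether manual Photoshop cleanup is needed.
TROUBLE_SPOTS = ["hair edges", "hands", "logos", "glasses edges", "thin objects"]

def run_workflow(image, run_edit):
    """Apply each bounded pass in order; return the result plus what to inspect."""
    for _name, prompt in PASSES:
        image = run_edit(image, prompt)
    return image, TROUBLE_SPOTS
```

The point of the structure is that manual work appears nowhere in the loop; it is the fallback you reach for only after the inspection step flags a failure.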
Here's a before-and-after prompt transformation:
| Before | After |
|---|---|
| "Change the background and make me look better." | "Replace the background with a bright studio office. Preserve my exact face, pose, skin texture, body proportions, and outfit. Improve lighting subtly without beautifying or changing identity. Match shadows and depth of field to the new setting." |
| "Add this product into the scene." | "Insert the referenced bottle into the subject's right hand. Preserve label text, cap color, reflections, and proportions. Match camera angle, hand grip, contact shadows, and scene lighting so it looks naturally photographed." |
That second example lines up closely with what reference-based inpainting research is solving: not just insertion, but detail-preserving insertion [1].
For more prompt breakdowns like this, the Rephrase blog is worth bookmarking.
You should still use manual masks when precision matters more than speed. If the job involves typography, layered brand assets, complex transparency, hair against difficult backgrounds, or forensic-level control, manual selection and layered compositing still win.
That's the catch. AI didn't make Photoshop craftsmanship irrelevant. It changed where craftsmanship starts.
If I'm building campaign assets with exact art direction, I still want layers. If I'm doing a product hero where label integrity is everything, I still inspect edges and reflections manually. And if trust matters, the PIXAR paper is a useful reminder that edits can affect more pixels than we think, which matters for both realism and authenticity checks [2].
But for day-to-day production, the new default is obvious: ask for a constrained local edit first, then clean up what's left. Not the other way around.
The real breakthrough in Photoshop compositing wasn't "better cutouts." It was moving from mask-first thinking to identity-first thinking. Once you preserve the subject and localize the change, manual masking stops being step one and becomes step five.
That's why this workflow won. It's faster, usually more believable, and much easier to repeat across batches. And if prompt structure is the bottleneck, tools like Rephrase can remove that friction in a couple of seconds.
Documentation & Research

1. HiFi-Inpaint: reference-based inpainting with fine-grained detail preservation (CVPR 2026)
2. PIXAR: pixel-level benchmark for evaluating image tampering and edit footprints

Community Examples

3. "I tested the brand new version of Photoshop in ChatGPT and it is way more useful than people realize" - r/ChatGPTPromptGenius (link)
**What is reference image compositing?** It's a workflow where you use an existing image as a visual anchor while AI edits only the selected or implied region. The goal is to preserve identity, texture, and structure instead of regenerating the whole image.

**How do you preserve identity during AI edits?** You need to tell the model exactly what must stay fixed: face, pose, proportions, expression, wardrobe, and lighting intent. Structured prompts and reference-guided edits work better than broad style prompts.