Learn how SynthID and C2PA made image provenance default, where they work, and what gaps remain in AI watermarking. See examples inside.
If you generate images often, you've probably noticed a quiet shift: provenance is no longer a niche compliance feature. It's becoming table stakes.
SynthID and C2PA became default because the industry realized provenance is more practical than trying to classify every image after the fact. Detection models drift, false positives are costly, and platforms need signals they can verify at creation time, not guesses made later from pixels alone [1][2].
Here's the big picture. C2PA gives you signed metadata: who created the asset, what tool touched it, and what edits happened along the way. SynthID goes deeper and embeds a watermark into the generated media itself, aiming to survive common transformations [1]. One is a provenance envelope. The other is a content-level signal.
That combination matters. Metadata is clean and inspectable. Watermarking is harder to strip accidentally. Put them together and you get a more believable chain of custody than either system offers alone.
What I find interesting is that this happened because the old dream of universal AI detection kept running into reality. A 2026 provenance paper makes the point clearly: watermarking, provenance frameworks, and registry-based verification are complementary because no single method is complete on its own [1]. That feels right. The market moved from "find all AI images" to "verify the ones that opted in."
In practice, C2PA tells you the declared history of an asset, while SynthID tries to prove something about the pixels themselves. That means C2PA is easier to audit, but easier to lose; SynthID is more persistent, but harder to inspect and not invincible [1][3].
A quick comparison helps:
| Approach | What it adds | Strength | Weakness |
|---|---|---|---|
| C2PA-style provenance | Signed metadata and edit history | Human-readable, machine-verifiable chain of origin | Can be stripped or lost during platform hops |
| SynthID-style watermarking | Imperceptible signal in content | Can survive compression, resizing, and some edits | Detection can weaken under stronger transformations or attacks |
| Registry-based verification | External fingerprint lookup | Auditable and platform-level verification | Only works for registered content and cooperative ecosystems |
This is why I don't buy the "metadata vs watermarking" framing. It's not either-or. It's layered defense.
If you work with image models, this maps nicely to prompting workflows too. You can ask for a photorealistic asset, but the operational question starts after generation: how will that image travel, be reused, and be verified later? That's exactly the kind of production detail teams forget until legal or trust issues show up. We cover adjacent workflow ideas in the Rephrase blog, because the prompt is only the first half of the job.
Provenance beat pure detection because verification is more stable than classification. A signed claim or embedded mark is brittle in some ways, but it does not depend on constantly retraining a model to guess whether an image "looks AI-generated" [1][2].
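To see why verification is structurally more stable than classification, here is a minimal sketch of checking a signed provenance claim with Python's stdlib `hmac`. This is a simplification: C2PA actually uses certificate-based signatures rather than a shared secret, and `sign_claim`/`verify_claim` are illustrative names, not real C2PA tooling:

```python
import hashlib
import hmac
import json

SECRET = b"demo-signing-key"  # stand-in for a real signing certificate

def sign_claim(claim: dict) -> str:
    """Sign a provenance claim over a canonical JSON encoding."""
    payload = json.dumps(claim, sort_keys=True).encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def verify_claim(claim: dict, signature: str) -> bool:
    # Deterministic: the signature either matches or it doesn't.
    # No model retraining, no drifting decision threshold.
    return hmac.compare_digest(sign_claim(claim), signature)
```

The point of the sketch is the contrast: a classifier's answer changes as generators evolve, while a signature check gives the same answer forever, for exactly the content that was signed.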
The research backs this up. The registry-based provenance paper argues that detector-only approaches struggle with generalization, dataset bias, and adversarial adaptation at scale [1]. That is a polite academic way of saying: detectors age badly.
At the same time, watermarking research keeps showing the same tradeoff. Watermarks are useful, but they are attack targets. A 2026 paper on robust content watermarking for text-to-image systems explicitly starts from that premise: current watermarking techniques are vulnerable to removal and forgery attacks, so robustness has to be designed in from the start [2].
That's the deeper reason provenance became default. It's not that SynthID or C2PA are perfect. It's that the alternatives are worse when you need something deployable now.
What's still missing is persistence across the messy real internet: screenshots, exports, edits, reposts, mixed human-AI workflows, and non-cooperative tools. Today's systems work best when the generator, platform, and verifier all agree to participate [1][3].
This is the catch most marketing pages skate past. The papers do not.
The provenance registry paper says SynthID and C2PA both depend on generator-side adoption and become ineffective when watermarks or metadata are absent or intentionally removed [1]. The EU AI Act analysis makes a similar point from a policy angle: transparency breaks down when content moves across platforms, when human and AI outputs are interleaved, and when there is no shared interoperable format [3].
That last point is huge. Interoperability sounds boring until you realize it decides whether provenance survives normal usage. An image that loses credentials when uploaded to a social tool or captured as a screenshot is not rare edge-case behavior. That is the default internet behavior.
A before-and-after framing makes the gap obvious:
| Scenario | What works now | What still breaks |
|---|---|---|
| Image stays in original file chain | C2PA metadata + watermarking can help | Verification still depends on tool support |
| Image gets lightly edited or compressed | Watermarking may survive | Confidence can drop |
| Image is screenshot or re-exported | Metadata often disappears | Provenance chain breaks |
| Image comes from a non-participating generator | Registry and provenance may fail | No universal proof of origin |
| Image is attacked intentionally | Some methods resist common changes | Advanced removal or forgery remains a problem [2] |
So yes, provenance became default. But "default" is not the same as "solved."
Builders should assume provenance is a layered systems problem, not a single feature they can toggle on. The safest approach today is to combine metadata, watermarking, and platform-aware verification paths instead of betting everything on one signal [1][2][3].
If I were shipping an image product in 2026, I'd do three things. First, emit signed provenance wherever possible. Second, embed a content-level mark when the generation stack supports it. Third, treat loss of provenance as normal and design fallbacks, not exceptions.
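The three steps above can be sketched as a verification pipeline that treats missing provenance as an expected outcome rather than an error. The checker functions are hypothetical stand-ins for real C2PA and watermark-detection tooling, injected as parameters so the fallback logic stays testable:

```python
from typing import Callable, Optional

def verify_asset(
    image_bytes: bytes,
    read_c2pa: Callable[[bytes], Optional[dict]],
    detect_watermark: Callable[[bytes], bool],
) -> str:
    """Layered check: signed metadata first, content watermark second,
    and an explicit 'unverified' result as the designed-for fallback."""
    manifest = read_c2pa(image_bytes)
    if manifest is not None:
        return "provenance:signed-manifest"
    if detect_watermark(image_bytes):
        return "provenance:watermark-only"
    # Loss of provenance is the normal case on the open internet,
    # so it gets a first-class result, not an exception.
    return "provenance:unverified"
```

The design choice worth copying is the return type: three named outcomes instead of a boolean, because downstream policy usually differs between "signed chain intact" and "only a watermark survived."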
This is also where tooling discipline matters. Teams move fast, prompts get copied across apps, assets move through Slack, Figma, browsers, and editors. A small process improvement can save a lot of chaos. Tools like Rephrase are useful on the prompt side because they make it easier to standardize generation requests across apps, but you still need a provenance plan after the image exists.
My take is simple: the industry won the argument that provenance matters. It has not yet won the harder argument that provenance can survive the internet unchanged.
If you want one sentence to remember, it's this: SynthID and C2PA made provenance normal, but not durable enough yet.
That's still progress. It just isn't the end of the story.
Documentation & Research

Community Examples
- My journey through Reverse Engineering SynthID - r/LocalLLaMA (link)
SynthID embeds an imperceptible watermark into generated media itself, while C2PA packages provenance as signed metadata about origin and edits. In practice, they solve different parts of the same trust problem.
SynthID-style watermarks are designed to survive common transformations like compression, resizing, and minor edits, but not unlimited manipulation. Research and practical reports both suggest stronger edits or adversarial workflows can weaken detectability.