Figma-to-code is having its "printing press" moment. Not because models suddenly understand design. The real shift is that we're finally learning to specify design intent in a way code generators can actually hold onto across iterations.
Here's the thing: most teams are still prompting like it's 2023. They paste a screenshot, say "build this in React," and then act surprised when the output is a beautiful lie. Wrong tokens. Missing states. No accessibility. And after two edits, the UI drifts into a different product entirely.
The fix isn't "better models." It's better intermediate representations and better prompts.
A recent CHI paper nailed what's going on: UI prompting works best when you treat intent as hierarchical semantics (Product, Design System, Feature, Component), because those layers are interdependent and changes cascade [1]. That's basically the blueprint for a robust Figma-to-code workflow: don't just ask for code; ask the model to first understand what the design means, then generate, then analyze what it generated, then refine without drift.
And when you want this to work in real repositories, not just a single-component demo, you need a verification loop. Agentic coding research is converging on the same idea: plan first, implement second, test continuously, and treat validation as a first-class step, not an afterthought [2]. UI generation research adds one more critical twist: feed corrections back in ways designers naturally work (comments, sketches, revisions) because that data is higher quality than simple thumbs-up/down ranking [3].
Let's turn those ideas into a prompt playbook you can use today.
The mental model: prompts as "semantic handoff," not requests
If you take only one idea from this article, make it this: your prompt is not a wish. It's a handoff artifact.
The CHI "semantic guidance" work shows why vague prompting collapses: UI intent is multi-layered, and models can't reliably infer missing constraints from linear text. Their framework explicitly separates product context (who/why), design system constraints (style, color, type, spacing), feature requirements (function/content/IA), and component-level details (type, states, interactivity, properties) [1].
So in a Figma-to-code workflow, your job is to translate the Figma file into that hierarchy. Not perfectly. Just enough that the model stops guessing.
This also explains why "pixel-perfect" prompting is a trap. Figma is often over-specified visually and under-specified behaviorally. Code needs the opposite: clear tokens, states, and interaction rules.
A solid prompt therefore does three things:
It pins the design system (tokens, typography, spacing scale), defines component contracts (props, variants, states), and sets acceptance checks (what must match, what can differ, and how to verify).
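To make those three ingredients concrete, here's a minimal TypeScript sketch. The token names, hex values, and `ButtonProps` shape are hypothetical examples, not from any real design system; the point is that tokens become typed constants, the contract becomes an interface, and an acceptance check becomes a plain function.

```typescript
// Hypothetical design tokens pinned as a typed constant. Because the codegen
// prompt must reference these names, "close enough" hex values become errors
// instead of silent drift.
const tokens = {
  color: { primary: "#2563eb", surface: "#ffffff", textSecondary: "#64748b" },
  spacing: { sm: 8, md: 16, lg: 24 },
} as const;

type ColorToken = keyof typeof tokens.color; // "primary" | "surface" | "textSecondary"

// A component contract: props name variants and states explicitly, so every
// regeneration targets the same interface.
interface ButtonProps {
  variant: "primary" | "secondary";
  state?: "default" | "loading" | "disabled";
  label: string;
  onClick?: () => void;
}

// An acceptance check is just a predicate over the contract:
// did the generated code use only the colors the spec defines?
function usesOnlyKnownColors(used: string[]): boolean {
  return used.every((c) => c in tokens.color);
}
```

The same pattern scales: each "MUST" in your generation constraints should have a corresponding check you could run mechanically.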
Prompt pattern 1: the Design Spec Extractor (turn Figma into semantics)
Start by forcing structure. Even if you're working from an exported JSON, a Dev Mode spec, or just screenshots, ask the model to produce a semantic spec before writing code. This mirrors the "structured specification" idea from the semantic guidance system: get intent into slots so you can edit it later without rewriting everything [1].
Use a prompt like this:
You are a senior UI engineer and design-systems specialist.
Goal: Convert the following Figma screen into a structured UI spec that can be used for code generation.
Input you have:
- Screen purpose: <one sentence>
- Target platform: web (React)
- Design notes (optional): <notes>
- Figma details (paste): <layers/tokens/measurements OR textual description OR screenshot summary>
Output format (strict):
1) Product
- Description
- Target user
- Goal
2) Design System
- Color tokens (name -> hex -> usage)
- Typography scale (token -> font/size/weight/line-height)
- Spacing scale (token -> px)
- Radius, shadow, border tokens
- Icon set assumptions
3) Feature (this screen)
- Function
- Content model (entities + fields)
- Information architecture (sections + priority)
4) Components inventory
For each component:
- Name
- Responsibility
- Props contract (TypeScript)
- Variants
- States (loading/empty/error/disabled/focus)
- Accessibility requirements (aria, keyboard)
5) Open questions / ambiguities (max 10)
6) Generation constraints
- MUST use tokens above (no arbitrary values)
- MUST be responsive (define breakpoints)
- MUST support dark mode? (state assumption)
Why this works: you're building the intermediate layer the CHI paper argues for, an inspectable semantic representation that reduces both the gulf of execution ("what do I say?") and the gulf of evaluation ("did it implement my intent?") [1]. And once you have this spec, you can diff it and keep iterations scoped instead of letting changes ripple randomly.
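If you persist the extractor's output as data rather than prose, diffing becomes mechanical. Here's one hypothetical TypeScript shape for that spec (field names are illustrative, not taken from the paper), plus a scoped check you can run on every revision:

```typescript
// Illustrative shape for the structured spec the extractor prompt produces.
type UIState = "loading" | "empty" | "error" | "disabled" | "focus";

interface ComponentSpec {
  name: string;
  responsibility: string;
  variants: string[];
  states: UIState[];
}

interface UISpec {
  product: { description: string; targetUser: string; goal: string };
  designSystem: {
    colors: Record<string, { hex: string; usage: string }>;
    spacing: Record<string, number>; // token -> px
  };
  feature: { function: string; sections: string[] };
  components: ComponentSpec[];
  openQuestions: string[];
}

// Which required states did a component spec forget? Running this before
// generation catches omissions the model would otherwise improvise around.
const REQUIRED_STATES: UIState[] = ["loading", "empty", "error", "disabled", "focus"];

function missingStates(component: ComponentSpec): UIState[] {
  return REQUIRED_STATES.filter((s) => !component.states.includes(s));
}
```

A spec stored this way can be versioned alongside the code, so "what changed since last iteration" is a JSON diff, not a re-read of the whole prompt.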
Prompt pattern 2: Component Generation that doesn't rot after the second edit
Most component codegen fails because the model improvises. It picks random spacing, "close enough" colors, and then future edits break consistency.
So your component prompt should explicitly anchor to the spec and force the model to produce contracts, not just markup. I also like to demand that it generate variants and states in the first pass, because the CHI framework calls out state and interactivity as core semantics that are often missing from static taxonomies [1].
Using the UI spec below, generate ONE React component: <ComponentName>.
Requirements:
- TypeScript + React
- Styling: Tailwind (use ONLY the tokens defined in the spec)
- Exported API: component + props type + variant types
- Implement variants and states exactly as specified
- Include accessibility: semantic HTML + aria + keyboard interactions
- No mock data inside the component
Deliverables:
1) Component code
2) Short "contract notes" explaining how to use it (max 8 lines)
3) A minimal usage example
UI spec:
<paste the structured spec sections for Design System + the target component>
If you're doing this across a repo, the big win is consistency: tokens become your guardrails, and props become the handoff interface between design and engineering.
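Here's what "contract, not markup" can look like in the generated output. This is a sketch under two assumptions: variants and states are closed unions, and class lookups go through a token-keyed map (the `bg-primary`-style class names assume a Tailwind config that maps your spec tokens, which is a setup choice, not a default):

```typescript
// Closed unions: the model cannot invent a fourth variant without a type error.
type Variant = "primary" | "secondary";
type State = "default" | "loading" | "disabled";

// Hypothetical class map keyed by spec tokens only. No free-form color strings
// can sneak in through this path.
const variantClasses: Record<Variant, string> = {
  primary: "bg-primary text-surface",
  secondary: "bg-surface text-primary border border-primary",
};

function buttonClasses(variant: Variant, state: State): string {
  const base = "px-md py-sm rounded-md focus-visible:ring-2";
  // Disabled must block interaction, not just dim the button.
  const stateCls = state === "disabled" ? "opacity-50 pointer-events-none" : "";
  return [base, variantClasses[variant], stateCls].filter(Boolean).join(" ");
}
```

The payoff shows up on the second edit: changing a variant means editing one map entry, not hunting class strings across markup.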
Prompt pattern 3: Handoff as verification (your spec is now a test)
Agentic coding research is blunt about it: you don't get reliable outcomes without a development-oriented testing loop [2]. UI teams need the same discipline, just expressed in UI terms.
So after generation, run a "semantic audit" prompt. The semantic guidance system literally has an Analyze phase that extracts what the model implemented and compares it to intended semantics [1]. You can mimic that with a simple diff instruction.
You are auditing generated UI code against an intended UI spec.
Inputs:
- Intended UI spec (source of truth)
- Generated code (React + Tailwind)
Task:
1) Extract the implemented semantics from the code:
- tokens used (colors/type/spacing/radius)
- component structure
- states/variants
- accessibility behaviors
2) Produce a spec-to-code mismatch report with three categories:
- Violations (must fix)
- Drifts (changed but maybe acceptable)
- Omissions (missing states/behaviors/content)
3) Provide a targeted patch plan: smallest set of edits to reach compliance.
Output: markdown table + patch plan.
This is where teams usually discover the real failures: hover states not implemented, focus rings missing, hit targets too small, "disabled" buttons that only change opacity but stay clickable, empty states that were never designed, and so on.
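Part of the audit doesn't even need a model. One cheap mechanical check, sketched below, scans generated code for Tailwind's arbitrary-value syntax (`p-[13px]`, `text-[#123456]`), which is exactly how off-spec values bypass your tokens:

```typescript
// Flag Tailwind arbitrary values in generated code: any utility of the form
// name-[...] hardcodes a value instead of using a spec token.
function findArbitraryValues(code: string): string[] {
  const matches = code.match(/[\w-]+-\[[^\]]+\]/g);
  return matches ?? [];
}

const generated = `<button className="px-4 text-[#123456] p-[13px]">Save</button>`;
// Flags "text-[#123456]" and "p-[13px]"; token-based "px-4" passes.
const violations = findArbitraryValues(generated);
```

Run this as a lint step before the LLM audit; it turns "MUST use tokens" from a prompt instruction into an enforced rule.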
And if you want the model to actually learn your taste over time, UI generation research suggests the best corrections aren't binary rankings; they're designer-native feedback: comments, sketches, direct revisions [3]. In practice, that means capturing change requests as structured critiques ("Increase contrast of secondary text," "Typography hierarchy too flat," "CTA lacks prominence") and feeding them back as revision prompts.
Practical examples: how people make this stick in workflows
On the community side, a pattern I keep seeing is: stop prompting "build X," and start prompting "spec X, then build." One Reddit thread put it nicely: the project goes off the rails when you let the model infer requirements while it codes; forcing a short product spec first makes plans coherent and reduces random feature invention [4]. That maps cleanly onto the Product/Feature layers in the semantic framework [1].
So if you're doing Figma handoff, try this lightweight "spec-first" wrapper before every codegen run:
Before writing or modifying code:
Write a short spec for this change.
- Who is it for?
- What problem does it solve?
- What is explicitly out of scope?
Then list the exact UI semantics affected (tokens, components, states).
Only then propose the patch.
It's simple, but it prevents the most common failure mode: "I asked to change the header spacing and it redesigned the entire page."
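You can also enforce the "exact UI semantics affected" list mechanically. A small sketch (the function name and scoping-by-component-name convention are my own, illustrative choices): reject any patch that touches components outside the declared scope.

```typescript
// Given the components the spec-first wrapper declared in scope, report any
// components a proposed patch touches that were never declared.
function outOfScopeEdits(declared: string[], touched: string[]): string[] {
  return touched.filter((component) => !declared.includes(component));
}

// Declared scope: only the header. Patch also touched the hero and footer.
const violations = outOfScopeEdits(["Header"], ["Header", "Hero", "Footer"]);
```

If `violations` is non-empty, send the patch back with a "stay in scope" revision prompt instead of merging it.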
Closing thought: the best Figma-to-code prompt is a reusable artifact
The future of Figma-to-code isn't one perfect prompt. It's a small system of prompts that behave like a pipeline: extract semantics, generate components, audit against spec, iterate with scoped diffs.
Once you do that, handoff stops being a meeting. It becomes a repeatable process. And that's when you actually start shipping faster, instead of just generating prettier prototypes.
References
Documentation & Research
- Bridging Gulfs in UI Generation through Semantic Guidance - arXiv cs.AI (CHI '26) https://arxiv.org/abs/2601.19171
- FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation - arXiv cs.SE https://arxiv.org/abs/2602.03798v1
- Improving User Interface Generation Models from Designer Feedback - arXiv cs.LG (CHI '26) https://arxiv.org/abs/2509.16779
- OpenAI Codex and Figma launch seamless code-to-design experience - OpenAI Blog https://openai.com/index/figma-partnership
Community Examples
- Many LLM coding failures come from letting the model infer requirements while building - r/ChatGPTPromptGenius https://www.reddit.com/r/ChatGPTPromptGenius/comments/1qsemdy/many_llm_coding_failures_come_from_letting_the/