How to Write Prompts for Veo 3: A Developer's Playbook for Getting the Shot You Actually Want
Veo 3 prompting isn't poetry; it's spec-writing. Here's how I structure prompts to control subject, camera, motion, and style reliably.
If you've tried Veo-style video generation with "a cinematic shot of X" and got something that's technically pretty but creatively wrong, you've already learned the key lesson: video prompts are less like "describe a vibe" and more like "write a tight production brief".
The catch is that video adds two failure modes that image prompting can often dodge. First, time: the model has to stay consistent across frames. Second, camera: even if the subject looks right, the shot can feel off because the implied lens, movement, and blocking aren't what you pictured.
So when I say "how to write prompts for Veo 3," what I really mean is: how to write prompts that reduce ambiguity, lock the camera, and leave the model fewer excuses to improvise.
I'm going to ground this in two Tier 1 research directions. One is prompt evaluation: treat prompts as things you iterate and score, not one-off magic spells [1]. The other is controllability: modern video generation work keeps showing that "camera intent" is a real control axis that benefits from explicit specification and feedback loops [2]. Veo 3's UI/API details vary depending on where you're running it, but these principles travel well.
The mental model: your prompt is a spec, not a sentence
Here's what I noticed after watching teams prompt video models in the real world: the best prompts read like a short film shot list collapsed into one paragraph.
A "spec prompt" tends to include four layers, in this order:
You start with what happens (subject + action), then define where and when (scene + time + atmosphere), then declare how it's filmed (camera + lens + movement + pacing), then lock the look (lighting + style + medium + post).
That order matters because it reduces the model's temptation to invent story logic. In prompt evaluation research, a recurring theme is that prompts improve when they consistently provide the right context and constraints for the target output [1]. Video just raises the stakes.
If you only remember one thing: don't ask for "cinematic." Specify what "cinematic" means in camera language.
The Veo 3 prompt skeleton I actually use
I like writing prompts in one dense paragraph, but mentally I'm filling a template. This is the template:
SUBJECT + ACTION:
[Who/what], [what they do], [emotion/intent], [timing]
SCENE:
Location, era, weather, props, background activity, key colors
CAMERA:
Shot type, lens feel, camera height/angle, camera move, focus behavior, framing rules
MOTION:
Subject motion + environment motion + "what stays stable"
LIGHT + STYLE:
Lighting direction/quality, contrast, texture, film stock/grade, realism vs stylized
CONSTRAINTS (negatives):
No text, no logos, no extra limbs/fingers/faces, no sudden cuts, no warping, no camera shake
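The skeleton above is easy to encode as a tiny builder so every prompt you ship has all six layers in the right order. This is a minimal sketch; the class, field names, and example values are my own illustration, not part of any Veo SDK.

```python
from dataclasses import dataclass, field

@dataclass
class VeoPromptSpec:
    """One field per layer of the skeleton; all names are illustrative."""
    subject_action: str
    scene: str
    camera: str
    motion: str
    light_style: str
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Collapse the layers into one dense paragraph, constraints last.
        layers = [self.subject_action, self.scene, self.camera,
                  self.motion, self.light_style]
        paragraph = " ".join(p.strip().rstrip(".") + "." for p in layers if p)
        if self.constraints:
            paragraph += " Constraints: " + ", ".join(self.constraints) + "."
        return paragraph

spec = VeoPromptSpec(
    subject_action="A barista pours latte art, calm and focused",
    scene="Small sunlit cafe, morning, warm wood tones",
    camera="Close-up, 85mm look, locked-off tripod, shallow depth of field",
    motion="Only the pour moves; framing and lighting stay stable",
    light_style="Soft window light from camera-left, realistic, light grain",
    constraints=["single continuous shot", "no readable text", "no cuts"],
)
print(spec.render())
```

The payoff is that a missing layer becomes a visible empty field instead of a silent omission in a paragraph you eyeball.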
Why call out "what stays stable"? Because the model will happily trade off identity consistency, geometry, and lighting continuity if you don't tell it what cannot change. Research on controllable video diffusion repeatedly points at camera alignment and stability as a key challenge; "camera intent" that isn't well-posed tends to yield drift, blur, or weird geometry artifacts [2]. Your prompt can't fix the model, but it can make your intent easier to satisfy.
Camera language: the highest leverage words in your prompt
Most bad video prompts fail because the camera is underspecified. You'll get the right subject, but the shot feels like stock footage.
When I want control, I specify four camera things explicitly:
First is shot size. "Extreme wide", "wide", "medium", "close-up", "macro". If you skip this, the model chooses for you.
Second is lens feel. You don't need to be a DP. Just pick a vibe: "24mm wide-angle", "50mm natural", "85mm portrait compression". The goal is to control distortion and depth cues.
Third is movement. "Locked-off tripod", "slow dolly-in", "crane down", "handheld (subtle)", "gimbal follow from behind". Pick one. Don't list five movements and hope.
Fourth is focus behavior. If you want that premium look, say "shallow depth of field, subject stays in focus, background bokeh" or "deep focus, everything sharp".
This is also where iterative testing pays off. One reason I like the evaluation framing from prompt research is it legitimizes the boring part: you run prompt variants like experiments, compare outputs, and keep what wins [1]. For video, I'll often A/B only the camera line first, because it changes the perceived quality more than adding another adjective to the scene.
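Treating the camera line as the only independent variable is easy to operationalize. A sketch of that A/B setup, with a hypothetical base template and variant list of my own choosing:

```python
# Hold everything constant except the camera line, then compare outputs.
BASE = ("A lone cyclist rides through a rainy neon-lit street at night. "
        "{camera} "
        "Constraints: single continuous shot, no readable text.")

camera_variants = [
    "Camera: locked-off tripod, 35mm look, deep focus.",
    "Camera: slow side-tracking dolly, 35mm look, subject centered.",
    "Camera: handheld (subtle), 50mm look, shallow depth of field.",
]

def variants(base: str, cameras: list[str]) -> list[str]:
    """Produce one full prompt per camera variant; nothing else changes."""
    return [base.format(camera=c) for c in cameras]

for i, p in enumerate(variants(BASE, camera_variants), 1):
    print(f"--- variant {i} ---\n{p}\n")
```

Generate one clip per variant, score them on camera correctness alone, and only then move on to tweaking scene or style.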
Motion: specify dynamics, not just appearance
Video prompts need verbs, but they also need rules.
If you say "a dog running," you'll get running. But you might also get morphing, teleporting paws, or a background that melts. So I add stabilizers:
I'll say things like "the dog's markings remain consistent across frames" or "the storefront signage keeps its shape and placement but contains no readable text" (or just "no readable text" if you're strict). I'll also explicitly request temporal continuity: "single continuous shot, no cuts."
This lines up with what controllability research keeps reinforcing: the model needs a well-defined target for alignment, and camera-motion mismatch is a known source of artifacts [2]. Again, you're not rewriting the model. You're reducing degrees of freedom.
Practical examples (prompts you can steal)
I'm going to write these as if you're prompting Veo 3 directly. Tweak to your interface.
Example 1: Product hero shot that doesn't feel like an ad generator
A close-up hero shot of a matte-black wireless earbud case slowly rotating on a dark stone surface, a few water droplets sliding across the stone. Modern studio environment, minimal props, cool gray palette, subtle fog in the background.
Camera: locked-off tripod, 85mm lens look, slow controlled turntable motion only, shallow depth of field, the logo area remains blank (no text). Soft key light from camera-left with a thin rim light from behind, high micro-contrast, realistic reflections, premium commercial look.
Constraints: single continuous shot, no camera shake, no extra objects appearing, no text, no brand marks, no flicker.
Example 2: Cinematic street scene with controlled movement
A lone cyclist rides through a rainy neon-lit street at night, passing puddles that reflect pink and cyan signage. Urban downtown alley, wet asphalt, steam from vents, distant pedestrians as silhouettes.
Camera: medium-wide shot, 35mm lens look, low camera height, slow side-tracking dolly matching the cyclist speed, smooth motion, subject stays centered, background parallax visible. Rain falls steadily, puddle ripples consistent, neon reflections stable.
Lighting: strong neon practicals, soft fill from storefronts, cinematic contrast, slight film grain, realistic.
Constraints: no cuts, no sudden zooms, no readable text, no face warping.
Example 3: The "I don't know what I want yet" prompt (useful early)
This is where community practice is actually helpful: people learn prompting faster by collecting full conversations and iterations rather than copying a single "perfect prompt" [3]. I agree with that instinct, especially for video where your first draft is rarely right.
Here's the meta-prompt I use to make the model help me write the real prompt:
Help me write a Veo 3 prompt for a 6-8 second video. Ask me up to 8 clarifying questions, but only about: subject, action, location, time of day, camera shot size, camera movement, visual style, and any "must not change" constraints.
After I answer, output one final prompt in a single paragraph plus a short "constraints" line.
Closing thought: treat prompting like pre-production
The most reliable Veo 3 prompts feel like something you'd hand to a small crew: clear action, clear shot, clear look, clear constraints.
If you want to get good fast, steal a page from prompt evaluation research and run your prompting like an experiment. Change one variable per iteration, keep a small "prompt changelog," and decide what "good" means before you generate (camera correctness, identity stability, motion continuity, whatever) [1]. That's how you stop "prompting" and start directing.
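If "prompt changelog" sounds heavyweight, it doesn't have to be: one row per generation, one changed variable, a few pass/fail axes decided before you hit generate. A minimal sketch; the column names and scoring axes are my own suggestion, not a standard.

```python
import csv
import io
from datetime import date

# One row per generation; score axes are decided BEFORE generating.
FIELDS = ["date", "prompt_id", "changed_variable", "camera_ok",
          "identity_stable", "motion_continuous", "keep"]

rows = [
    {"date": str(date(2025, 1, 10)), "prompt_id": "cyclist-v1",
     "changed_variable": "baseline", "camera_ok": 0,
     "identity_stable": 1, "motion_continuous": 1, "keep": 0},
    {"date": str(date(2025, 1, 10)), "prompt_id": "cyclist-v2",
     "changed_variable": "camera: added side-tracking dolly", "camera_ok": 1,
     "identity_stable": 1, "motion_continuous": 1, "keep": 1},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

A week of rows like this tells you which variable (camera, motion, style) actually moves quality for your use case, which is exactly the evaluation loop the research argues for [1].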
References
Documentation & Research
1. LLM Prompt Evaluation for Educational Applications - arXiv - http://arxiv.org/abs/2601.16134v1
2. CamPilot: Improving Camera Control in Video Diffusion Model with Efficient Camera Reward Feedback - arXiv - http://arxiv.org/abs/2601.16214v1
Community Examples
3. How do you study good AI conversations? - r/PromptEngineering - https://www.reddit.com/r/PromptEngineering/comments/1qp7get/how_do_you_study_good_ai_conversations/