How to Prompt AI Video Like a Cinematographer

Learn how to use focal length, f-stop, and lighting ratios in AI video prompts for more cinematic shots and better control. See examples inside.

Ilia Ilinskii
Rephrase · April 20, 2026

Video generation · 7 min read

Most AI video prompts fail for the same reason: they describe a scene, not a shot. If you want cinematic output, you need to stop talking like a customer and start talking like a cinematographer.

Key Takeaways

Camera vocabulary gives AI video models stronger visual constraints than vague style words.
Focal length, f-stop, and lighting ratios work best when tied to framing, subject distance, and mood.
Research on video models shows structured, camera-aware prompting improves temporal grounding and reduces vague camera descriptions [1].
Precise lighting controls work better than fuzzy terms because models respond well to explicit attributes like position, intensity, and softness [3].
A simple subject → camera → lighting structure is easier to iterate than one giant prompt blob.

Why does camera vocabulary improve AI video prompts?

Camera vocabulary improves AI video prompts because it turns abstract taste into concrete visual instructions. Research on camera-aware video understanding shows that structured prompting helps models reason more clearly about framing, lighting, and motion, instead of producing generic descriptions or muddled camera behavior [1].

Here's the thing I noticed: "cinematic" is almost useless on its own. It sounds good, but it tells the model very little. "85mm lens, f/2, 4:1 key-to-fill ratio, soft side key" tells it far more.

That lines up with current research. A 2026 paper on camera motion understanding found that VideoLLMs are weak at inferring camera behavior unless you make those cues explicit, ideally in a structured prompt [1]. Another paper on cinematic video generation showed that camera trajectory and scene control matter a lot for maintaining visual consistency across shots [2]. In plain English: when you specify the shot language, the model has less room to improvise badly.

Community examples point the same way. People who build repeatable image and video prompts tend to organize them as subject, camera, lighting, style, and constraints rather than dumping everything into one long paragraph [4][5].

What does focal length mean in an AI video prompt?

Focal length in an AI video prompt mainly acts as a composition cue. It tells the model whether you want an ultra-wide, natural, or compressed look, and it works best when paired with shot type and subject distance rather than used as a lonely number [1][2].

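In real cinematography, focal length sets the field of view. The familiar "compression" of long lenses comes from shooting the same framing from farther away, which is why focal length and subject distance belong together in a prompt. In AI video, none of this is physically simulated with full accuracy, but the numbers still strongly nudge the output.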

Here's a practical way to think about it:

Focal length	Usual look	Best use in prompts
16-24mm	Wide, immersive, exaggerated depth	Establishing shots, dynamic movement, interiors
35-50mm	Natural, balanced	Dialogue, medium shots, general realism
85-135mm	Compressed, intimate, portrait-like	Close-ups, fashion, emotional beats
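
To see why these three bands look so different, it helps to convert focal length into a horizontal angle of view. The sketch below uses the standard angle-of-view formula and assumes a full-frame sensor 36 mm wide. That sensor width is my assumption for illustration; nothing here claims a video model computes this internally.

```python
import math

FULL_FRAME_WIDTH_MM = 36.0  # assumption: full-frame sensor width


def horizontal_fov_degrees(focal_length_mm: float,
                           sensor_width_mm: float = FULL_FRAME_WIDTH_MM) -> float:
    """Horizontal angle of view for a lens focused at infinity."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))


# Print the angle of view for focal lengths spanning the table above.
for focal in (16, 24, 35, 50, 85, 135):
    print(f"{focal}mm -> {horizontal_fov_degrees(focal):.1f} degrees")
```

Under those assumptions, 24mm covers roughly 74 degrees while 85mm covers roughly 24 degrees, which is a useful sanity check when you decide how much environment a shot should include.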

So instead of writing "a man walking in a hallway," write the shot.

Before:

A man walking through a hotel hallway, cinematic.

After:

A tense businessman walks through a narrow hotel hallway at night, medium close-up, 85mm lens, shallow depth of field, compressed background, subtle handheld motion, soft practical lights in the corridor, moody contrast.

The second prompt gives the model a visual grammar. It also mirrors how camera-aware datasets describe cinematic techniques such as focal length, framing, and movement as separate controllable dimensions [1].
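
One reason focal length works better when paired with subject distance: holding the same framing on a longer lens means moving the camera back. Here is a rough sketch of that relationship, again assuming a 36 mm wide full-frame sensor and a subject much farther away than the focal length. The function name and the 1 m frame width are hypothetical choices for illustration.

```python
FULL_FRAME_WIDTH_MM = 36.0  # assumption: full-frame sensor width


def camera_distance_m(focal_length_mm: float, frame_width_m: float,
                      sensor_width_mm: float = FULL_FRAME_WIDTH_MM) -> float:
    """Approximate distance needed to fit frame_width_m across the frame."""
    return frame_width_m * focal_length_mm / sensor_width_mm


# Same framing (about 1 m of scene width), three different lenses.
for focal in (24, 50, 85):
    print(f"{focal}mm lens: stand about {camera_distance_m(focal, 1.0):.2f} m away")
```

If a prompt asks for an 85mm medium close-up in a narrow hallway, it is implicitly asking for a camera a couple of metres back, so it can help to say so.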

How should you use f-stop in AI video prompts?

F-stop in AI video prompts is best used as a signal for depth of field and image mood. Lower f-stops usually suggest stronger subject isolation, while higher f-stops suggest more of the scene staying sharp and readable [3][4].

This is where people get sloppy. They throw in "f/1.4" because it sounds professional, even when the scene doesn't need it.

That's a mistake.

If you want a crowded café scene with environmental storytelling, f/1.4 may fight your goal because it implies a very shallow depth of field. If you want a lonely close-up, though, it's perfect.

Here's my rule of thumb:

F-stop	Prompt effect	When to use it
f/1.4-f/2	Very shallow depth of field	Portraits, close-ups, emotional isolation
f/2.8-f/4	Moderate separation	Dialogue, lifestyle shots, product motion
f/5.6+	Deeper focus	Landscapes, wide scenes, multi-subject action

Before:

A woman stands in a flower shop.

After:

A florist pauses between bouquets in a quiet flower shop, close-up, 50mm lens, f/1.8, soft foreground petals out of focus, creamy background blur, natural window light, gentle camera drift.

The f-stop matters because it tells the model what should stay sharp and what can fall into blur. Papers on controllable lighting and image generation show that explicit attribute tokens outperform vague language when you want reliable manipulation of visual properties [3]. The same prompt principle carries over well into video workflows.
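
If you want a feel for how much the f-stop changes the sharp zone on a real lens, here is a minimal sketch using the standard hyperfocal-distance formulas. It assumes a circle of confusion of 0.03 mm, a common full-frame value that I picked for illustration. It describes optics, not how any particular video model renders blur.

```python
import math

CIRCLE_OF_CONFUSION_MM = 0.03  # assumption: common full-frame value


def depth_of_field_m(focal_mm: float, f_number: float, subject_m: float,
                     coc_mm: float = CIRCLE_OF_CONFUSION_MM) -> tuple[float, float]:
    """Return (near_limit_m, far_limit_m) for a subject at subject_m metres."""
    s = subject_m * 1000.0  # work in millimetres
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = s * (hyperfocal - focal_mm) / (hyperfocal + s - 2 * focal_mm)
    if s >= hyperfocal:
        far = math.inf  # everything behind the subject is acceptably sharp
    else:
        far = s * (hyperfocal - focal_mm) / (hyperfocal - s)
    return near / 1000.0, far / 1000.0


# Compare the sharp zone for a 50mm lens at 2 m across three apertures.
for f_number in (1.8, 2.8, 5.6):
    near, far = depth_of_field_m(50, f_number, 2.0)
    print(f"50mm at f/{f_number}, subject at 2 m: sharp from {near:.2f} m to {far:.2f} m")
```

With those assumptions, a 50mm lens focused at 2 m holds roughly 17 cm in focus at f/1.8 and roughly 53 cm at f/5.6. That gap is the difference between "creamy background blur" and a readable room.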

If you want to speed this up, tools like Rephrase can turn rough camera ideas into a more structured prompt without making you manually rewrite every shot.

How do lighting ratios make AI video prompts better?

Lighting ratios make AI video prompts better because they define contrast with real precision. Instead of saying "dramatic lighting," you can specify whether the image should feel soft, balanced, moody, or harsh by controlling the relationship between key and fill light [3].

This is one of the most underused tricks in prompting.

A 2:1 ratio feels natural and commercial. A 4:1 ratio feels more cinematic and shaped. An 8:1 ratio starts to feel noir, intense, and contrast-heavy.

The TokenLight paper is useful here even though it focuses on relighting images. It shows that models respond well to explicit lighting attributes like intensity, location, and diffuse spread rather than fuzzy descriptions [3]. That's the exact lesson prompt writers should steal.
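
It also helps to know what the numbers mean in stops. Every doubling of the ratio is one stop of difference between key and fill. The sketch below assumes the ratio is quoted as key to fill; some crews quote key-plus-fill to fill instead, so treat the convention as an assumption.

```python
import math


def ratio_to_stops(key: float, fill: float = 1.0) -> float:
    """Stops of difference between key and fill for a key:fill ratio."""
    return math.log2(key / fill)


# The three ratios discussed above, expressed as stop differences.
for key in (2, 4, 8):
    print(f"{key}:1 -> key is {ratio_to_stops(key):.1f} stops over fill")
```

So 2:1 is a one-stop difference, 4:1 is two stops, and 8:1 is three stops, which is why 8:1 leaves the fill side of a face so deep in shadow.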

Before:

A detective in a dark office, dramatic lighting.

After:

A detective sits alone in a dark office, medium close-up, 50mm lens, f/2.8, 8:1 lighting ratio, hard key light from camera left, minimal fill, deep shadow on the far side of the face, practical desk lamp glow in the background, slow push-in.

That prompt is stronger because the contrast is measurable. You're no longer asking for "drama." You're describing how the drama is lit.

How can you combine focal length, f-stop, and lighting ratios in one prompt?

The best way to combine focal length, f-stop, and lighting ratios is to assign each one a job. Let focal length control framing and compression, let f-stop control depth of field, and let lighting ratio control contrast and mood. When each term does one clear thing, the prompt stays readable and the shot gets sharper.

Here's a clean formula I like:

Subject and action
Environment and time of day
Shot size and focal length
F-stop and depth cues
Lighting ratio and light direction
Motion and constraints

Example:

A boxer sits on a stool between rounds in a sweaty gym, breathing hard, late evening. Tight medium shot, 85mm lens, f/2, shallow depth of field. 4:1 lighting ratio with a soft overhead key and weak fill from camera right. Slow dolly-in, background fighters blurred, gritty realistic texture, no surreal artifacts.
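
If you build prompts in code, the same formula maps neatly onto a small data structure. This is a minimal sketch, not a tool any video model requires; the `ShotPrompt` class and its field names are hypothetical and simply mirror the six slots above.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """One field per slot in the formula, rendered in a fixed order."""
    subject_action: str
    environment: str
    shot_and_lens: str
    depth: str
    lighting: str
    motion_constraints: str

    def render(self) -> str:
        parts = [self.subject_action, self.environment, self.shot_and_lens,
                 self.depth, self.lighting, self.motion_constraints]
        # Skip empty slots and end each remaining slot with a single period.
        return " ".join(p.strip().rstrip(".") + "." for p in parts if p.strip())


boxer = ShotPrompt(
    subject_action="A boxer sits on a stool between rounds, breathing hard",
    environment="Sweaty gym, late evening",
    shot_and_lens="Tight medium shot, 85mm lens",
    depth="f/2, shallow depth of field",
    lighting="4:1 lighting ratio with a soft overhead key and weak fill from camera right",
    motion_constraints="Slow dolly-in, background fighters blurred, gritty realistic texture, no surreal artifacts",
)
print(boxer.render())
```

Keeping each slot separate makes iteration cheap: you can change only `lighting` between runs and know nothing else in the prompt moved.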

That structure also echoes what community prompt builders keep rediscovering: good prompts repeat the same skeleton over and over, just with better variables [4]. For more prompt workflows like this, the Rephrase blog has more articles on adapting prompts to different AI tools and output types.

Before-and-after prompt examples

A simple scene description usually gives you generic output. A camera-directed version gives you intent.

Weak prompt	Stronger camera prompt
A chef cooking in a restaurant kitchen	A chef plates a dish in a busy restaurant kitchen, 35mm lens, f/4, medium shot, 2:1 lighting ratio, bright stainless-steel highlights, handheld shoulder-height camera, fast but controlled movement
A woman looking out a window	A woman stares out a rain-covered apartment window at dawn, 85mm lens, f/1.8, close-up profile, 8:1 lighting ratio, cool window key, almost no fill, soft bokeh city lights behind her
A skateboarder in the city	A skateboarder pushes through an empty downtown street at golden hour, 24mm lens, f/5.6, low-angle tracking shot, 3:1 lighting ratio, warm sun backlight, crisp environment detail, energetic forward motion

The catch is that you still need to test and iterate. AI video models do not obey every term perfectly. But they respond much better when you give them technical anchors instead of aesthetic fog.
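
One way to make that testing systematic is to hold the subject constant and vary only the camera terms. The sketch below generates every combination from a few short lists so you can compare outputs side by side. The function name and the specific values are illustrative assumptions, not recommended settings.

```python
from itertools import product

BASE = ("A skateboarder pushes through an empty downtown street at golden hour, "
        "low-angle tracking shot")
FOCAL_LENGTHS = ("24mm", "35mm")
F_STOPS = ("f/2.8", "f/5.6")
RATIOS = ("2:1", "4:1")


def camera_variants(base, focal_lengths, f_stops, ratios):
    """Return one prompt per combination of lens, aperture, and lighting ratio."""
    return [
        f"{base}, {focal} lens, {stop}, {ratio} lighting ratio"
        for focal, stop, ratio in product(focal_lengths, f_stops, ratios)
    ]


variants = camera_variants(BASE, FOCAL_LENGTHS, F_STOPS, RATIOS)
for prompt in variants:
    print(prompt)
```

Two values per slot already gives eight prompts, so keep the lists short and change one dimension at a time when generation is slow or expensive.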

If you try one thing today, do this: replace one vague word in your next prompt with one measurable camera term. Swap "cinematic" for "85mm lens." Swap "dramatic light" for "8:1 lighting ratio." That one change usually tells the model more than an extra sentence ever will. And if you're doing this all day, Rephrase is a handy way to automate the cleanup step without losing your intent.
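
For a do-it-yourself version of that swap, here is a rough sketch that flags vague terms and substitutes a measurable one. The `SWAPS` table is a hypothetical starting point built from the examples in this post; the right replacement always depends on the shot you actually want.

```python
import re

# Assumption: illustrative swaps taken from the examples above.
# Longer phrases come first so they are replaced before their prefixes.
SWAPS = {
    "dramatic lighting": "8:1 lighting ratio",
    "dramatic light": "8:1 lighting ratio",
    "cinematic": "85mm lens",
}


def swap_vague_terms(prompt: str) -> tuple[str, list[str]]:
    """Replace known vague terms and report which ones were found."""
    found = []
    for vague, concrete in SWAPS.items():
        pattern = re.compile(rf"\b{re.escape(vague)}\b", re.IGNORECASE)
        if pattern.search(prompt):
            prompt = pattern.sub(concrete, prompt)
            found.append(vague)
    return prompt, found


print(swap_vague_terms("A man walking through a hotel hallway, cinematic."))
print(swap_vague_terms("A detective in a dark office, dramatic lighting."))
```

Treat the output as a first draft: the swap gives the model an anchor, but you still need to check that the lens and ratio fit the scene.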

References

Documentation & Research

[1] Geometry-Guided Camera Motion Understanding in VideoLLMs - The Prompt Report
[2] CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation - The Prompt Report
[3] TokenLight: Precise Lighting Control in Images using Attribute Tokens - The Prompt Report

Community Examples

[4] I spent 10000 hours writing AI prompts and kept repeating the same patterns… so I built a visual prompt builder (It's 100% Free) - r/PromptEngineering
[5] AI CINEMATIC SERIES - VIRTUAL CAMERA - r/PromptEngineering

Frequently asked

Do AI video models really understand focal length?

To a degree, but not like a physical camera simulator. Focal length terms often work as style and composition cues, so pairing them with framing and subject distance usually gives better results.

Should I use lighting ratios in AI prompts?

Yes, especially when you want consistent mood and contrast. Ratios like 2:1 or 8:1 are more precise than vague phrases like 'dramatic lighting' and help anchor the scene.
