Learn how to use focal length, f-stop, and lighting ratios in AI video prompts for more cinematic shots and better control.
Most AI video prompts fail for the same reason: they describe a scene, not a shot. If you want cinematic output, you need to stop talking like a customer and start talking like a cinematographer.
Camera vocabulary improves AI video prompts because it turns abstract taste into concrete visual instructions. Research on camera-aware video understanding shows that structured prompting helps models reason more clearly about framing, lighting, and motion, instead of producing generic descriptions or muddled camera behavior [1].
Here's the thing I noticed: "cinematic" is almost useless on its own. It sounds good, but it tells the model very little. "85mm lens, f/2, 4:1 key-to-fill ratio, soft side key" tells it far more.
That lines up with current research. A 2026 paper on camera motion understanding found that VideoLLMs are weak at inferring camera behavior unless you make those cues explicit, ideally in a structured prompt [1]. Another paper on cinematic video generation showed that camera trajectory and scene control matter a lot for maintaining visual consistency across shots [2]. In plain English: when you specify the shot language, the model has less room to improvise badly.
Community examples point the same way. People who build repeatable image and video prompts tend to organize them as subject, camera, lighting, style, and constraints rather than dumping everything into one long paragraph [4][5].
Focal length in an AI video prompt mainly acts as a composition cue. It tells the model whether you want an ultra-wide, natural, or compressed look, and it works best when paired with shot type and subject distance rather than used as a lonely number [1][2].
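In real cinematography, focal length sets the field of view. The perspective and background compression people associate with a lens come from where the camera has to stand to hold a given framing, which is why focal length and subject distance belong together in a prompt. In AI video, none of this is physically simulated with full accuracy, but the term still strongly nudges the output.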
Here's a practical way to think about it:
| Focal length | Usual look | Best use in prompts |
|---|---|---|
| 16-24mm | Wide, immersive, exaggerated depth | Establishing shots, dynamic movement, interiors |
| 35-50mm | Natural, balanced | Dialogue, medium shots, general realism |
| 85-135mm | Compressed, intimate, portrait-like | Close-ups, fashion, emotional beats |
So instead of writing "a man walking in a hallway," write the shot.
Before:
A man walking through a hotel hallway, cinematic.
After:
A tense businessman walks through a narrow hotel hallway at night, medium close-up, 85mm lens, shallow depth of field, compressed background, subtle handheld motion, soft practical lights in the corridor, moody contrast.
The second prompt gives the model a visual grammar. It also mirrors how camera-aware datasets describe cinematic techniques such as focal length, framing, and movement as separate controllable dimensions [1].
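To see why those focal-length bands read so differently, it helps to put numbers on field of view. The sketch below uses the standard angle-of-view formula for a rectilinear lens. The 36 mm sensor width (full frame) and the function name are assumptions chosen for illustration; an AI video model does not run this math, so treat the result as intuition for what each number asks for, not a guarantee of what you get.

```python
import math

# Assumption: full-frame sensor, 36 mm wide. Other sensor sizes shift every number.
FULL_FRAME_WIDTH_MM = 36.0


def horizontal_fov_degrees(focal_length_mm: float,
                           sensor_width_mm: float = FULL_FRAME_WIDTH_MM) -> float:
    """Horizontal angle of view for a rectilinear lens focused at infinity."""
    if focal_length_mm <= 0:
        raise ValueError("focal length must be positive")
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))


# Print the angle of view for the focal lengths named in the table above.
for focal in (16, 24, 35, 50, 85, 135):
    print(f"{focal}mm -> {horizontal_fov_degrees(focal):.1f} degrees")
```

Under that full-frame assumption, 24mm covers roughly 74 degrees horizontally, 50mm about 40, and 85mm about 24. That is the gap between "wide, immersive" and "compressed, intimate" in the table.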
F-stop in AI video prompts is best used as a signal for depth of field and image mood. Lower f-stops usually suggest stronger subject isolation, while higher f-stops suggest more of the scene staying sharp and readable [3][4].
This is where people get sloppy. They throw in "f/1.4" because it sounds professional, even when the scene doesn't need it.
That's a mistake.
If you want a crowded café scene with environmental storytelling, f/1.4 may fight your goal because it implies a very shallow depth of field. If you want a lonely close-up, though, it's perfect.
Here's my rule of thumb:
| F-stop | Prompt effect | When to use it |
|---|---|---|
| f/1.4-f/2 | Very shallow depth of field | Portraits, close-ups, emotional isolation |
| f/2.8-f/4 | Moderate separation | Dialogue, lifestyle shots, product motion |
| f/5.6+ | Deeper focus | Landscapes, wide scenes, multi-subject action |
Before:
A woman stands in a flower shop.
After:
A florist pauses between bouquets in a quiet flower shop, close-up, 50mm lens, f/1.8, soft foreground petals out of focus, creamy background blur, natural window light, gentle camera drift.
The f-stop matters because it tells the model what to keep sharp and what to let fall out of focus. Papers on controllable lighting and image generation show that explicit attribute tokens outperform vague language when you want reliable manipulation of visual properties [3]. The same principle carries over well into video workflows.
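The rule-of-thumb table above can be encoded as a small lookup, so the f-stop and its matching depth-of-field phrase always travel together in a prompt. This is a minimal sketch: the function names are hypothetical, and the choice to round in-between values such as f/2.5 into the next row is an assumption the table itself does not make.

```python
# Thresholds follow the rule-of-thumb table. Values that fall between rows
# (for example f/2.5) are assigned to the next row down; that is an
# assumption of this sketch, not something the table specifies.
DEPTH_CUES = [
    (2.0, "very shallow depth of field"),
    (4.0, "moderate subject separation"),
    (float("inf"), "deep focus"),
]


def depth_cue(f_number: float) -> str:
    """Return the depth-of-field phrase that matches an f-number."""
    if not f_number > 0:  # also rejects NaN
        raise ValueError("f-number must be positive")
    for upper_bound, cue in DEPTH_CUES:
        if f_number <= upper_bound:
            return cue
    raise AssertionError("unreachable: last bound is infinity")


def aperture_phrase(f_number: float) -> str:
    """Format the f-stop and its depth cue as one prompt fragment."""
    return f"f/{f_number:g}, {depth_cue(f_number)}"


print(aperture_phrase(1.8))
print(aperture_phrase(5.6))
```

Pairing the number with a plain-language cue is a hedge: if the model ignores "f/1.8", it may still pick up "very shallow depth of field".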
If you want to speed this up, tools like Rephrase can turn rough camera ideas into a more structured prompt without making you manually rewrite every shot.
Lighting ratios make AI video prompts better because they define contrast with real precision. Instead of saying "dramatic lighting," you can specify whether the image should feel soft, balanced, moody, or harsh by controlling the relationship between key and fill light [3].
This is one of the most underused tricks in prompting.
A 2:1 ratio feels natural and commercial. A 4:1 ratio feels more cinematic and shaped. An 8:1 ratio starts to feel noir, intense, and contrast-heavy.
The TokenLight paper is useful here even though it focuses on relighting images. It shows that models respond well to explicit lighting attributes like intensity, location, and diffuse spread rather than fuzzy descriptions [3]. That's the exact lesson prompt writers should steal.
Before:
A detective in a dark office, dramatic lighting.
After:
A detective sits alone in a dark office, medium close-up, 50mm lens, f/2.8, 8:1 lighting ratio, hard key light from camera left, minimal fill, deep shadow on the far side of the face, practical desk lamp glow in the background, slow push-in.
That prompt is stronger because the contrast is measurable. You're no longer asking for "drama." You're describing how the drama is lit.
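To make "measurable" concrete, a lighting ratio can be converted into stops, where each stop is a doubling of light. The sketch below assumes the ratio compares key light directly to fill light, matching the "key-to-fill" wording used earlier. Some crews instead measure key plus fill against fill alone, which would give different numbers. The function name is hypothetical.

```python
import math


def ratio_to_stops(key_to_fill: float) -> float:
    """Stops of difference between key and fill for a key:fill ratio.

    Assumes the ratio is key intensity divided by fill intensity.
    """
    if key_to_fill < 1:
        raise ValueError("ratio must be at least 1 (key no dimmer than fill)")
    return math.log2(key_to_fill)


# The three ratios discussed above, expressed in stops.
for ratio in (2, 4, 8):
    print(f"{ratio}:1 -> key is {ratio_to_stops(ratio):g} stops brighter than fill")
```

So 2:1 is a one-stop difference, 4:1 is two stops, and 8:1 is three, which is why each step up the scale reads as a clear jump in contrast and not a subtle tweak.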
The best way to combine focal length, f-stop, and lighting ratios is to assign each one a job. Let focal length control perspective, let f-stop control depth, and let lighting ratio control mood. When each term does one clear thing, the prompt stays readable and the shot gets sharper.
Here's a clean formula I like:
Example:
A boxer sits on a stool between rounds in a sweaty gym, breathing hard, late evening. Tight medium shot, 85mm lens, f/2, shallow depth of field. 4:1 lighting ratio with a soft overhead key and weak fill from camera right. Slow dolly-in, background fighters blurred, gritty realistic texture, no surreal artifacts.
That structure also echoes what community prompt builders keep rediscovering: good prompts repeat the same skeleton over and over, just with better variables [4]. For more prompt workflows like this, the Rephrase blog has more articles on adapting prompts to different AI tools and output types.
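One way to keep that skeleton consistent is to store each job as its own field and assemble the prompt at the end. The sketch below is an illustration only: the `Shot` class, its field names, and the sentence order are hypothetical choices, not a format any particular video model requires.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """Hypothetical container: one field per job in the prompt."""
    subject: str            # who, where, doing what
    framing: str            # shot type
    focal_length_mm: int    # perspective cue
    f_number: float         # depth-of-field cue
    lighting_ratio: int     # N in an N:1 ratio, the mood cue
    lighting: str           # key and fill description
    movement: str           # camera motion
    constraints: str = ""   # optional negative or style constraints

    def to_prompt(self) -> str:
        parts = [
            self.subject,
            f"{self.framing}, {self.focal_length_mm}mm lens, f/{self.f_number:g}",
            f"{self.lighting_ratio}:1 lighting ratio with {self.lighting}",
            self.movement,
            self.constraints,
        ]
        # Drop empty fields and normalize trailing periods before joining.
        cleaned = [p.strip().rstrip(".") for p in parts if p.strip()]
        return ". ".join(cleaned) + "."


boxer = Shot(
    subject="A boxer sits on a stool between rounds in a sweaty gym",
    framing="Tight medium shot",
    focal_length_mm=85,
    f_number=2,
    lighting_ratio=4,
    lighting="a soft overhead key and weak fill from camera right",
    movement="Slow dolly-in",
    constraints="No surreal artifacts",
)
print(boxer.to_prompt())
```

Because each variable lives in one place, changing the mood means editing `lighting_ratio` and `lighting` while the subject and lens stay untouched, which makes side-by-side tests much easier to read.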
A simple scene description usually gives you generic output. A camera-directed version gives you intent.
| Weak prompt | Stronger camera prompt |
|---|---|
| A chef cooking in a restaurant kitchen | A chef plates a dish in a busy restaurant kitchen, 35mm lens, f/4, medium shot, 2:1 lighting ratio, bright stainless-steel highlights, handheld shoulder-height camera, fast but controlled movement |
| A woman looking out a window | A woman stares out a rain-covered apartment window at dawn, 85mm lens, f/1.8, close-up profile, 8:1 lighting ratio, cool window key, almost no fill, soft bokeh city lights behind her |
| A skateboarder in the city | A skateboarder pushes through an empty downtown street at golden hour, 24mm lens, f/5.6, low-angle tracking shot, 3:1 lighting ratio, warm sun backlight, crisp environment detail, energetic forward motion |
The catch is that you still need to test and iterate. AI video models do not obey every term perfectly. But they respond much better when you give them technical anchors instead of aesthetic fog.
If you try one thing today, do this: replace one vague word in your next prompt with one measurable camera term. Swap "cinematic" for "85mm lens." Swap "dramatic light" for "8:1 lighting ratio." That one change usually tells the model more than an extra sentence ever will. And if you're doing this all day, Rephrase is a handy way to automate the cleanup step without losing your intent.
Documentation & Research
Community Examples
AI video models do respond to focal length terms, but not like a physical camera simulator. Those terms often work as style and composition cues, so pairing them with framing and subject distance usually gives better results.
Lighting ratios are worth specifying, especially when you want consistent mood and contrast. Ratios like 2:1 or 8:1 are more precise than vague phrases like "dramatic lighting" and help anchor the scene.