YouTube creators don't really have a "content problem." We have a decision problem.
You can come up with 50 title ideas, 3 script angles, 10 thumbnail directions, and 6 description drafts in an hour with an LLM… and still publish nothing because it all feels same-y. Or worse: it looks optimized, but it's not you. It's not the video you'd actually record.
Here's the thing I've noticed: most "YouTube prompts" fail because they ask for outputs (titles, scripts, thumbnails) without building the stepping stones the model needs to get there. When you do that, the model defaults to internet-average creator voice and internet-average structure.
That's not a YouTube strategy. That's autocomplete.
The fix is to prompt like a producer. You give the model constraints, context, and intermediate checkpoints. You generate options, pick one, and only then expand. This is basically the same idea researchers use when they add intermediate "stepping stone" questions to help models solve harder tasks: you don't jump straight to the final answer if you want reliability [2].
And you also want to avoid a sneaky failure mode: if you feed the model a bad draft title or a cringey hook and ask it to "improve it," it may get anchored to the wrong shape and keep repeating the same mistake. That anchoring effect has a name in recent LLM research, "contextual drag," and it can persist even when you explicitly tell the model the draft is wrong [3]. For creators, that shows up as "why do all my revised titles still sound like the original bad title?"
So let's build a workflow that (1) generates better raw material and (2) doesn't trap you in mediocre iterations.
The creator workflow that actually works: brief → options → selection → expansion → polish
I treat YouTube prompting as a pipeline with two rules.
First: separate divergence from convergence. Divergence is where you want lots of different ideas (angles, hooks, thumbnail concepts). Convergence is where you lock one in and execute.
Second: don't iterate on bad context. If your draft hook sucks, don't ask the model to "make it better" while keeping the same framing. Ask it to propose alternative frames first. That's how you sidestep contextual drag [3].
If you want one mental model, borrow this from the ARQ "stepping stones" framing: create smaller subproblems that are easier to solve and easier to judge, then use them as context for the final artifact [2]. In creator terms: pick the promise, pick the audience tension, pick the structure, then write.
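The stepping-stone flow can be sketched as ordinary code. This is a minimal sketch, assuming a hypothetical `ask(prompt)` helper that wraps whatever LLM API you use; the function names and prompt wording are illustrative, not from any library. The point is the shape: each checkpoint's answer becomes context for the next, and only the approved brief reaches the final generation step.

```python
# Minimal sketch of the stepping-stone pipeline.
# `ask` is a hypothetical stand-in for your LLM call (OpenAI, Claude, etc.).
def ask(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM API")


def stepping_stone_brief(raw_idea: str, ask=ask) -> str:
    """Build the brief one checkpoint at a time, feeding each answer forward."""
    promise = ask(f"Restate this video's promise in ONE sentence: {raw_idea}")
    tension = ask(
        f"Given the promise '{promise}', list 5 specific audience "
        "frustrations it resolves."
    )
    structure = ask(
        f"Given the promise '{promise}' and these frustrations:\n{tension}\n"
        "Propose a 5-beat structure (hook, setup, beats, counterpoint, close). "
        "Beats only, no script."
    )
    # Only the approved brief goes into the final generation call,
    # never earlier failed drafts (that is how you avoid contextual drag).
    return f"PROMISE: {promise}\nTENSIONS: {tension}\nSTRUCTURE: {structure}"
```

Notice that nothing in the pipeline carries a rejected draft forward: each call sees only the answers you kept.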
Prompt pack: titles that match the actual video promise
A title isn't "copy." It's the contract you're making with the viewer. The biggest upgrade you can get from an LLM is not cleverness. It's precision: what is this video really delivering, and for whom?
Use this prompt to generate titles from a clear promise instead of a vague topic.
You are my YouTube title strategist.
Video idea (raw): {paste your rough idea}
Channel context:
- Niche: {niche}
- Audience: {who they are + what they already know}
- My tone: {dry / energetic / skeptical / friendly / etc.}
- What I refuse to do: no fake urgency, no "SHOCKING", no lying
Stepping stones:
1) Restate the video's promise in ONE sentence (no fluff).
2) List 5 audience "itches" this video scratches (specific frustrations).
3) Propose 12 titles grouped into 4 buckets (3 titles each):
A) "Result-first"
B) "Mistake to avoid"
C) "Contrarian take"
D) "Curiosity gap"
Rules:
- Max 60 characters if possible
- No generic words like "ultimate", "insane", "game changer"
- Each title must imply a distinct angle, not just synonyms
Return in a table: bucket | title | implied promise | who it's for
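If you generate titles in bulk, it helps to lint candidates against the rules before you even read them. Here's a small sketch; the banned-word list is illustrative, so extend it to match your own "what I refuse to do" list.

```python
# Words the title prompt explicitly bans (extend to taste).
BANNED = {"ultimate", "insane", "game changer", "shocking"}


def lint_title(title: str, max_len: int = 60) -> list[str]:
    """Return a list of rule violations for one candidate title."""
    problems = []
    if len(title) > max_len:
        problems.append(f"too long ({len(title)} > {max_len} chars)")
    lowered = title.lower()
    for word in BANNED:
        if word in lowered:
            problems.append(f"banned word: {word!r}")
    return problems
```

Run every generated title through it and discard anything with violations before you start judging angles; it keeps the convergence step focused on promise, not cleanup.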
Why the stepping stones matter: you're forcing the model to do the work a human producer does (clarify the promise, clarify the audience tension) before it writes titles. That's the same "ask better intermediate questions" pattern that improves downstream quality in research settings [2].
Prompt pack: long-form scripts that don't feel generic
Most AI-written scripts feel generic because the model was never told what "good" means for spoken YouTube. You need pacing, segment intent, and an opinionated throughline.
Also: don't ask for a full script immediately. Ask for a beat sheet first. If the beat sheet is weak, the script will be weak.
Act as a YouTube showrunner.
Goal: write a {8-12} minute script that is tight, spoken, and structured.
Inputs:
- Topic: {topic}
- Viewer avatar: {who they are}
- Viewer's starting belief: {what they think now}
- Target belief after video: {what I want them to believe/do}
- My stance: {my real opinion in 1-2 lines}
- Proof assets I have: {personal story, demo, data, screenshots, etc.}
Step 1 (outline only):
Create a beat sheet with timestamps (rough is fine) including:
- Hook (0:00-0:20): pattern interrupt + clear promise
- Setup (0:20-1:00): stakes + credibility
- 3 main beats: each must introduce one new "tool" or "insight"
- 1 counterpoint beat: address the obvious objection
- Close: recap + specific next video tease
Step 2 (wait):
Ask me 5 clarification questions that would improve specificity.
Do NOT write the script until I answer.
This "interview me first" move is popular with creators for Shorts too: ask a few targeted questions, confirm the brief, then write one tight draft [4]. It works because it prevents the model from hallucinating your intent.
One more trick: when you do generate the script, avoid feeding the model earlier failed attempts. Contextual drag research shows that incorrect drafts can bias the next output toward structurally similar mistakes, even when you warn the model [3]. Practically, I'll paste only the approved beat sheet, not my messy first draft.
Prompt pack: Shorts scripts (fast, loopable, platform-aware)
Shorts are basically constraint optimization: speed, clarity, rewatchability. Don't let the model ramble.
Write a YouTube Short script (max {35-45} seconds).
Constraints:
- 1 sentence per line
- No line longer than 10 words
- Must include a loopable ending that re-triggers the hook
- Tone: {tone}
- Format: {talking head / voiceover / text-on-screen}
Content:
- Topic: {topic}
- One surprising fact or claim: {claim}
- One supporting proof: {proof}
- CTA: {comment / subscribe / watch next}
Output:
HOOK:
SCRIPT:
LOOP END:
ON-SCREEN TEXT (3-5 beats):
TITLE (<=60 chars):
DESCRIPTION (2 lines + 3 hashtags):
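The Shorts constraints above are mechanical enough to check automatically. This is a rough sketch (the one-sentence-per-line check is a heuristic and will misfire on things like decimal numbers), useful as a fast pass over generated drafts:

```python
def check_short_script(script: str, max_words: int = 10) -> list[str]:
    """Flag lines in a Shorts script that break the pacing constraints."""
    issues = []
    for i, line in enumerate(script.strip().splitlines(), start=1):
        words = line.split()
        if len(words) > max_words:
            issues.append(f"line {i}: {len(words)} words (max {max_words})")
        # Rough one-sentence check: sentence-ending punctuation mid-line.
        if any(ch in line.rstrip(".!?") for ch in ".!?"):
            issues.append(f"line {i}: looks like more than one sentence")
    return issues
```

If the checker flags half the script, don't ask the model to "fix" that draft; regenerate from the constraints, for the same contextual-drag reason as before.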
Prompt pack: descriptions that help discovery and set expectations
Descriptions aren't where discovery magic happens. They're where you reduce disappointment and guide the next action. The best descriptions are clear, scannable, and honest about who the video is for.
You are my YouTube description editor.
Inputs:
- Final title: {title}
- Video summary in 5 bullets: {bullets}
- Audience: {audience}
- CTA: {cta}
- Links: {links}
- Keywords I care about: {keywords}
Write:
1) First 2 lines: hook + clear promise (no clickbait).
2) A short paragraph that sets expectations (what's covered / not covered).
3) A "Next" line that points to a follow-up video concept.
4) 5 hashtag options (only 3 will be used).
Keep it under 200 words.
Prompt pack: thumbnail concepts (not "make it pop," real direction)
A thumbnail is a visual hypothesis. You want a single focal point, a single emotion, and a single readable idea.
Don't ask the model for an "image." Ask for a design spec you can hand to yourself (or a designer).
Act as a YouTube thumbnail creative director.
Inputs:
- Video title: {title}
- Video promise: {promise in 1 sentence}
- Audience: {audience}
- My on-camera style: {face/no face}
- Brand constraints: {colors, fonts, no heavy text, etc.}
Generate 8 thumbnail concepts.
For each concept include:
- Core emotion (e.g., relief, shock, smugness, curiosity)
- Subject (what is the one focal object/person?)
- Background (simple, not busy)
- Text (0-3 words max, optional)
- Composition notes (rule of thirds, arrows, circles, before/after split)
- "Why it works" (tie back to the promise)
Avoid clichés: red arrows everywhere, overused reaction faces, generic "TOP 5".
If you're using AI image tools, you can translate the chosen concept into an image prompt later. But the concept spec is the part most creators skip, and it's the part that prevents random thumbnails that don't match the video.
Practical creator notes (stuff people actually complain about)
When creators ask for help with YouTube scripting tools, the consistent complaint is "the scripts feel generic" and "I rewrite everything anyway" [5]. That's usually not a model problem. It's a prompting and process problem.
The workflow above fixes that by forcing specificity early, and by making you choose before you expand. It also avoids the trap where you keep editing a bad first draft and the model keeps inheriting its shape, which is exactly the pattern contextual drag warns about [3].
If you want one challenge for your next upload, do this: generate 12 titles and 8 thumbnail concepts first, pick one pairing, then write the script to match that pairing. Your script will get tighter overnight, because it has a clear contract to fulfill.
References
Documentation & Research
1. How Descript enables multilingual video dubbing at scale - OpenAI Blog
https://openai.com/index/descript
2. Asking the Right Questions: Improving Reasoning with Generated Stepping Stones - arXiv
https://arxiv.org/abs/2602.19069
3. Contextual Drag: How Errors in the Context Affect LLM Reasoning - arXiv
https://arxiv.org/abs/2602.04288
Community Examples
4. "Stop Asking ChatGPT for 'Good Hooks' - Steal This YouTube Shorts Interview Prompt Instead" - r/PromptEngineering
https://www.reddit.com/r/PromptEngineering/comments/1qy2lcj/stop_asking_chatgpt_for_good_hooks_steal_this/
5. Helpful tools for YouTube script writer? - r/PromptEngineering
https://www.reddit.com/r/PromptEngineering/comments/1qqh25a/helpful_tools_for_youtube_script_writer/