How to Write Prompts for AI Code Generation (So You Get Mergeable Code, Not Demos)
A practical, developer-first way to prompt code models: turn vague intent into specs, constrain output, and iterate with tests.
Most "bad" AI-generated code isn't bad because the model can't code. It's bad because you asked for code the way you'd ask a coworker in a hallway: a loose goal, fuzzy constraints, and zero definition of done.
For code generation, prompting is basically spec writing. If you treat it like a spec, you'll get code you can actually run, test, and review. If you treat it like a wish, you'll get a charming snippet that fails in production.
Here's the mental shift that changed my results: don't prompt for code. Prompt for a contract: inputs, outputs, invariants, constraints, and a way to verify. The code is just the byproduct.
Why structure matters more for code than for text
Code models are unusually sensitive to format, examples, and output constraints, because a lot of their useful training signal looks like "documentation → implementation" pairs. A recent paper, BatCoder, leans into exactly this idea: it trains models by generating structured documentation and then reconstructing the original code from it, using similarity as a reward signal [1]. Even if you never touch reinforcement learning, the implication for prompting is straightforward: the more your prompt resembles crisp, well-formed documentation, the more you're steering the model into a high-signal groove.
The second implication is less obvious: long prompts aren't automatically better. Giabbanelli's 2026 guide points out that piling on techniques and length can degrade performance ("over-prompting"), and that selectivity often beats completeness when models can't effectively use everything you stuffed into context [2]. In code prompts, that shows up as the model "forgetting" an early constraint, or mixing incompatible requirements.
So the game is: provide the right structure and the minimum sufficient detail, and then force verification.
My prompt pattern for code: Spec → Plan → Generate → Verify → Patch
I write prompts in five beats. Not as a fancy framework, but because it matches how we already build software.
First, I nail the spec. This is where you stop saying "build X" and start saying what "done" means: signatures, edge cases, performance constraints, libraries allowed, and what you want the output to look like. BatCoder's prompt templates are a good north star here because they explicitly require imports, function definition, and example I/O in a docstring-like structure [1]. That's basically the prompt equivalent of "write the header file first."
Second, I ask for a plan that's short and checkable. This is not the same as asking the model to "think step by step" forever. I want a compact sequence of implementation steps, plus a list of tricky cases it intends to handle. (I also often ask it to name the top 3 failure modes up front.)
Third, I ask it to generate code with strict output formatting. One file. One module. Or "only a unified diff." Whatever is appropriate. This is where constraints save you. If you don't constrain the response, the model will happily include commentary, alternative solutions, and half a README.
Fourth, I force verification. If the task is algorithmic, I ask for tests. If it touches APIs, I ask for mockable interfaces. If it's a refactor, I ask for behavioral equivalence checks. Giabbanelli recommends decomposing tasks and following each task with a validation prompt, basically building a mini feedback loop into your interaction [2]. For codegen, this is where you turn "looks right" into "we can run it."
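For the refactor case, "behavioral equivalence" can be made executable with a randomized spot-check. A minimal sketch; the function names and the input generator here are illustrative, not from any cited work:

```python
import random

def assert_equivalent(old_fn, new_fn, gen_input, trials=200):
    """Spot-check that a refactor preserves behavior on random inputs."""
    for _ in range(trials):
        x = gen_input()
        assert old_fn(x) == new_fn(x), f"divergence on input {x!r}"

# Example: checking a hand-rolled sort against the stdlib reference.
def bubble_sort(xs):
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

assert_equivalent(sorted, bubble_sort,
                  lambda: [random.randint(0, 99) for _ in range(random.randint(0, 10))])
```

Two hundred random trials is not a proof, but it rejects most wrong refactors instantly, and a failing input gives you the evidence you need for the patch step.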
Finally, I patch with evidence. If tests fail or constraints are violated, my next prompt is not "try again." It's "here's the failing test and stack trace; produce the smallest change that fixes it without breaking these invariants."
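The "patch with evidence" step can be mechanized. A sketch, assuming a hypothetical helper whose output you would feed to whatever model interface you use:

```python
def build_patch_prompt(failing_test: str, trace: str, invariants: list[str]) -> str:
    """Assemble an evidence-based follow-up prompt from a test failure."""
    parts = [
        "A test is failing. Produce the smallest change that fixes it.",
        "Do not modify the tests.",
        "",
        "Failing test:",
        failing_test,
        "",
        "Stack trace:",
        trace,
        "",
        "Invariants that must still hold:",
    ]
    parts.extend(f"- {inv}" for inv in invariants)
    return "\n".join(parts)

prompt = build_patch_prompt(
    failing_test="def test_empty():\n    assert parse('') == []",
    trace="AssertionError: None != []",
    invariants=["parse() never raises on str input", "return type is always list"],
)
```

The point is that the follow-up prompt is assembled from artifacts (test, trace, invariants), not from your impression of what went wrong.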
That's it. Spec beats vibes.
A concrete prompt template I actually use
This is the template I paste for "write a new function" work. It's intentionally doc-style and example-heavy, borrowing the same instincts as BatCoder's documentation prompts [1].
You are a senior engineer. Write production-quality code.
Goal
Build a {language} function with this signature:
{signature}
Behavior
- Inputs: {input constraints}
- Output: {output contract}
- Errors: {what to raise/return}
- Edge cases: {list}
Constraints
- Allowed deps: {deps}
- Forbidden deps: {deps}
- Complexity target: {time/space}
- Style: {lint/format conventions}
- Do not change: {any fixed interfaces}
Examples (must pass)
1) {example input} -> {example output}
2) {example input} -> {example output}
3) {example input} -> {example output}
Deliverables
1) Code only (no commentary).
2) Include unit tests using {framework}.
3) Include a brief note at the top of the file listing assumptions (max 5 lines).
If anything is ambiguous, ask up to 3 questions before coding.
Two details matter more than people think.
One is the "Examples (must pass)" block. It's not just for clarity; it's a hard anchor. It makes it easier for the model to self-check during generation, and easier for you to reject output without arguing about taste.
The other is "ask up to 3 questions before coding." This is my cheap way to prevent the model from guessing the spec. If it asks nothing, it's usually because you gave it a complete contract.
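If you fill the template programmatically, "complete contract" becomes a hard check rather than a habit. A sketch using the stdlib's string.Formatter; the TEMPLATE below is a trimmed, hypothetical version of the full template above:

```python
import string

TEMPLATE = """\
Build a {language} function with this signature: {signature}
Inputs: {inputs}
Output: {output}
Errors: {errors}
Allowed deps: {deps}
Examples (must pass): {examples}"""

def render_spec(**fields) -> str:
    """Fill the spec template, refusing to emit an incomplete contract."""
    # Formatter().parse yields (literal, field_name, spec, conversion) tuples.
    needed = {name for _, name, _, _ in string.Formatter().parse(TEMPLATE) if name}
    missing = needed - fields.keys()
    if missing:
        raise ValueError(f"spec incomplete, still need: {sorted(missing)}")
    return TEMPLATE.format(**fields)
```

If render_spec raises, you've found the ambiguity before the model has to guess at it.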
Practical tweaks that improve codegen fast
When I want runnable code, I often format the prompt as if it's a code comment or docstring inside a code block. People in the prompt engineering community report better results doing this "code-comment prompting" style, especially for Python snippets [5]. I don't buy the mystical explanation that it "triggers different weights," but the practical effect is real: it nudges the model into documentation-compliance mode, which is exactly what BatCoder shows is useful training signal [1].
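Here's what that code-comment style looks like in practice. Everything below is the prompt (rolling_max is a made-up example function): you paste the whole block inside a python fence and let the model replace the `...` body.

```python
# Implement the function below.
# Constraints: stdlib only; O(n * window) time is fine; raise ValueError if window < 1.
def rolling_max(xs: list[int], window: int) -> list[int]:
    """Return the max of each length-`window` sliding window over xs.

    >>> rolling_max([1, 3, 2, 5], 2)
    [3, 3, 5]
    """
    ...  # left unimplemented on purpose; the model fills this in
```

The signature, constraints, and doctest carry the whole contract, so the model has nowhere to improvise.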
I also keep an eye on prompt bloat. Giabbanelli notes that adding more prompt techniques and more length can degrade outputs, and that LLMs can struggle even when the needed info is technically present in long context [2]. In codegen terms, the model might comply with your last constraint and quietly violate the first.
My fix is ruthless: if a constraint is important, it must be short, explicit, and testable. If it can't be tested, I rewrite it until it can.
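For example, "Allowed deps: stdlib only" is a testable constraint because you can check it mechanically against whatever the model returns. A sketch using Python's ast module (banned_imports is my own helper name, not a standard API):

```python
import ast

def banned_imports(source: str, banned: set[str]) -> set[str]:
    """Return which banned top-level modules `source` actually imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found & banned

# The constraint as a runnable check on model output:
assert banned_imports("import requests\nimport os", {"requests", "numpy"}) == {"requests"}
```

A constraint you can run in CI outlives any prompt.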
The uncomfortable truth: your "system prompt" is not a safe place for secrets
If you're building internal tooling (Cursor rules, Claude Code agents, repo-level "AI instructions"), don't assume your hidden instructions stay hidden. A 2026 paper, Just Ask, shows that multi-turn agentic interactions can extract system prompts at high success rates across many commercial models [3]. That's more a security finding than a prompting technique, but it changes how I write "house style" coding prompts: I treat them as public. No API keys. No proprietary logic described in plaintext. No "secret sauce" instructions.
Instead, I put durable constraints in code (linters, tests, CI) and use prompts to steer behavior, not to enforce policy.
Closing thought: prompt like you're writing a pull request description
If you want better AI code generation, stop aiming for "the perfect prompt." Aim for a prompt that reads like a crisp PR: what changed, why, constraints, risks, and how to validate.
Try this the next time you ask for code: write the unit tests first (even rough ones), paste them into the prompt, and tell the model "make these pass without changing the tests." You'll be shocked how quickly "neat demo code" turns into "reviewable code."
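A minimal version of that tests-first prompt, with slugify as a hypothetical target function:

```python
# Rough tests written first; they ARE the spec.
TESTS = '''\
def test_basic():
    assert slugify("Hello World") == "hello-world"

def test_punctuation():
    assert slugify("Hi, there!") == "hi-there"

def test_empty():
    assert slugify("") == ""
'''

PROMPT = (
    "Make these pytest tests pass without changing the tests.\n"
    "Return only the slugify() implementation, stdlib only.\n\n"
    "```python\n" + TESTS + "```"
)
```

Even rough tests like these beat a paragraph of description: they're unambiguous, and they give you an instant accept/reject gate for whatever comes back.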
References
Documentation & Research
1. BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-Translation - arXiv cs.LG (2026) https://arxiv.org/abs/2602.02554
2. A Guide to Large Language Models in Modeling and Simulation: From Core Techniques to Critical Challenges - arXiv cs.AI (2026) https://arxiv.org/abs/2602.05883
3. Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs - arXiv cs.AI (2026) https://arxiv.org/abs/2601.21233
Community Examples
4. What's your process for writing good AI prompts? - r/PromptEngineering (2026) https://www.reddit.com/r/PromptEngineering/comments/1r743fm/whats_your_process_for_writing_good_ai_prompts/
5. The "Code-Comment" Prompting Technique: The best way to get runnable Python snippets. - r/PromptEngineering (2026) https://www.reddit.com/r/PromptEngineering/comments/1qxe84i/the_codecomment_prompting_technique_the_best_way/
