Learn how to write database design prompts for schema generation, migrations, and SQL optimization. See practical patterns and examples inside.
Database prompts fail for the same reason product specs fail: they sound clear in your head and vague on the page. With schema design, migrations, and SQL tuning, that vagueness turns into broken joins, risky alters, and expensive queries.
Schema generation works best when you describe user intentions, required entities, constraints, and output format with almost painful specificity. Research on synthetic environment generation shows that prompts become much more reliable when they require exact table names, keys, indexes, SQLite types, and JSON-only output, instead of leaving structure implied [1].
Here's the mistake I see most: people ask, "Design a database for a marketplace app." That's not a prompt. That's a wish.
A stronger approach is to define the jobs the system must support, then constrain the shape of the answer. The AWM paper literally uses task-driven schema generation: start with user intentions, require only necessary tables, include primary keys, foreign keys, indexes, and exclude irrelevant auth fields [1]. That pattern is practical outside research too.
Before:
Design a database for a course platform.
After:
Design a PostgreSQL schema for a course platform with these workflows:
1. Students enroll in courses and track progress by lesson.
2. Instructors publish courses, lessons, and quizzes.
3. Admins issue refunds and suspend accounts.
Requirements:
- Return SQL DDL only
- Include primary keys, foreign keys, unique constraints, and indexes
- Add created_at and updated_at where needed
- Do not invent features outside these workflows
- Keep auth fields out of scope
- Explain any many-to-many tables in comments
What works well here is the constraint density. You're shrinking the model's room to improvise. If you want this step automated across apps, tools like Rephrase can tighten vague drafts into more structured prompts fast.
Migration prompts should ask the model to behave like a cautious database engineer, not a code autocomplete. That means forcing it to reason about dependency order, backfills, locks, rollback paths, and compatibility windows, because schema evolution is where "almost right" becomes downtime [3][1].
This is where prompt engineering often gets too optimistic. A model can write ALTER TABLE statements easily. The hard part is sequencing.
Research on schema evolution makes the point that database changes are not isolated DDL events; they're part of an ongoing evolution process with dependencies and operational tradeoffs [3]. AWM's environment pipeline mirrors that idea by separating schema design, sample data generation, interface generation, and verification [1]. That separation is a useful prompt pattern for migrations too.
Generate a PostgreSQL migration plan for this change:
- Split users.full_name into first_name and last_name
- Keep the application running during rollout
Current constraints:
- users table has 12M rows
- reads are constant, writes are moderate
- old code still reads full_name for one deploy cycle
Output:
1. forward migration SQL
2. backfill SQL in safe batches
3. rollback SQL
4. application rollout order
5. risks and lock-heavy operations to avoid
Assume we want minimal blocking and idempotent scripts where possible.
Here's what I noticed: when you ask for only SQL, the model often skips the operational choreography. When you ask for rollout order and risks, the answer usually improves a lot.
If you do this often, save a reusable migration prompt or rewrite it with Rephrase before sending it to your model. Small wording improvements matter more here than people expect.
A useful SQL optimization prompt includes the query, schema, indexes, execution plan, and optimization goal. Research on Text-to-SQL and query optimization shows that static schema alone is often not enough, because many errors come from ambiguity in real data distributions and from optimizer assumptions that miss semantic patterns [2][4].
This is the biggest trap in SQL prompting. People paste a slow query and ask, "Can you optimize this?" The model responds with boilerplate: add indexes, avoid SELECT *, use joins carefully. None of that is wrong. None of it is enough.
APEX-SQL is especially relevant here. Its core argument is that static schema prompting fails when meaning lives in the data, not just the metadata. Their hypothesis-verification loop improves results by validating assumptions against real distributions [2]. On the optimization side, DBPlanBench shows that LLMs can find better plans when they see structural plan information and reason about semantic selectivity, like reordering joins to cut cardinality early [4].
| Prompt style | What's included | Likely result |
|---|---|---|
| Weak | Query only | Generic tips |
| Better | Query + schema + indexes | Plausible rewrite |
| Best | Query + schema + indexes + EXPLAIN + row counts + goal | Actionable optimization ideas |
Best-version prompt:
Optimize this PostgreSQL query for latency.
Goal: reduce execution time without changing results.
Schema:
- orders(id, customer_id, status, created_at)
- order_items(order_id, product_id, quantity, unit_price)
- products(id, category_id, active)
Indexes:
- orders(status, created_at)
- order_items(order_id)
- products(id, active)
Query:
SELECT ...
[full query here]
EXPLAIN ANALYZE:
[paste plan]
Known facts:
- 92% of orders are status='completed'
- active=false is rare
- Most traffic is for last 30 days
Please:
1. explain the bottleneck
2. suggest a rewrite if useful
3. suggest index changes only if justified
4. note tradeoffs
That last section matters because it gives the model a way to reason semantically, which is exactly where recent optimization work found gains [4].
Database prompts need verification loops because syntactic correctness is not the same as semantic correctness. Recent Text-to-SQL research shows that models often generate valid-looking SQL that fails on the actual database meaning, especially when schemas are ambiguous or queries require data-aware reasoning [2][5].
I think this is the main shift people miss. Better prompting is not just "write nicer instructions." It's "design a loop."
IESR breaks SQL generation into smaller reasoning steps like schema selection, column identification, equation explanation, and SQL revision [5]. APEX-SQL goes further by letting the system test hypotheses against the database itself [2]. Those papers point in the same direction: don't trust first-pass output on hard database tasks.
Use a simple four-step loop in your own prompting process.
That sounds slower, but it's usually faster than debugging a confident wrong answer.
If you want more workflows like this, the Rephrase blog has more prompt breakdowns for structured AI tasks.
Database prompting works best when you treat the model as a junior-but-fast database engineer: great at drafting, bad at mind reading, and unreliable when context is thin. The practical win comes from narrowing scope, forcing structure, and adding validation, not from asking the model to "be smarter" [1][2][4].
My rule is simple. For schema generation, optimize for completeness. For migrations, optimize for safety. For SQL optimization, optimize for evidence.
Once you do that, prompts stop feeling magical and start feeling like good engineering specs. That's the point.
Documentation & Research
Community Examples 6. Postgres vs MySQL vs SQLite: Comparing SQL Performance Across Engines - KDnuggets (link)
Start with user workflows, entities, constraints, and non-goals. Ask for explicit tables, primary keys, foreign keys, indexes, and output in strict JSON or SQL so the result is easy to validate.
A good prompt includes the SQL query, schema, indexes, execution plan, row-count assumptions, and the optimization goal. Without that context, the model tends to give generic advice.