Your LLM just wrote a JOIN on a column that doesn't exist. Again. Here's how to make sure it never does.
## Key Takeaways
- Always inject your full `CREATE TABLE` DDL into the prompt - schema context is the single biggest factor in SQL generation accuracy
- Specify the SQL dialect explicitly (PostgreSQL, BigQuery, SQLite) every time
- Ask the model to explain its query before you run it - this catches logic errors before they hit your data
- For multi-turn workflows, restate schema context at each new question to prevent **schema drift**
- Validate column and table names programmatically against your information schema before execution
## Why LLMs Fail at SQL (And When They Don't)
LLMs are surprisingly capable at generating syntactically valid SQL. The problem is semantic validity - writing a query that runs correctly against *your* specific database. Without knowing your schema, the model interpolates from whatever it saw in training. That means generic table names, invented column names, and plausible-sounding joins that reference nothing real.
Research on text-to-SQL systems confirms that schema linking - correctly mapping natural language terms to actual table and column names - is the hardest part of the problem [1]. When you skip schema injection, you're asking the model to solve that problem blind.
Multi-turn conversations make this worse. Recent work on multi-turn text-to-SQL shows that models lose track of which columns and tables are relevant as the conversation progresses, leading to compounding errors across follow-up questions [1]. If you're building a conversational data assistant, you need to explicitly re-anchor the schema at each turn.
## How to Inject Schema Context Effectively
The rule is simple: paste your `CREATE TABLE` statements into the prompt, every time. Not a description. Not a bullet list of column names. The actual DDL.
Here's why this works. The DDL encodes data types, constraints, foreign key relationships, and naming conventions - everything the model needs to reason about joins, aggregations, and filter conditions. Research on table question answering shows that context engineering that reduces ambiguity is one of the most reliable ways to cut hallucination rates [3].
For large schemas with dozens of tables, don't dump everything. Include only the tables relevant to the question, and add a short inline comment on any column whose business meaning isn't obvious from the name.
```sql
-- Schema context for the LLM
-- orders table: one row per customer purchase
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL, -- FK to customers.id
    created_at TIMESTAMPTZ NOT NULL,
    total_cents INTEGER NOT NULL, -- stored in cents, not dollars
    status TEXT CHECK (status IN ('pending', 'shipped', 'cancelled'))
);

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    email TEXT UNIQUE NOT NULL,
    country_code CHAR(2) NOT NULL -- ISO 3166-1 alpha-2
);
```
That comment about `total_cents` matters more than it looks. Without it, the model might write `SUM(total_cents)` and label it as a dollar figure in the output - technically correct SQL, semantically wrong analysis.
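A quick way to see the unit problem concretely, sketched with Python's built-in `sqlite3` (schema simplified, since SQLite has no `SERIAL` or `TIMESTAMPTZ`): convert units in the query itself so the label matches the math.

```python
import sqlite3

# In-memory database with a simplified orders table (illustrative data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 4599), (2, 1200), (3, 850)])

# Convert at query time so the column alias matches the unit
row = conn.execute(
    "SELECT SUM(total_cents) / 100.0 AS revenue_dollars FROM orders"
).fetchone()
print(row[0])  # 66.49
```

If the prompt's schema comment is missing, the model has no way to know whether `SUM(total_cents)` needs this division - the comment is what makes the conversion step inferable.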
## The Prompt Template That Works
Here's the structure I use for every SQL generation task. It's opinionated, but it works consistently across GPT-4, Claude, and Gemini.
```
You are a [DIALECT] SQL expert. Use only the tables and columns defined below.
Do not invent column names. If you cannot answer the question with the given schema, say so.

## Schema
[PASTE CREATE TABLE STATEMENTS HERE]

## Business Rules
- [Any constraints or conventions the model should know - e.g., "deleted rows have status='archived'"]
- [Currency is always stored in cents]

## Question
[Your natural language question]

## Output Format
1. The SQL query, formatted and readable
2. A plain-English explanation of what the query does, step by step
3. Any assumptions you made
```
The "Output Format" section is doing heavy lifting here. Asking for a step-by-step explanation forces the model to reason through its own query, which catches a surprising number of self-generated errors before you ever see the SQL.
## Dialect-Specific Templates
Each SQL engine has quirks that trip up generic prompts. Here are the key differences to communicate explicitly.
| Feature | PostgreSQL | BigQuery | SQLite |
|---|---|---|---|
| String quotes | Single quotes `'` | Single quotes `'` | Single quotes `'` |
| Identifier quotes | Double quotes `"` | Backticks `` ` `` | Double quotes or none |
| Date truncation | `DATE_TRUNC('month', ts)` | `DATE_TRUNC(ts, MONTH)` | `strftime('%Y-%m', ts)` |
| Array support | `ARRAY[]`, `unnest()` | `ARRAY<T>`, `UNNEST()` | Not native |
| Window functions | Full support | Full support | Supported since 3.25 |
| Regex | `~` operator | `REGEXP_CONTAINS()` | `GLOB` only (no native regex) |
Tell the model which dialect you're targeting at the top of every prompt. "Write a PostgreSQL query" and "Write a BigQuery query" will produce meaningfully different outputs for anything involving dates, arrays, or JSON.
### PostgreSQL Template
```
You are a PostgreSQL 16 expert. Use only the schema below. Use double-quoted identifiers only when necessary.

## Schema
[DDL here]

## Question
[Question here]
```
### BigQuery Template
```
You are a Google BigQuery SQL expert (standard SQL dialect).
Use backtick identifiers for all table and column references in the format `project.dataset.table`.

## Schema
[DDL or BQ table schema JSON here]

## Question
[Question here]
```
### SQLite Template
```
You are a SQLite 3.45 expert. Use only features available in SQLite 3.45.
SQLite has no native DATE type; use strftime() for all date operations.

## Schema
[DDL here]

## Question
[Question here]
```
## Handling Data Transformations
Data transformation prompts need one additional ingredient: sample data. Even two or three example rows help the model understand the actual shape of your data - especially for messy CSVs, mixed-type columns, or inconsistently formatted dates.
Recent research on deep tabular reasoning highlights that LLMs struggle specifically with non-canonical layouts and ambiguous headers [2]. The fix is the same: give explicit context before asking for the transformation.
```
Transform the following CSV data into a normalized SQL INSERT statement for the orders table.

## Target Schema
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    created_at DATE NOT NULL, -- format: YYYY-MM-DD
    total_cents INTEGER NOT NULL
);

## Sample Input Data (first 3 rows)
order_id,date,amount
1001,2026-01-15,$45.99
1002,01/16/2026,12.00
1003,Jan 17 2026,8.50

## Task
Write a SQL INSERT statement that handles the inconsistent date formats and converts
the amount column from dollar strings to integer cents.
```
That prompt will get you a working transformation with explicit CASE logic for the date formats. Without the sample rows, you'd get a generic template that ignores the actual problem.
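What that CASE logic amounts to can be sketched in Python: try each known date format in turn, and strip the currency formatting before converting to cents. The format list below is an assumption drawn from the three sample rows, not an exhaustive parser:

```python
from datetime import datetime

# Candidate formats observed in the sample rows (extend as new ones appear)
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%b %d %Y"]

def normalize_date(raw: str) -> str:
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")

def to_cents(raw: str) -> int:
    # "$45.99" -> 4599; round() guards against float artifacts
    return round(float(raw.strip().lstrip("$")) * 100)

rows = [("1001", "2026-01-15", "$45.99"),
        ("1002", "01/16/2026", "12.00"),
        ("1003", "Jan 17 2026", "8.50")]
normalized = [(int(i), normalize_date(d), to_cents(a)) for i, d, a in rows]
print(normalized)
# [(1001, '2026-01-15', 4599), (1002, '2026-01-16', 1200), (1003, '2026-01-17', 850)]
```

Whether the model emits this as SQL CASE expressions or you run it as a preprocessing step, the point is the same: the sample rows are what tell it these three formats exist at all.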
## Validating Output Before It Hits Production
Never run LLM-generated SQL directly against production. This isn't about distrust - it's about the gap between "syntactically valid" and "semantically correct."
Three-step validation that takes under a minute:
First, check the explanation the model provided. Does the logic actually match your question? A model that generates correct SQL for the wrong interpretation is still a problem.
Second, verify column and table names programmatically. In PostgreSQL, query `information_schema.columns`. In BigQuery, use `INFORMATION_SCHEMA.COLUMNS`. In SQLite, use `PRAGMA table_info(tablename)`. A quick script that diffs the SQL's referenced identifiers against your actual schema will catch hallucinated columns before execution.
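A minimal version of that diff, sketched against SQLite with a naive regex for identifier extraction - a real SQL parser would be more robust, and the keyword list here is deliberately incomplete:

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total_cents INTEGER)")

def known_identifiers(conn) -> set:
    """Collect every table and column name from the live schema."""
    names = set()
    for (table,) in conn.execute("SELECT name FROM sqlite_master WHERE type='table'"):
        names.add(table)
        names.update(col[1] for col in conn.execute(f"PRAGMA table_info({table})"))
    return names

def unknown_identifiers(sql: str, conn) -> set:
    # Naive tokenization: every bare word that is not an SQL keyword.
    # Good enough for a smoke test, not a full parser.
    keywords = {"select", "sum", "from", "where", "group", "by",
                "as", "and", "or", "on", "join", "limit"}
    words = set(re.findall(r"[a-zA-Z_]\w*", sql))
    return {w for w in words if w.lower() not in keywords} - known_identifiers(conn)

sql = "SELECT SUM(total_amount) FROM orders"  # total_amount is hallucinated
print(unknown_identifiers(sql, conn))  # {'total_amount'}
```

The same shape works for PostgreSQL or BigQuery by swapping the schema query; the identifier diff itself doesn't change.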
Third, run with `LIMIT 10` on a read-only replica or development database first. Inspect the result. Do the row count and data look reasonable? If you're aggregating revenue and getting a negative number, something is wrong upstream.
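The dry run can be mechanized by wrapping the generated statement in a subquery, which caps the rows inspected without editing the model's SQL. This works for plain SELECTs in SQLite and PostgreSQL; the `dry_run` wrapper name is my own:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, total_cents INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(i, i * 100) for i in range(50)])

def dry_run(conn, generated_sql: str, n: int = 10):
    # Wrap the generated SELECT in an aliased subquery and cap the rows returned
    return conn.execute(f"SELECT * FROM ({generated_sql}) AS sub LIMIT {n}").fetchall()

rows = dry_run(conn, "SELECT order_id, total_cents FROM orders")
print(len(rows))  # 10
```

Statements that aren't plain SELECTs (CTEs with data-modifying clauses, INSERTs) need different handling, which is another reason to keep generated SQL read-only by default.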
Tools like Rephrase can help you tighten the prompt itself before you submit it - catching vague phrasing that tends to produce ambiguous SQL before the model ever sees it.
## Multi-Turn Conversations and Schema Drift
If you're building a data assistant where users ask follow-up questions, schema drift is your biggest enemy. The Track-SQL research [1] found that generative models lose track of relevant schema elements across conversation turns, which causes them to reference stale or incorrect columns in follow-up queries.
The practical fix: treat each turn as a fresh prompt with the full schema injected again. Yes, this uses more tokens. No, there's no reliable shortcut. For very large schemas, at minimum re-inject the tables that were referenced in the previous query.
```
[Previous query and result summary]
The last query used: orders, customers

## Current Schema (relevant tables)
[Re-paste orders and customers DDL]

## Follow-up Question
[Next question]
```
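That re-anchoring is easy to automate if you keep a DDL registry keyed by table name. A sketch - the `ddl_by_table` registry and function name are illustrative, not from any framework:

```python
# DDL registry keyed by table name; in practice, built once from your real schema
ddl_by_table = {
    "orders": "CREATE TABLE orders (order_id SERIAL PRIMARY KEY, customer_id INTEGER NOT NULL);",
    "customers": "CREATE TABLE customers (id SERIAL PRIMARY KEY, email TEXT UNIQUE NOT NULL);",
    "products": "CREATE TABLE products (id SERIAL PRIMARY KEY, name TEXT NOT NULL);",
}

def follow_up_prompt(prev_summary: str, prev_tables: list, question: str) -> str:
    # Re-inject the DDL for every table the previous query touched
    schema = "\n".join(ddl_by_table[t] for t in prev_tables)
    return (
        f"{prev_summary}\n"
        f"The last query used: {', '.join(prev_tables)}\n\n"
        f"## Current Schema (relevant tables)\n{schema}\n\n"
        f"## Follow-up Question\n{question}\n"
    )

prompt = follow_up_prompt(
    "Previous query: monthly order counts per customer.",
    ["orders", "customers"],
    "Now break that down by country.",
)
print(prompt)
```

Only the tables the conversation actually touched get re-injected, which keeps token cost bounded while still re-anchoring the schema every turn.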
This pattern, combined with explicit schema injection and dialect specification, covers the majority of real-world SQL generation failures. The model isn't bad at SQL - it just needs the same context a junior analyst would need on their first day.
More prompt engineering techniques for data workflows are on the Rephrase blog.
## References

### Documentation & Research
1. Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL - arXiv (link)
2. Deep Tabular Research via Continual Experience-Driven Execution - arXiv (link)
3. DataFactory: Collaborative Multi-Agent Framework for Advanced Table Question Answering - arXiv (link)

### Community Examples
- Postgres vs MySQL vs SQLite: Comparing SQL Performance Across Engines - KDnuggets (link)