Blog / Prompt engineering / 40 Prompt Engineering Terms Defined

40 Prompt Engineering Terms Defined

Master prompt engineering vocabulary fast. From temperature to jailbreak, we define 40 real terms with plain-English explanations and usage examples. Read the full guide.

Ilia Ilinskii
Rephrase · March 24, 2026

Prompt engineering9 min read

On this page

Core Prompting Concepts Sampling Parameters Prompting Techniques Context and Memory Safety and Robustness Output Quality Advanced and Production Concepts Putting It Together References

Most AI jargon gets passed around without anyone agreeing on what it actually means. That's a problem when you're building with LLMs professionally.

This glossary covers 40 terms that show up constantly in real prompting work - in code reviews, model docs, research papers, and Slack threads. Definitions are plain English. Where a term has genuine depth, I've linked to further reading.

Key Takeaways

Prompt engineering has a rich and fast-growing vocabulary - The Prompt Report alone identified 33 distinct vocabulary terms worth standardizing [1]
Many terms (like "grounding" and "context window") are used loosely in the wild and mean different things in different communities
Understanding the vocabulary precisely makes you faster at debugging bad outputs and communicating with your team
A few terms (temperature, top-p, system prompt) come directly from model APIs and have exact technical definitions
Others (jailbreak, alignment, hallucination) are more informal but equally important to know

Core Prompting Concepts

Prompt - The input text you send to a language model. Sounds obvious, but in production systems, a "prompt" often refers to the complete assembled input: system instructions, few-shot examples, retrieved context, and the user's actual message - not just what the user typed.

System prompt - A developer-controlled instruction block that runs before the conversation starts. It sets persona, tone, output format, and behavioral constraints. Anthropic's documentation treats it as the primary lever for shaping Claude's behavior at the application layer [2].

User prompt - The message the human sends during an active session. Distinct from the system prompt, though models see both.

Instruction following - A model's ability to do what you explicitly asked. A model with poor instruction following might ignore format requirements, skip steps you listed, or answer a different question than the one you posed.

Context window - The total number of tokens a model can process at once, including system prompt, conversation history, retrieved documents, and the response it's generating. Exceeding it causes the model to "forget" earlier content.

Token - The unit models use to process text. Roughly 0.75 words in English, but tokenization varies by language and model. Token counts drive both cost and context window limits.

Sampling Parameters

Temperature - Controls output randomness. At 0, the model always picks the most probable next token (deterministic). At 1+, it samples more broadly, producing creative or varied outputs. Set low for factual tasks, higher for brainstorming.

Top-p (nucleus sampling) - An alternative randomness control. The model only samples from the smallest set of tokens whose cumulative probability reaches p. A top-p of 0.9 ignores the bottom 10% of the probability distribution. Most practitioners use either temperature or top-p, not both simultaneously.

Top-k - Limits sampling to the k most probable next tokens. Simpler than top-p. Less common in modern API configurations but still appears in some model settings.

Max tokens - A hard cap on response length. The model stops generating once it hits this limit, which can cut off output mid-sentence if set too low.

Stop sequence - A string that tells the model to stop generating when it appears. Useful for structured outputs - e.g., stopping generation when the model writes </answer>.

Prompting Techniques

Zero-shot prompting - Giving the model a task with no examples. Works well for common tasks that models were heavily trained on. Breaks down for unusual formats or domain-specific reasoning.

Few-shot prompting - Providing 2-5 input/output examples inside the prompt before the actual task. One of the most reliable techniques for shaping output format and style. The Prompt Report identifies this as a foundational technique across virtually all model families [1].

# Few-shot example
Classify sentiment:
Text: "The delivery was late." → Negative
Text: "Great packaging!" → Positive
Text: "Works as described." → [model continues]

Chain-of-thought (CoT) - Prompting the model to reason step-by-step before giving a final answer. Either explicit ("think step by step") or demonstrated through few-shot examples that show reasoning. Measurably improves performance on multi-step math and logic tasks.

Zero-shot CoT - The minimal version: appending "Let's think step by step" to a prompt without providing reasoning examples. Surprisingly effective.

Self-consistency - Running the same prompt multiple times, generating several reasoning chains, then selecting the most common final answer. Trades compute for accuracy on reasoning tasks.

Role prompting - Asking the model to adopt a persona ("You are a senior security engineer reviewing this code"). Shifts tone, vocabulary, and sometimes the scope of what the model considers relevant.

Prompt chaining - Breaking a complex task into a sequence of prompts where each output feeds the next. More reliable than trying to cram everything into one massive prompt.

Meta-prompting - Using a model to generate or improve prompts. You describe the task you want a prompt for, and the model drafts it.

Context and Memory

Grounding - Anchoring model outputs to specific source material rather than relying on training knowledge alone. The model is told: "answer only using the following documents." Reduces hallucination on factual queries.

RAG (Retrieval-Augmented Generation) - An architecture where relevant documents are retrieved from a database and injected into the prompt as context before generation. Grounding is the technique; RAG is the system that implements it.

Few-shot context - The examples you inject into the prompt, as distinct from retrieved factual documents. The terminology overlaps with few-shot prompting but emphasizes that examples are a form of context management.

Context stuffing - Filling the context window with as much relevant information as possible. Can help but degrades performance when irrelevant content crowds out the useful parts - sometimes called "lost in the middle" degradation.

Conversation history - Prior turns in a multi-turn session. Models don't have persistent memory by default; conversation history is explicitly passed back in each API call.

Safety and Robustness

Jailbreak - A prompt crafted to bypass safety guidelines. Common patterns include roleplay framing, hypothetical scenarios, or token manipulation. Research into prompt robustness specifically studies models' vulnerability to these inputs [3].

Prompt injection - An attack where malicious instructions are hidden inside content the model is asked to process (e.g., a web page, a document, an email). The model treats the injected text as a legitimate instruction.

Alignment - The degree to which a model's behavior matches intended values and constraints. An "aligned" model follows guidelines reliably; a misaligned one produces outputs its developers didn't intend.

Guardrails - Programmatic or prompt-based constraints that filter or block certain model outputs. Can be built into the system prompt ("never provide medical diagnoses") or enforced by a separate classification layer.

Prompt robustness - How consistently a model produces correct outputs when prompts vary slightly - typos, paraphrasing, reordering. Recent research shows many models are fragile to surface-level prompt changes [3].

Output Quality

Hallucination - A model generating confident, fluent text that is factually wrong. Grounding and RAG are the primary mitigation strategies. Not the same as "making a mistake" - hallucinations often sound authoritative.

Calibration - Whether a model's expressed confidence matches its actual accuracy. A well-calibrated model says "I'm not sure" when it genuinely isn't. Poor calibration correlates with hallucination.

Instruction drift - When a model gradually stops following earlier instructions over a long conversation. A real problem in agentic tasks where system prompt constraints need to hold across many turns.

Output format control - Using structured instructions (JSON schema, markdown headers, numbered lists) to constrain how the model formats its response. Often more reliable than asking in natural language alone.

Advanced and Production Concepts

Prompt asset - A versioned, tracked prompt used in a production system. The term comes from frameworks like Prompt Readiness Levels (PRL), which propose treating prompts with the same rigor as software artifacts - including testing, security evaluation, and deployment sign-off [4].

Prompt versioning - Maintaining a history of prompt changes so you can roll back, compare performance, and understand what changed when output quality shifts.

Prompt injection defense - Techniques to prevent injected instructions from overriding system prompts. Strategies include input sanitization, privilege separation, and instructing the model to distrust user-supplied content.

Latent space - The high-dimensional internal representation space that models use to encode meaning. Not something you directly prompt, but understanding it helps explain why semantically similar prompts produce similar outputs.

Embedding - A numerical vector representation of text. Used in RAG pipelines to find semantically similar documents. The text "what's the weather?" and "give me a forecast" end up close together in embedding space.

Fine-tuning - Updating a model's weights on a specific dataset to shift its default behavior. Different from prompting - it changes the model itself rather than steering it at inference time.

RLHF (Reinforcement Learning from Human Feedback) - A training technique where human preference rankings are used to shape model behavior. The underlying mechanism behind most modern chat models' instruction-following and safety behaviors.

System prompt leakage - When a model reveals the contents of its system prompt in response to user questions. A confidentiality concern in commercial deployments.

Agentic prompting - Prompting a model that can take actions (call APIs, run code, browse the web) rather than just generate text. Requires stricter instruction design because errors compound across steps.

Prompt compression - Techniques to shorten prompts without losing the information the model needs - important for managing token costs in long-context applications.

Putting It Together

Vocabulary matters more than it sounds. When your team uses "grounding" to mean three different things, debugging a production issue becomes a conversation about definitions instead of causes. When you know the difference between temperature and top-p, you stop randomly tuning parameters and start making deliberate tradeoffs.

The field is moving fast - the PRL framework [4] is pushing for production-grade prompt governance that treats these concepts with engineering rigor, not just intuition. That direction is right.

If you find yourself constantly rewriting prompts to hit the right tone or format, tools like Rephrase can automate the iteration loop - it rewrites your draft prompt using the right technique for the task, so you spend time on the hard design decisions rather than manual rewording. For more deep dives on specific techniques, browse the Rephrase blog.

References

Documentation & Research

The Prompt Report - arXiv (https://arxiv.org/abs/2406.06608)
Prompt Engineering Overview - Anthropic Documentation (https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview)
Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance via CoIPO - arXiv (https://arxiv.org/abs/2603.03314)
Prompt Readiness Levels (PRL) - arXiv (https://arxiv.org/abs/2603.15044)

Community Examples

AI Terminology List: Context/Prompt Engineering - GitHub (https://github.com/piotr-liszka/ai-terminology)
Useful Links for Getting Started with Prompt Engineering - r/PromptEngineering (https://www.reddit.com/r/PromptEngineering/comments/120fyp1/useful_links_for_getting_started_with_prompt/)

Frequently asked

What is prompt engineering?

Prompt engineering is the practice of designing and refining text inputs to get reliable, high-quality outputs from large language models. It covers everything from simple instruction phrasing to complex multi-step reasoning techniques.

What does 'temperature' mean in AI prompting?

Temperature is a sampling parameter that controls output randomness. A low value (near 0) makes responses deterministic and focused; a high value (near 1 or 2) makes them more varied and creative.

What is grounding in prompt engineering?

Grounding means anchoring a model's output to specific, verifiable source material - usually documents or retrieved data - rather than letting it rely purely on its training knowledge. It's the core idea behind retrieval-augmented generation (RAG).