Most agent failures aren't model failures. They're prompt failures - specifically, the absence of any logic that tells the agent when to stop.
Key Takeaways
- Agent prompts require explicit termination and escalation conditions - "complete the task" is not enough
- Constraint framing defines the boundary of what an agent is allowed to do, not just what it should do
- Confidence thresholds in prompt language force the agent to self-assess before acting
- The ReAct pattern (Reason → Act → Observe) gives agents a structured loop that surfaces errors before they cascade
- Test decision logic against failure scenarios, not just the happy path
Why Agent Prompts Are a Different Engineering Problem
A chatbot prompt is a set of instructions for a single answer. An agent prompt is a set of instructions for a process - one that involves sequential decisions, branching conditions, tool calls, and error handling. The failure modes are completely different. A bad chatbot answer is annoying. A bad agent decision can trigger a cascade of downstream actions that are expensive, irreversible, or both.
Research on multi-agent systems confirms this asymmetry. When agents operate with autonomy in structured deliberation tasks, the reliability of their decision logic depends heavily on how well the prompt constrains their reasoning process and defines acceptable outputs [1]. Vague objectives produce inconsistent behavior. Explicit criteria produce consistent behavior.
The practical implication: you can't write an agent prompt the way you write a chatbot prompt. You have to engineer it.
Constraint Framing: Define the Boundary, Not Just the Goal
Constraint framing means specifying what the agent is not allowed to do with the same precision you use to specify what it should do. Most prompts define goals. Few define limits.
A useful mental model is the action space. Before writing a single instruction, ask: what are all the things this agent could do, and which of those are off-limits? Then write those prohibitions explicitly. "Do not modify files outside /output/" is a constraint. "Be careful with files" is not. [4]
Constraints should cover three areas. First, scope limits - what data, systems, or resources the agent can touch. Second, action limits - which operations it can perform (read vs. write, query vs. execute). Third, decision limits - circumstances under which it cannot proceed unilaterally and must escalate.
Here's a before/after example:
# Before (vague)
You are a data processing agent. Process the incoming records and update the database.
# After (constrained)
You are a data processing agent. Your scope is limited to records in the /incoming/ directory.
You may read, validate, and write to /processed/ only.
You must NOT delete records, modify schemas, or access any directory outside /incoming/ and /processed/.
If a record fails validation, log the error to /errors/ and skip it. Do not attempt to correct or infer missing values.
The second version removes ambiguity about what "process" means. The agent can't rationalize its way into a decision you didn't authorize.
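Scope constraints like these can also be enforced in the harness around the agent, not just in the prompt. Here is a minimal sketch, assuming POSIX-style paths that mirror the example's `/incoming/` and `/processed/` directories (the function name and scope constants are hypothetical):

```python
from pathlib import Path

# Hypothetical write scope mirroring the prompt's constraint:
# the agent may write only under /processed/.
WRITE_SCOPE = Path("/processed")

def allowed_write(path: str) -> bool:
    """True only if the resolved target falls inside the write scope.

    Resolving first means '..' tricks like /processed/../etc/passwd
    are rejected, not just paths that fail a string-prefix check.
    """
    return Path(path).resolve().is_relative_to(WRITE_SCOPE)
```

A guard like this turns the prompt's prohibition into a hard boundary: even if the agent rationalizes a write outside its scope, the tool layer refuses it. (`Path.is_relative_to` requires Python 3.9+.)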
Explicit Stop Conditions: Tell the Agent When to Quit
This is the most commonly missing piece in agent prompts. An agent without stop conditions will keep going. Language models are trained to complete - that's the default behavior. You have to override it.
Stop conditions are explicit rules that trigger a halt. They fall into three categories: task completion criteria, error thresholds, and escalation triggers.
Task completion criteria answer the question "how does the agent know it's done?" This should be measurable, not interpretive. "All records in /incoming/ have been processed or logged as errors" is a stop condition. "The task is complete" is not.
Error thresholds define how many retries or failures are acceptable before the agent gives up. Without them, an agent hitting a broken API will retry indefinitely, burning tokens and time.
Escalation triggers are conditions where the agent stops and hands off to a human. Research on human-AI conflict in autonomous settings shows that agents frequently adopt avoidant or deceptive strategies when they encounter ambiguous low-risk situations rather than surfacing the conflict to a human [2]. A well-structured escalation condition short-circuits that behavior.
# Termination & Escalation Conditions
STOP and mark task complete when:
- All files in /incoming/ are moved to /processed/ or /errors/
STOP and escalate to a human when:
- More than 3 consecutive API errors occur
- A record contains a field flagged as PII outside the expected schema
- You cannot determine which of two conflicting rules applies
Do NOT attempt to resolve ambiguity by inferring intent. Surface it.
That last line matters. It explicitly prohibits the hallucination-as-problem-solving pattern.
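The error-threshold condition is also worth enforcing outside the prompt, in the loop that drives the agent. A minimal sketch, assuming hypothetical `call_api` and `escalate` callables supplied by the harness:

```python
# Enforce "more than 3 consecutive API errors -> stop and escalate"
# in the driver loop, independent of whether the model obeys the prompt.
MAX_CONSECUTIVE_ERRORS = 3

def process_all(records, call_api, escalate):
    """Process records, halting and escalating past the error threshold."""
    consecutive_errors = 0
    done = []
    for record in records:
        try:
            done.append(call_api(record))
            consecutive_errors = 0  # a success resets the streak
        except Exception as exc:
            consecutive_errors += 1
            if consecutive_errors > MAX_CONSECUTIVE_ERRORS:
                escalate(f"{consecutive_errors} consecutive API errors: {exc}")
                break  # STOP: hand off to a human
    return done
```

Belt-and-suspenders: the prompt tells the agent to stop, and the harness guarantees it.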
Confidence Thresholds: Make Uncertainty a First-Class State
A confidence threshold is prompt language that instructs the agent to assess its own certainty before acting. The agent isn't just asked to do a thing - it's asked to evaluate whether it has enough information to do that thing reliably.
This is more practical than it sounds. You don't need numerical scores. You need the agent to distinguish between three states: sufficient confidence to act, insufficient confidence requiring clarification, and insufficient confidence requiring escalation.
Before taking any action, assess your confidence using the following criteria:
HIGH: You have all required inputs, the action is within your defined scope, and the expected outcome is predictable.
→ Proceed.
MEDIUM: You have most inputs but one or more values are missing or ambiguous.
→ State what is missing and request clarification before proceeding.
LOW: Required information is unavailable, the action falls outside your defined scope, or the outcome is unpredictable.
→ Halt. Log your reasoning. Escalate to a human.
Do not proceed at MEDIUM or LOW confidence without explicit instruction.
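The three states map cleanly onto a dispatch routine in the harness. A sketch, assuming the agent's self-assessment has already been parsed into an enum value, with hypothetical `act`, `clarify`, and `escalate` handlers:

```python
from enum import Enum

class Confidence(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

def dispatch(confidence: Confidence, act, clarify, escalate):
    """Route on the three confidence states the prompt defines."""
    if confidence is Confidence.HIGH:
        return act()
    if confidence is Confidence.MEDIUM:
        return clarify()   # request missing inputs; do not act
    return escalate()      # LOW: halt and hand off to a human
```

The design choice here is that MEDIUM and LOW never reach `act()` at all, which implements the prompt's "do not proceed without explicit instruction" rule structurally rather than trusting the model to honor it.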
This kind of structure is especially important in multi-agent systems where one agent's output becomes another agent's input. Research on collective value alignment in LLM-based agents shows that when agents are trained with negotiation-based deliberation - essentially forced to reason through conflict before acting - their conflict-resolution performance improves substantially without degrading general capability [3]. The prompt-level equivalent is requiring the agent to externalize its reasoning state before every consequential action.
Reasoning Protocols: The ReAct Loop as a Decision Structure
The ReAct pattern (Reason → Act → Observe) gives agents a structured internal loop that makes each decision step auditable. Instead of jumping from input to action, the agent must articulate what it knows, what it's about to do, what it expects to happen, and what it will do if that expectation fails [5].
Before every action, output the following:
REASON: What do I know? What is the current state?
NEXT ACTION: What am I about to do and why?
EXPECTED RESULT: What should happen if this succeeds?
FALLBACK: What will I do if this fails?
Then execute the action. Then observe the result against your expected result.
If the observed result diverges significantly, reassess before continuing.
This structure does two things. It reduces hallucination by forcing the agent to check its assumptions before committing to an action. And it makes failures interpretable - you can trace exactly where the reasoning broke down.
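The loop itself can be sketched in harness code. This is one possible shape, not a reference implementation: `agent_step` and `run_tool` are hypothetical stand-ins for the model call and tool execution, and each step is assumed to carry the REASON/ACTION/EXPECTED fields from the prompt above:

```python
# Minimal ReAct-style driver: Reason -> Act -> Observe, with the
# divergence check fed back into the next reasoning pass.
def react_loop(agent_step, run_tool, max_steps=10):
    history = []
    for _ in range(max_steps):
        step = agent_step(history)  # dict with "action" and "expected"
        if step["action"] == "STOP":
            break
        observed = run_tool(step["action"])
        step["observed"] = observed
        history.append(step)
        if observed != step["expected"]:
            # Record the mismatch so the next REASON phase reassesses
            # instead of blindly continuing.
            history.append({"note": "divergence",
                            "expected": step["expected"],
                            "observed": observed})
    return history
```

Because every step lands in `history` with its expectation and observation side by side, a failed run leaves behind exactly the audit trail the prose above promises.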
Testing Decision Logic Before It Runs Unsupervised
You cannot validate an agent prompt by testing only the scenario where everything works. The decision logic that matters is what happens when something breaks.
Specifically: inject missing inputs and see if the agent escalates or improvises. Simulate tool failures and count retries. Present two conflicting rules and see if the agent surfaces the conflict or silently picks one. Run the agent for enough steps that context drift becomes a factor - does it stay consistent with its initial constraints?
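These failure checks can be written as a single acceptance assertion. A sketch, assuming a hypothetical harness that runs the agent against an injected fault and reports whether it escalated and how many retries it made:

```python
from dataclasses import dataclass

# Hypothetical result type returned by an agent test harness.
@dataclass
class RunResult:
    escalated: bool
    retries: int

def assert_fails_safely(result: RunResult, max_retries: int = 3):
    """Acceptance check for failure-path runs: the agent must escalate
    rather than improvise, and must stay under the retry threshold."""
    assert result.escalated, "agent improvised instead of escalating"
    assert result.retries <= max_retries, "agent retried past the threshold"
```

Run it against every injected-fault scenario (missing input, broken tool, conflicting rules); a single failing assertion means the agent is not ready to run unsupervised.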
A useful principle from the practitioner community: correct output with incoherent reasoning is a fragile success [4]. If the reasoning trace doesn't hold together, the agent will fail under a slightly different input. You need both the output and the reasoning to be sound before you deploy.
Document your failure test cases as formal acceptance criteria. If the agent cannot pass the failure tests, it's not ready for production - regardless of how well it handles the happy path.
Bringing It Together
The pattern across all of this is the same: agent prompts need to make the implicit explicit. Goals need measurable completion criteria. Constraints need to be stated as prohibitions, not suggestions. Uncertainty needs to be a named state with a defined response. And every decision point needs a fallback.
If you're building agents at scale and want to speed up the iteration cycle on prompt structure, tools like Rephrase can help you rewrite and tighten agent prompt drafts before you wire them into a loop - especially useful when you're iterating fast across different agent roles.
The agents that fail in production aren't usually the ones given hard tasks. They're the ones given vague instructions and no exit ramp. Give your agent a way out, and it'll take it gracefully. Don't, and it'll invent one.
For more on prompt engineering techniques for complex workflows, visit the Rephrase blog.
References
Documentation & Research
[1] An Interactive Multi-Agent System for Evaluation of New Product Concepts - arXiv (arxiv.org/abs/2603.05980)
[2] ConflictBench: Evaluating Human-AI Conflict via Interactive and Visually Grounded Environments - arXiv (arxiv.org/abs/2603.08024)
[3] Learning to Negotiate: Multi-Agent Deliberation for Collective Value Alignment in LLMs - arXiv (arxiv.org/abs/2603.10476)
Community Examples
[4] Stop writing Agent prompts like Chatbot prompts - r/PromptEngineering (reddit.com)
[5] Most people treat AI like a search engine. I started using "ReAct" loops - r/PromptEngineering (reddit.com)