Codex CLI Approval Modes and Risk

Learn how Codex CLI approval modes map to real risk tolerance, from cautious prompts to full automation.

Ilia Ilinskii
Rephrase · May 31, 2026

Prompt engineering · 7 min read

If you treat Codex CLI approval modes like a convenience setting, you'll eventually get burned. The better mental model is risk tolerance: the mode should match how much damage a bad action could do, how easy it is to undo, and how sure you are about the task.

Key Takeaways

Suggest is for high-uncertainty work where you want the model to draft actions, not execute them.
Auto fits bounded tasks with low blast radius and good rollback.
Full-Auto only makes sense when the action is routine, reversible, and tightly scoped.
Runtime safety research keeps pointing at the same thing: side effects, multi-step chains, and obfuscation are where risk rises fast [3].
The right approval mode is less about trust in the model and more about trust in the task.

What do Codex CLI approval modes actually mean?

Codex CLI approval modes are a control layer for agentic execution. In practice, they define when the model should stop and ask, when it can proceed with light autonomy, and when it can run end-to-end without human confirmation. That's not just UX. It's a safety policy, and the best way to think about it is through reversibility and impact [1][3].
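
Here's a minimal sketch of that control layer as I think about it. The `ApprovalMode` enum, the `needs_human_approval` function, and the `action_is_risky` flag are hypothetical names for illustration; this models the idea, not Codex CLI's actual implementation or flags.

```python
from enum import Enum


class ApprovalMode(Enum):
    """The three modes discussed in this post (names are illustrative)."""
    SUGGEST = "suggest"
    AUTO = "auto"
    FULL_AUTO = "full-auto"


def needs_human_approval(mode: ApprovalMode, action_is_risky: bool) -> bool:
    """Return True when the agent should stop and ask before acting.

    Assumption: Auto only pauses on actions flagged as risky. How an action
    gets flagged is left out of this sketch.
    """
    if mode is ApprovalMode.SUGGEST:
        return True              # always stop and ask
    if mode is ApprovalMode.AUTO:
        return action_is_risky   # light autonomy: ask only when risky
    return False                 # Full-Auto: run end-to-end
```

The point of writing it down this way is that the mode is a policy input, not a comfort setting: the same action gets a different answer depending on which envelope you picked.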

How does Suggest map to low-trust tasks?

Suggest is the mode I'd use when the cost of a wrong move is high or the task itself is fuzzy. The model can propose commands, edits, or actions, but you review before anything happens. That lines up with a conservative posture in the safety literature: keep the decision boundary human-readable and slow down when scope or consequences are unclear [1].

A good rule: if you're touching credentials, production data, deletion, deployments, or external APIs, Suggest is usually the right default.
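
That rule is simple enough to encode as a guard. This is a sketch under assumptions: the area labels and the `default_mode_for` helper are made up for this post, and you'd need your own way of tagging what a task touches.

```python
from typing import Iterable

# Hypothetical labels for the sensitive areas listed above.
SENSITIVE_AREAS = frozenset(
    {"credentials", "production_data", "deletion", "deployment", "external_api"}
)


def default_mode_for(touched_areas: Iterable[str], requested_mode: str) -> str:
    """Fall back to "suggest" whenever a task touches a sensitive area."""
    if SENSITIVE_AREAS & set(touched_areas):
        return "suggest"
    return requested_mode
```

For example, `default_mode_for({"deployment"}, "auto")` comes back as `"suggest"`, while a task tagged only `{"formatting"}` keeps whatever mode was requested.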

When is Auto the right middle ground?

Auto is the sweet spot for repetitive work that still has some risk but doesn't deserve full human gating every step. Think local refactors, test runs, formatting, small file edits, or scoped repo operations. The research angle matters here: agent safety frameworks increasingly separate allow, warn, block, and review because not every action deserves the same friction [3].

What matters most is rollback. If you can recover quickly, Auto becomes much more reasonable.
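
To make the allow, warn, block, review split concrete, here's a rough sketch. The `Action` fields and the ordering of the rules are my own assumptions, not the scheme from any specific paper or from Codex CLI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A hypothetical description of one thing the agent wants to do."""
    description: str
    inside_workspace: bool  # stays within the local repo or sandbox
    has_rollback: bool      # e.g. tracked by git, or trivially re-runnable
    destructive: bool       # deletes or overwrites something


def verdict(action: Action) -> str:
    """Map an action to one of: allow, warn, block, review."""
    if action.destructive and not action.has_rollback:
        return "block"      # cannot be undone: never automatic
    if not action.inside_workspace:
        return "review"     # leaves the sandbox: a human decides
    if not action.has_rollback:
        return "warn"       # local but hard to undo: proceed with a notice
    return "allow"          # local and recoverable
```

Notice how much work `has_rollback` does here. A local, recoverable action is the only case that gets a clean `"allow"`.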

Why is Full-Auto the highest-risk mode?

Full-Auto is for tasks where the agent can finish without asking. That's powerful, but it also removes your last line of defense. Runtime safety papers show that once tools can chain together actions, the risk isn't just a single bad command; it's the accumulation of side effects across steps [3]. One misread instruction can become a cascade.

I'd only use Full-Auto when the task is boring, deterministic, and easy to undo. Anything involving writes outside your local sandbox deserves skepticism.
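
One way to picture the cascade problem is a risk budget that is spent across the whole chain rather than per command. This is an illustrative sketch only: the integer risk scores and the `steps_before_pause` helper are hypothetical, and real risk rarely adds up this neatly.

```python
from typing import Sequence


def steps_before_pause(step_risks: Sequence[int], budget: int) -> int:
    """Return how many steps may run before cumulative risk exceeds the budget.

    Each entry in step_risks is a made-up risk score for one step of a
    multi-step plan. The chain pauses before the step that would push the
    running total over the budget.
    """
    total = 0
    for index, risk in enumerate(step_risks):
        total += risk
        if total > budget:
            return index  # steps 0..index-1 ran; this one needs a human
    return len(step_risks)
```

With scores `[1, 1, 5, 1]` and a budget of 5, the first two steps run and the third triggers a pause, even though no single step is over budget on its own.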

How should risk tolerance decide the mode?

The easiest way to choose is to rank the task by blast radius, reversibility, and confidence. If the blast radius is low, the action is easy to undo, and you're confident the task is well understood, automation is fine. If any one of them moves the wrong way, tighten the mode. That's exactly the kind of structured thinking safety researchers push toward: classify the action, then decide the response based on policy and context [1].

Task type	Blast radius	Reversible?	Suggested mode
Drafting a commit message	Low	Yes	Full-Auto
Formatting files in a repo	Low	Yes	Auto
Refactoring core logic	Medium	Usually	Auto or Suggest
Deploying to production	High	Sometimes	Suggest
Rotating secrets	High	No	Suggest
Deleting cloud resources	Very high	No	Suggest

The table is blunt on purpose. If the outcome is hard to undo, don't let speed bully you into autonomy.
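
The table above collapses into a small decision function. Treat this as a sketch: the string labels mirror the table columns, `confident` stands in for the confidence factor, and the function name is hypothetical. It returns the most permissive mode the rule allows, so picking something stricter (as the table does for formatting) is always fine.

```python
def most_permissive_mode(blast_radius: str, reversible: str, confident: bool) -> str:
    """Return the loosest mode that is still reasonable for a task.

    blast_radius: "low", "medium", "high", or "very high"
    reversible:   "yes", "usually", "sometimes", or "no"
    confident:    whether the task is well understood
    """
    # Hard to undo or wide impact: a human reviews first.
    if blast_radius in ("high", "very high") or reversible in ("sometimes", "no"):
        return "suggest"
    # Fuzzy task: same answer, regardless of impact.
    if not confident:
        return "suggest"
    # Small, cleanly reversible, well understood: full automation is allowed.
    if blast_radius == "low" and reversible == "yes":
        return "full-auto"
    return "auto"
```

Run the table rows through it and the pattern holds: deploying to production (`"high"`, `"sometimes"`) lands on `"suggest"`, and refactoring core logic (`"medium"`, `"usually"`) tops out at `"auto"`.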

What does the research say about agent risk in practice?

The pattern across recent work is consistent. Safety is not static; it shifts with conditions. When an agent is put under stress or offered a tempting shortcut, risky behavior becomes more likely [2]. In other words, a mode that looks fine in a calm local workflow can become a bad idea the moment the agent gets access to more powerful tools or more ambiguous instructions.

That's why approval modes should be treated like operating envelopes, not personality settings.

How should teams set defaults for Codex CLI?

Teams should set the default mode based on the worst plausible failure, not the average one. A solo developer might tolerate more automation in a local branch. A team touching customer data should be more conservative by default. That matches the broader direction of controllable safety systems: centralized boundaries for critical risks, flexible policy for everything else [1].

Here's the practical version I use: start in Suggest, earn Auto with repeated success, and reserve Full-Auto for low-stakes, reversible chores. If the task touches external systems, I slow down immediately.
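
If you wanted to write that progression down as a team policy, it might look like this. The threshold of five clean runs, the parameter names, and the `team_default` helper are all assumptions for illustration; pick numbers that match your own tolerance.

```python
def team_default(
    clean_runs: int,
    touches_external: bool,
    low_stakes_reversible: bool,
    runs_to_earn_auto: int = 5,  # hypothetical threshold, not a recommendation
) -> str:
    """Start in Suggest, earn Auto, and reserve Full-Auto for safe chores.

    clean_runs counts how many times this kind of task has finished without
    a human needing to correct the agent.
    """
    if touches_external:
        return "suggest"          # external systems: slow down immediately
    if clean_runs < runs_to_earn_auto:
        return "suggest"          # autonomy has not been earned yet
    if low_stakes_reversible:
        return "full-auto"        # routine, reversible chore
    return "auto"
```

The ordering matters: the external-systems check comes first, so no amount of track record overrides it.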

Practical before-and-after prompt examples

This is where mode choice and prompt quality meet. If your prompt is vague, even a cautious mode can drift. Clean up the instruction first, then choose the approval level. Tools like Rephrase can compress that cleanup step so you're not manually rewriting every request.

Weak prompt	Better prompt	Safer mode
"Fix my deployment issue"	"Inspect the failing staging deploy, identify the root cause, and propose a minimal fix. Don't apply changes without approval."	Suggest
"Clean up this repo"	"Format the repo, remove dead code comments, and leave all functional behavior unchanged."	Auto
"Update production config"	"Prepare a production config change for review, but do not execute or deploy it."	Suggest
"Run everything needed to finish this task"	"Execute local tests and safe file edits only. Ask before any network, delete, or deploy action."	Auto

The useful trick is to make the scope explicit. The more precise the prompt, the easier it is to trust a less restrictive mode.
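
If you write these scoped prompts often, a tiny template helps keep the boundaries from getting dropped. This is a sketch with a hypothetical `scoped_prompt` helper and wording of my own; it isn't a Codex CLI or Rephrase feature.

```python
from typing import Sequence


def scoped_prompt(task: str, allowed: Sequence[str], ask_first: Sequence[str]) -> str:
    """Build a prompt that states the task, then its explicit boundaries."""
    lines = [task.strip().rstrip(".") + "."]
    if allowed:
        lines.append("Allowed without asking: " + ", ".join(allowed) + ".")
    if ask_first:
        lines.append("Ask before any of: " + ", ".join(ask_first) + ".")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        scoped_prompt(
            "Finish the failing-test task",
            allowed=["local tests", "safe file edits"],
            ask_first=["network", "delete", "deploy"],
        )
    )
```

The output is just the task followed by one line of permitted actions and one line of actions that need approval, which is the same shape as the "better prompt" column in the table.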

Final take

I don't think of approval modes as "more convenient" or "more secure." I think of them as different bets on failure. Suggest is for uncertainty, Auto is for bounded routine work, and Full-Auto is for low-risk repetition you can afford to lose. If you want the agent to move faster without getting reckless, combine a sharper prompt with the right mode. If you want more practical prompting workflows like this, the Rephrase blog has more articles on prompt engineering, and Rephrase can help tighten the prompt before the model ever starts acting.

References

Documentation & Research

[1] Beyond Static Alignment: Hierarchical Policy Control for LLM Safety via Risk-Aware Chain-of-Thought - arXiv
[2] AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation - arXiv
[3] AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use - arXiv

Frequently asked

What are Codex CLI approval modes?

Codex CLI approval modes are guardrails that decide how much the agent can do without asking. They usually range from suggest-first to increasingly autonomous execution.

Is Auto mode safe for production?

Sometimes, but only for tightly bounded tasks with clear rollback paths. Research on runtime agent safety shows that side effects, obfuscation, and multi-step chains are where risk climbs fast [3].

Can prompt tools help choose safer modes?

Yes. Tools like [Rephrase](https://rephrase-it.com) can rewrite your intent into a cleaner, more explicit prompt before you hand it to an agent.