
MCP Apps Beyond Text in Sandboxed iframes

Learn how MCP Apps use sandboxed iframes to go beyond text, build safer UI tools, and ship richer agent workflows.

Ilia Ilinskii
Rephrase · June 1, 2026

Prompt engineering · 9 min read


If you've only thought about MCP as "LLMs calling tools," you're missing the bigger shift. The interesting part is not the API call. It's the interface layer that lets a tool become something you can actually inspect, manipulate, and trust.

Key Takeaways

MCP Apps move beyond plain text by letting tools render interactive HTML inside sandboxed iframes.
The browser sandbox is the real trick: it gives rich UI without handing untrusted content the keys to the parent page.
Research on agent communication keeps showing the same pattern: syntax is easy, shared meaning is hard [1][2].
The best MCP Apps use UI for review and control, and text for planning and explanation.
Tools like Rephrase can help you tighten the prompt that generates these richer workflows.

What are MCP Apps, really?

MCP Apps are MCP-based tools that expose a user interface, not just a function result. Instead of returning a wall of text, the tool can return HTML that renders inside an embedded frame, giving the model and the user a visual control surface for complex tasks [1][2]. That matters because some jobs are easier to do with buttons, previews, and forms than with raw JSON or prose.

Why do sandboxed iframes matter for MCP Apps?

Sandboxed iframes matter because they let you ship interactive HTML without fully trusting that HTML. In browser terms, the iframe's sandbox attribute strips away dangerous capabilities, such as direct access to the parent page's DOM, cookies, and storage, while still allowing scripts and rendering when configured carefully (for example, granting allow-scripts but not allow-same-origin). Community discussion around LessWrong's custom iframe widgets makes the same point: the browser sandbox is what keeps arbitrary embedded UI from becoming arbitrary page control [3].
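As a minimal sketch of what careful configuration can look like, the helper below builds iframe markup that permits scripts but leaves out allow-same-origin, so the framed document runs in an opaque origin. The function names and the choice of srcdoc are assumptions made for illustration, not something MCP prescribes.

```typescript
// Hypothetical helper: wraps untrusted tool HTML in a sandboxed iframe.

// Escape characters that could break out of a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function sandboxedFrameMarkup(untrustedHtml: string): string {
  // "allow-scripts" lets the widget run JavaScript. Omitting
  // "allow-same-origin" gives the frame an opaque origin, so it cannot
  // read the host page's cookies, storage, or DOM.
  const sandboxTokens = "allow-scripts";
  return `<iframe sandbox="${sandboxTokens}" srcdoc="${escapeAttr(untrustedHtml)}"></iframe>`;
}

const frameMarkup = sandboxedFrameMarkup('<button id="ok">Approve</button>');
console.log(frameMarkup);
```

The browser decodes the escaped srcdoc value back into the original HTML, so the widget renders normally while staying inside the sandbox.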

Why text-only tools hit a wall

Text-only tools work fine until the user needs to see, compare, confirm, or edit something visually. At that point, text-only output becomes a liability: it can describe the state, but it can't present the state. Research on agent communication protocols finds that most systems are strong on transport and schema, but weak on semantic alignment and repair [2]. In practice, that means the model can move data around, but it still struggles to communicate intent clearly when the task becomes interactive.

How do MCP Apps extend tools beyond text?

MCP Apps extend tools by turning a tool response into a lightweight app shell. The server can expose a structured result plus embedded HTML, and the host can display that inside a sandboxed frame. That opens the door to workflows like review panels, diff viewers, chart explorers, approval forms, and guided wizards. The point is not prettier output; the point is a better action surface for the model and the human.
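To make "structured result plus embedded HTML" concrete, here is a small sketch of a tool response that carries both. The field names (structured, ui, html) and the function buildReviewResult are hypothetical and chosen for readability; the actual MCP wire format may differ.

```typescript
// Hypothetical shapes for illustration; not the real MCP wire format.
interface PendingChange {
  id: string;
  summary: string;
}

interface AppToolResult {
  // Machine-readable payload the model can reason over.
  structured: { changes: PendingChange[] };
  // Optional UI the host may render inside a sandboxed iframe.
  ui?: { html: string };
}

// Escape text before placing it into HTML content or a quoted attribute.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function buildReviewResult(changes: PendingChange[]): AppToolResult {
  const items = changes
    .map((c) => `<li data-id="${escapeHtml(c.id)}">${escapeHtml(c.summary)}</li>`)
    .join("");
  return {
    structured: { changes },
    ui: { html: `<ul>${items}</ul><button>Approve all</button>` },
  };
}

const reviewResult = buildReviewResult([{ id: "1", summary: "Rename <main>" }]);
console.log(reviewResult.ui ? reviewResult.ui.html : "(no UI)");
```

Keeping the structured payload alongside the HTML means a host that cannot render the UI still has something useful to show, and the model never has to parse markup to understand the result.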

What does the browser sandbox actually protect?

The browser sandbox protects the host page from embedded content. In practical terms, the iframe should not be able to read the host's cookies, rewrite the parent UI, or pretend to be the host app just because it can run HTML and JavaScript. LessWrong's discussion is a useful real-world example: the team explicitly relied on sandboxing to keep embedded widgets from escaping into the main page, while still accepting that phishing-like tricks and browser-specific edge cases remain possible [3]. That's the right mental model for MCP Apps too: contained, not magical.
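One way to keep that containment honest is to audit the sandbox attribute before rendering a frame. The sketch below flags a few standard sandbox tokens that hand capabilities back to the embedded content. The token names are real HTML sandbox tokens, but the list is not exhaustive, and the function name and warning wording are assumptions.

```typescript
// Standard sandbox tokens that weaken containment, with illustrative notes.
const RISKY_SANDBOX_TOKENS: { token: string; note: string }[] = [
  { token: "allow-same-origin", note: "frame keeps its real origin instead of an opaque one" },
  { token: "allow-top-navigation", note: "frame can navigate the host page away" },
  { token: "allow-popups-to-escape-sandbox", note: "popups it opens are not sandboxed" },
];

// Returns one warning per risky token found in the attribute value.
function auditSandbox(sandboxAttr: string): string[] {
  // Sandbox tokens are space-separated and case-insensitive.
  const present = sandboxAttr.toLowerCase().split(/\s+/);
  return RISKY_SANDBOX_TOKENS
    .filter((r) => present.indexOf(r.token) !== -1)
    .map((r) => `${r.token}: ${r.note}`);
}

console.log(auditSandbox("allow-scripts allow-same-origin"));
```

An empty result does not prove a frame is safe. It only means none of the listed capabilities were granted, which fits the "contained, not magical" framing.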

How do MCP Apps fit into modern agent design?

They fit as the presentation layer for agent actions. The protocol handles discovery and invocation; the app frame handles interaction. That separation matches the broader direction of recent MCP research, which treats tools as discoverable capabilities and focuses on making those capabilities more usable under real-world constraints [1][2]. The more complex the task, the more valuable this split becomes. You don't want your model inventing UI in its head when it can render one in the browser.

UI vs text for MCP: when should you use each?

Use text when	Use a sandboxed iframe when
You need planning, explanation, or summarization	You need inspection, comparison, or approval
The output is small and linear	The output is visual or stateful
The user can act on a short answer	The user needs to choose among many options
The task is mostly inference	The task includes review and control

That table is the real design rule. Text is great for cognition. UI is great for decisions.
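The table can be read as a small decision function. This is only a sketch: the TaskTraits fields are a hypothetical way to describe a task, and the threshold for "many options" is an arbitrary assumption you would tune for your own product.

```typescript
type Surface = "text" | "iframe";

// Hypothetical task descriptor; each field mirrors a row of the table above.
interface TaskTraits {
  needsReviewOrApproval: boolean; // inspection, comparison, or approval
  isVisualOrStateful: boolean;    // output is visual or stateful
  optionCount: number;            // how many choices the user must weigh
}

// manyOptions is an assumed cutoff, not a value from the article.
function chooseSurface(task: TaskTraits, manyOptions = 5): Surface {
  if (task.needsReviewOrApproval) return "iframe";
  if (task.isVisualOrStateful) return "iframe";
  if (task.optionCount > manyOptions) return "iframe";
  return "text"; // planning, explanation, summarization, short answers
}

console.log(
  chooseSurface({ needsReviewOrApproval: false, isVisualOrStateful: false, optionCount: 1 })
);
```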

What should you watch out for when building MCP Apps?

The biggest mistake is treating the sandbox as a complete trust boundary that lets you skip every other safeguard. It doesn't. You still need strict content sanitization, careful messaging between host and frame, and clear limits on what the embedded app is allowed to do. Security research on MCP makes the same broader point: once tools become action-capable, the attack surface expands fast [4]. Rich UI is useful, but only if you keep the blast radius small.
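For the messaging part, a reasonable starting point is to treat everything the frame sends as untrusted input and accept only a small allowlist of actions. The sketch below assumes a hypothetical message contract (action plus string payload). The action names and the payload cap are illustrative, not part of any MCP specification.

```typescript
// Hypothetical message contract between the host and the embedded app.
type FrameAction = "approve" | "reject" | "resize";

interface FrameMessage {
  action: FrameAction;
  payload: string;
}

const ALLOWED_ACTIONS: FrameAction[] = ["approve", "reject", "resize"];
const MAX_PAYLOAD_LENGTH = 1024; // arbitrary cap chosen for this sketch

// Returns a typed message, or null if the data is not something the host accepts.
function parseFrameMessage(data: unknown): FrameMessage | null {
  if (typeof data !== "object" || data === null) return null;
  const candidate = data as { action?: unknown; payload?: unknown };
  const action = candidate.action;
  const payload = candidate.payload;
  if (typeof action !== "string" || typeof payload !== "string") return null;
  if (payload.length > MAX_PAYLOAD_LENGTH) return null;
  // Only actions on the allowlist get through; everything else is dropped.
  const matches = ALLOWED_ACTIONS.filter((a) => a === action);
  if (matches.length === 0) return null;
  return { action: matches[0], payload };
}

console.log(parseFrameMessage({ action: "approve", payload: "change-42" }));
console.log(parseFrameMessage({ action: "deleteEverything", payload: "" }));
```

In a browser host this would run inside a message event listener, after first checking that event.source is the contentWindow of the frame you created. A frame sandboxed without allow-same-origin reports its origin as the string "null", so the source check carries more weight than an origin comparison.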

Practical prompt example: from plain tool call to MCP App

Here's the kind of prompt that produces a mediocre text-only tool:

Review this dataset and tell me if anything looks off.

Now here's a better version for an MCP App workflow:

Inspect the dataset, highlight anomalies, and present the findings in a simple HTML review panel.
Include filters for outliers, missing values, and duplicate rows.
Make the default view show the top 10 highest-risk records.

The second prompt gives the tool room to build an interface, not just an answer. That's where the value is. And if you want a faster way to polish prompts like this, tools like Rephrase can rewrite the rough draft into something much more execution-ready.
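Behind a panel like that, the data logic is fairly small. Here is a sketch of the filters and the default view the second prompt asks for. The record shape, the riskScore field, and the function names are assumptions for illustration; a real tool would compute these from its own anomaly checks.

```typescript
// Hypothetical record shape; field names and the risk score are assumptions.
interface ReviewRecord {
  id: number;
  riskScore: number; // higher means more suspicious
  isOutlier: boolean;
  hasMissingValues: boolean;
  isDuplicate: boolean;
}

interface PanelFilters {
  outliers: boolean;
  missingValues: boolean;
  duplicates: boolean;
}

// Keep records matching any enabled filter; with no filter enabled, keep all.
function applyFilters(records: ReviewRecord[], f: PanelFilters): ReviewRecord[] {
  if (!f.outliers && !f.missingValues && !f.duplicates) return records.slice();
  return records.filter(
    (r) =>
      (f.outliers && r.isOutlier) ||
      (f.missingValues && r.hasMissingValues) ||
      (f.duplicates && r.isDuplicate)
  );
}

// Default view: the `limit` highest-risk records, highest first.
// slice() copies the array so the caller's data is not reordered.
function defaultView(records: ReviewRecord[], limit = 10): ReviewRecord[] {
  return records
    .slice()
    .sort((a, b) => b.riskScore - a.riskScore)
    .slice(0, limit);
}
```

The HTML panel then only has to render whatever these functions return, which keeps the interesting logic testable outside the iframe.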

Why this matters for the future of tools

The shift here is subtle but important: tools are no longer just callable endpoints. They're becoming tiny applications with their own UI, feedback loop, and interaction model. That brings MCP closer to how people actually work. We don't want every agent task to end in a paragraph. Sometimes we want a preview, a diff, a slider, a confirmation step, or a visual decision tree.

That's also why I think the best MCP Apps will be boring in the best possible way. They'll use HTML only where text breaks down, and they'll keep the rest as plain, auditable, low-friction tool calls. That balance is what makes the interface useful instead of gimmicky.

If you're designing agent workflows, start by asking a simple question: "Does the user need to read this, or do they need to use it?" That answer usually tells you whether you should stay in text or move into a sandboxed iframe. For the prompt side of that equation, Rephrase can help you get from vague intent to a prompt that actually produces the right interaction model.

References

Documentation & Research

[1] The Convergence of Schema-Guided Dialogue Systems and the Model Context Protocol - arXiv
[2] Beyond Message Passing: Toward Semantically Aligned Agent Communication - arXiv
[3] LessWrong Policy on LLM Use - Hacker News (LLM)
[4] A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms - arXiv

Community Examples

LessWrong Policy on LLM Use discussion on sandboxed iframes and custom widgets - Hacker News (LLM)

Frequently asked

What are MCP Apps?

MCP Apps are MCP-powered experiences that expose interactive UI, not just text. They let agents call tools and render HTML interfaces when a task needs clicks, forms, or visual feedback.

Do MCP Apps replace normal text tools?

No. They complement text tools. Text is still best for planning, explanations, and quick actions, while embedded UIs shine for review, control, and structured input.

What kind of tools benefit most from MCP Apps?

Tools that need inspection or interaction benefit most: dashboards, data explorers, form-driven workflows, previewers, and any agent task where the final step is visual.