Learn how MCP Apps use sandboxed iframes to go beyond text, build safer UI tools, and ship richer agent workflows.
If you've only thought about MCP as "LLMs calling tools," you're missing the bigger shift. The interesting part is not the API call. It's the interface layer that lets a tool become something you can actually inspect, manipulate, and trust.
MCP Apps are MCP-based tools that expose a user interface, not just a function result. Instead of returning a wall of text, the tool can return HTML that renders inside an embedded frame, giving the model and the user a visual control surface for complex tasks [1][2]. That matters because some jobs are easier to do with buttons, previews, and forms than with raw JSON or prose.
Sandboxed iframes matter because they let you ship interactive HTML without fully trusting that HTML. In browser terms, the `sandbox` attribute strips away dangerous capabilities by default, including script execution, form submission, popups, and same-origin access to the parent page. The host then re-enables only what the widget needs, which is usually script execution and little else. Community discussion around LessWrong's custom iframe widgets makes the same point: the browser sandbox is what keeps arbitrary embedded UI from becoming arbitrary page control [3].
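To make that concrete, here is a minimal sketch of how a host might wrap untrusted tool HTML in a sandboxed frame. The helper names (`escapeAttribute`, `buildSandboxedFrame`) are my own and not part of any MCP SDK. The `sandbox` and `srcdoc` attributes are standard HTML, but the choice to grant only `allow-scripts` is an assumption about what a typical widget needs.

```typescript
// Hypothetical helpers for illustration; not part of any MCP SDK.

/** Escape text so it can sit safely inside a double-quoted HTML attribute. */
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

/** Build iframe markup that renders untrusted HTML in an opaque origin. */
function buildSandboxedFrame(untrustedHtml: string, title: string): string {
  // "allow-scripts" lets the widget run JavaScript. Leaving out
  // "allow-same-origin" keeps the frame in an opaque origin, so it cannot
  // read the host's cookies, storage, or DOM.
  const sandbox = "allow-scripts";
  return (
    '<iframe sandbox="' + sandbox + '"' +
    ' title="' + escapeAttribute(title) + '"' +
    ' srcdoc="' + escapeAttribute(untrustedHtml) + '"></iframe>'
  );
}

// Example: the tool's HTML travels as an escaped srcdoc value.
const frameMarkup = buildSandboxedFrame(
  '<button id="ok">Approve</button>',
  "Review panel"
);
```

Passing the HTML through `srcdoc` rather than a URL keeps the example self-contained. A real host might serve the content from a separate origin instead, which adds a second layer of isolation.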
Text-only tools work fine until the user needs to see, compare, confirm, or edit something visually. At that point, text-only output becomes a liability: it can describe the state, but it can't present the state. Research on agent communication protocols says most systems are strong on transport and schema, but weak on semantic alignment and repair [2]. In practice, that means the model can move data around, but it still struggles to communicate intent clearly when the task becomes interactive.
MCP Apps extend tools by turning a tool response into a lightweight app shell. The server can expose a structured result plus embedded HTML, and the host can display that inside a sandboxed frame. That opens the door to workflows like review panels, diff viewers, chart explorers, approval forms, and guided wizards. The point is not prettier output; the point is a better action surface for the model and the human.
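One way to picture the "structured result plus embedded HTML" idea is a result object with an optional UI payload and a text fallback. The sketch below is illustrative only: `AppToolResult`, its field names, and `planRender` are assumptions of mine, not the MCP Apps wire format.

```typescript
// Hypothetical result shape: field names are assumptions for illustration,
// not the MCP Apps wire format.
interface AppToolResult {
  summary: string;               // plain-text fallback, always present
  data: Record<string, unknown>; // structured result the model can read
  html?: string;                 // optional UI for hosts that can render it
}

type RenderPlan =
  | { mode: "frame"; html: string; fallbackText: string }
  | { mode: "text"; text: string };

/** Decide how a host could present a result, degrading to text when needed. */
function planRender(result: AppToolResult, hostSupportsFrames: boolean): RenderPlan {
  if (hostSupportsFrames && result.html !== undefined && result.html.trim() !== "") {
    return { mode: "frame", html: result.html, fallbackText: result.summary };
  }
  return { mode: "text", text: result.summary };
}

// Example: the same result renders as a frame or as text depending on the host.
const diffResult: AppToolResult = {
  summary: "3 files changed",
  data: { filesChanged: 3 },
  html: "<ul><li>README.md</li></ul>",
};
const richPlan = planRender(diffResult, true);
const textPlan = planRender(diffResult, false);
```

Keeping the text summary mandatory means a host that cannot render frames still gets a usable answer.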
The browser sandbox protects the host page from embedded content. In practical terms, the iframe should not be able to read cookies, rewrite the parent UI, or pretend to be the host app just because it can run HTML and JavaScript. LessWrong's discussion is a useful real-world example: the team explicitly relied on sandboxing to keep embedded widgets from escaping into the main page, while still accepting that phishing-like tricks and browser-specific edge cases remain possible [3]. That's the right mental model for MCP Apps too: contained, not magical.
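"Contained, not magical" also means the containment depends on which sandbox tokens the host grants. Here is a small sketch of a host-side audit. The tokens are real values of the HTML `sandbox` attribute, but `auditSandbox` and the policy of which tokens to flag are my own assumptions.

```typescript
// Hypothetical host-side check. The tokens are real `sandbox` attribute
// values; the policy of which ones to flag is an assumption.
const RISKY_TOKENS = [
  "allow-same-origin",              // gives the frame its real origin back
  "allow-top-navigation",           // lets the frame navigate the host page away
  "allow-popups-to-escape-sandbox", // popups it opens are no longer sandboxed
];

/** Return human-readable warnings for a sandbox attribute value. */
function auditSandbox(sandboxValue: string): string[] {
  const tokens = sandboxValue.split(/\s+/).filter((t) => t.length > 0);
  const warnings: string[] = [];
  for (const token of tokens) {
    if (RISKY_TOKENS.indexOf(token) !== -1) {
      warnings.push("risky token: " + token);
    }
  }
  // Scripts plus same-origin access can let same-origin content strip the
  // sandbox attribute from its own frame.
  if (tokens.indexOf("allow-scripts") !== -1 && tokens.indexOf("allow-same-origin") !== -1) {
    warnings.push("allow-scripts with allow-same-origin can undo the sandbox");
  }
  return warnings;
}

// Example: scripts alone pass; scripts plus same-origin produce two warnings.
const safeWarnings = auditSandbox("allow-scripts");
const unsafeWarnings = auditSandbox("allow-scripts allow-same-origin");
```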
MCP Apps fit into agent workflows as the presentation layer for agent actions. The protocol handles discovery and invocation; the app frame handles interaction. That separation matches the broader direction of recent MCP research, which treats tools as discoverable capabilities and focuses on making those capabilities more usable under real-world constraints [1][2]. The more complex the task, the more valuable this split becomes. You don't want your model inventing UI in its head when it can render one in the browser.
| Use text when | Use a sandboxed iframe when |
|---|---|
| You need planning, explanation, or summarization | You need inspection, comparison, or approval |
| The output is small and linear | The output is visual or stateful |
| The user can act on a short answer | The user needs to choose among many options |
| The task is mostly inference | The task includes review and control |
That table is the real design rule. Text is great for cognition. UI is great for decisions.
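If you want that rule in executable form, a rough sketch might look like the following. The `TaskTraits` fields mirror the four table rows; the names and the "many options" threshold of 5 are assumptions, so tune them to your own product.

```typescript
// Hypothetical task descriptor mirroring the four table rows.
interface TaskTraits {
  needsInspectionOrApproval: boolean;
  isVisualOrStateful: boolean;
  optionCount: number;
  includesReviewAndControl: boolean;
}

type Surface = "text" | "sandboxed-iframe";

/** Pick a surface: any UI-leaning trait tips the decision toward a frame. */
function chooseSurface(task: TaskTraits, manyOptionsThreshold: number = 5): Surface {
  const needsUi =
    task.needsInspectionOrApproval ||
    task.isVisualOrStateful ||
    task.optionCount > manyOptionsThreshold || // threshold is an assumption
    task.includesReviewAndControl;
  return needsUi ? "sandboxed-iframe" : "text";
}

// Example: a summary stays in text, an approval flow moves into a frame.
const summarySurface = chooseSurface({
  needsInspectionOrApproval: false,
  isVisualOrStateful: false,
  optionCount: 1,
  includesReviewAndControl: false,
});
const approvalSurface = chooseSurface({
  needsInspectionOrApproval: true,
  isVisualOrStateful: false,
  optionCount: 2,
  includesReviewAndControl: true,
});
```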
The biggest mistake is treating the iframe sandbox as a complete trust boundary that lets you skip everything else. It isn't one. You still need strict content sanitization, careful messaging between host and frame, and clear limits on what the embedded app is allowed to do. Security research on MCP makes the same broader point: once tools become action-capable, the attack surface expands fast [4]. Rich UI is useful, but only if you keep the blast radius small.
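"Careful messaging" is the part that is easiest to get wrong, so here is a minimal sketch of validating a message from the frame before the host acts on it. The `IncomingMessage` type is a stand-in for the browser's `MessageEvent` so the code runs outside a browser, and the `approve`/`reject` message schema is a hypothetical one of mine.

```typescript
// Minimal structural stand-in for the browser's MessageEvent.
interface IncomingMessage {
  origin: string;
  source: unknown;
  data: unknown;
}

// Hypothetical message schema for an approval widget.
type FrameAction = { type: "approve" | "reject"; recordId: string };

/** Return a validated action, or null if the message should be ignored. */
function parseFrameMessage(event: IncomingMessage, frameWindow: unknown): FrameAction | null {
  // Only accept messages from the exact frame the host created.
  if (event.source !== frameWindow) return null;
  // A sandboxed frame without allow-same-origin has an opaque origin,
  // which the browser reports as the string "null".
  if (event.origin !== "null") return null;

  const data = event.data;
  if (typeof data !== "object" || data === null) return null;
  const candidate = data as { type?: unknown; recordId?: unknown };

  let action: "approve" | "reject";
  if (candidate.type === "approve") action = "approve";
  else if (candidate.type === "reject") action = "reject";
  else return null;

  const recordId = candidate.recordId;
  if (typeof recordId !== "string" || recordId.length === 0) return null;
  return { type: action, recordId: recordId };
}
```

In a browser, the host would call this from a `message` event listener and pass `iframe.contentWindow` as `frameWindow`. Because every opaque origin reports `"null"`, the source check is what actually identifies the frame, not the origin check.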
Here's the kind of prompt that produces a mediocre text-only tool:

> Review this dataset and tell me if anything looks off.

Now here's a better version for an MCP App workflow:

> Inspect the dataset, highlight anomalies, and present the findings in a simple HTML review panel.
> Include filters for outliers, missing values, and duplicate rows.
> Make the default view show the top 10 highest-risk records.
The second prompt gives the tool room to build an interface, not just an answer. That's where the value is. And if you want a faster way to polish prompts like this, tools like Rephrase can rewrite the rough draft into something much more execution-ready.
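For a sense of what the tool side of that prompt could produce, here is a small sketch that ranks records by risk and renders the top 10 as an HTML table. Everything here is hypothetical: the `RiskRecord` shape, the `riskScore` field, and the `data-issue` attribute that a filter control could key on.

```typescript
// Hypothetical record shape; a real tool would derive these from the dataset.
interface RiskRecord {
  id: string;
  riskScore: number;
  issue: "outlier" | "missing" | "duplicate";
}

/** Escape text for safe use in HTML element content. */
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

/** Return the highest-risk records first, without mutating the input. */
function topRisk(records: RiskRecord[], limit: number = 10): RiskRecord[] {
  return records
    .slice()
    .sort((a, b) => b.riskScore - a.riskScore)
    .slice(0, limit);
}

/** Render the default view: a table of the top 10 highest-risk records. */
function renderReviewPanel(records: RiskRecord[]): string {
  const rows = topRisk(records)
    .map(
      (r) =>
        // data-issue lets a filter control show or hide rows by category.
        '<tr data-issue="' + r.issue + '"><td>' + escapeHtml(r.id) +
        "</td><td>" + r.riskScore + "</td><td>" + r.issue + "</td></tr>"
    )
    .join("");
  return (
    "<table><thead><tr><th>Record</th><th>Risk</th><th>Issue</th></tr></thead>" +
    "<tbody>" + rows + "</tbody></table>"
  );
}
```

Escaping the record IDs matters even inside a sandbox, since dataset values are untrusted input too.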
The shift here is subtle but important: tools are no longer just callable endpoints. They're becoming tiny applications with their own UI, feedback loop, and interaction model. That brings MCP closer to how people actually work. We don't want every agent task to end in a paragraph. Sometimes we want a preview, a diff, a slider, a confirmation step, or a visual decision tree.
That's also why I think the best MCP Apps will be boring in the best possible way. They'll use HTML only where text breaks down, and they'll keep the rest as plain, auditable, low-friction tool calls. That balance is what makes the interface useful instead of gimmicky.
If you're designing agent workflows, start by asking a simple question: "Does the user need to read this, or do they need to use it?" That answer usually tells you whether you should stay in text or move into a sandboxed iframe. For the prompt side of that equation, Rephrase can help you get from vague intent to a prompt that actually produces the right interaction model.
Documentation & Research
Community Examples
MCP Apps are MCP-powered experiences that expose interactive UI, not just text. They let agents call tools and render HTML interfaces when a task needs clicks, forms, or visual feedback.
No, MCP Apps don't replace text tools; they complement them. Text is still best for planning, explanations, and quick actions, while embedded UIs shine for review, control, and structured input.
Tools that need inspection or interaction benefit most: dashboards, data explorers, form-driven workflows, previewers, and any agent task where the final step is visual.