
Prompt Engineering • March 15, 2026 • 7 min read

How MCP and Tool Search Change Agents

How MCP and GPT-5.4 tool search reshape AI agent architecture, from schema design to discovery, orchestration, and safety.


AI agents used to be mostly prompt wrappers with a few hardcoded functions. That era is ending fast.

What's changing now is not just model quality. It's the combination of MCP as the interoperability layer and tool search as a first-class model capability. Put those together, and agent architecture starts looking less like a script and more like a runtime.

Key Takeaways

  • MCP turns tool access into a standard interface, which reduces custom integrations and makes agents more modular.
  • GPT-5.4-style tool search changes the bottleneck from "can the model call a tool?" to "can it find the right one at scale?" [1]
  • Schema quality now matters as much as prompt quality because descriptions and input schemas directly shape tool selection [3].
  • Modern agent stacks need discovery, routing, memory, and guardrails, not just a reasoning loop.
  • The best practical pattern is progressive disclosure: show fewer tools first, then reveal detail only when needed [3].

What are MCP and tool search changing in agent architecture?

MCP and tool search are shifting agents from fixed workflows toward dynamic systems that discover, evaluate, and call tools at runtime. The architecture is becoming more like a protocol-driven platform: tools are externalized, schemas become part of reasoning, and orchestration must handle search, filtering, safety, and long-horizon state [2][3].

The cleanest way to think about it is this: old agents were built around a model plus a handpicked tool belt. New agents are built around a model plus a searchable tool universe.

MCP gives that universe structure. In the protocol, a host talks to MCP servers through standardized primitives like tools, resources, and prompts [3]. That sounds abstract, but the architectural consequence is huge: tool access is no longer deeply embedded into the app. It becomes a layer.
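To make "tools as a layer" concrete, here is the shape of a `tools/list` result as the MCP specification defines it: each tool carries a name, a description, and a JSON Schema for its inputs. The specific server fields shown (`create_ticket`, its parameters) are invented for illustration.

```python
# Illustrative tools/list result. The name / description / inputSchema
# structure follows the MCP spec; the concrete tool is made up.
tools_list_result = {
    "tools": [
        {
            "name": "create_ticket",
            "description": "Creates a support ticket for a customer-reported issue.",
            "inputSchema": {
                "type": "object",
                "properties": {
                    "summary": {"type": "string", "description": "Short issue summary"},
                    "body": {"type": "string", "description": "Detailed description"},
                },
                "required": ["summary", "body"],
            },
        }
    ]
}

# Because the catalog is machine-readable, a host can reason over it
# without hardcoded knowledge of any particular server:
names = [t["name"] for t in tools_list_result["tools"]]
```

This is the architectural point: the model discovers what exists from these descriptions at runtime, rather than from code baked into the app.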

OpenAI's GPT-5.4 announcement matters here because it explicitly frames tool search as a frontier capability alongside coding and computer use [1]. Even though the public article is brief, the implication is obvious. If the model is better at searching for tools, then your architecture no longer has to assume a tiny static tool list.


Why does MCP matter more than another API wrapper?

MCP matters because it solves the N-to-M integration mess that made agent systems brittle and expensive to maintain. Instead of writing custom glue for each host-tool pair, teams can expose capabilities through a standard protocol with shared semantics around tools, resources, prompts, sessions, and transports [2][3].

This is where the "USB-C for AI" metaphor actually earns its hype. The protocol creates a common surface between models and external systems. In the research literature, MCP is framed as the operational version of schema-guided interaction: the model discovers what exists from machine-readable descriptions instead of relying on hardcoded product knowledge [3].

Google's work on gRPC transport for MCP adds another important detail: the protocol layer itself is maturing for production use, not just demos [2]. That means teams are already thinking beyond simple local tool calls and into transport choice, backpressure, observability, auth, and enterprise policy. In other words, MCP is forcing agent architecture to grow up.

Here's the practical shift:

Old agent stack            | New MCP-based stack
---------------------------|----------------------------------------
Hardcoded functions        | External MCP servers
Static tool list           | Searchable tool inventory
Prompt decides everything  | Prompt + schema + router decide
One-off integration logic  | Standardized protocol layer
Minimal infra concerns     | Transport, auth, observability, safety

Why is tool search now the hard problem?

Tool search is now the hard problem because the number of available tools is exploding faster than a model's ability to reason over giant, noisy catalogs. Once agents can access hundreds or thousands of tools, success depends less on raw intelligence and more on search, ranking, and schema interpretation [3][4].

This is the catch most teams miss. Calling a tool is the easy part. Finding the right tool in a messy ecosystem is the real architecture problem.

The HumanMCP paper makes this painfully clear. It studies retrieval across roughly 2,800 tools and shows that performance degrades as more tools are placed in context, while semantic retrieval pipelines outperform naive long-context stuffing [4]. That lines up with broader MCP benchmark findings summarized in recent research: even strong frontier models struggle with unfamiliar tools, ambiguous schemas, and distractors [3].

So the architecture pattern changes from:

  1. Give the model all tools.
  2. Hope it picks well.

to something more like:

  1. Retrieve likely tool families.
  2. Rank candidates semantically.
  3. Reveal detailed schemas only for shortlisted tools.
  4. Execute with validation and guardrails.

That is why "tool search" is not a feature checkbox. It's a systems design problem.
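The four-step pattern can be sketched as a plain-Python pipeline. This is a toy illustration, not real MCP client code: the registry, the word-overlap retriever, and both tool names are invented, and a production system would rank with embeddings rather than token overlap.

```python
# Toy registry: short summaries are always visible; full schemas are
# revealed only for shortlisted tools (progressive disclosure).
REGISTRY = {
    "create_ticket": {
        "summary": "file a support ticket for a product bug",
        "schema": {"required": ["summary", "body"]},
        "fn": lambda args: f"TICKET-1 ({args['summary']})",
    },
    "search_docs": {
        "summary": "search product docs for troubleshooting steps",
        "schema": {"required": ["query"]},
        "fn": lambda args: f"3 results for {args['query']}",
    },
}

def retrieve(query: str, k: int = 1) -> list[str]:
    # Steps 1-2: retrieve and rank candidates. Word overlap stands in
    # for a real semantic retriever.
    q = set(query.lower().split())
    ranked = sorted(
        REGISTRY,
        key=lambda name: len(q & set(REGISTRY[name]["summary"].split())),
        reverse=True,
    )
    return ranked[:k]

def execute(name: str, args: dict) -> str:
    # Steps 3-4: reveal the full schema only now, validate, then call.
    missing = [f for f in REGISTRY[name]["schema"]["required"] if f not in args]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return REGISTRY[name]["fn"](args)

shortlist = retrieve("file a bug ticket")
result = execute(shortlist[0], {"summary": "crash on save", "body": "..."})
```

The model only ever sees a handful of summaries plus one full schema, instead of the entire catalog, which is exactly the degradation the retrieval benchmarks warn about.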


How should you design MCP schemas for better tool search?

You should design MCP schemas for semantic clarity, explicit action boundaries, and progressive disclosure because models choose tools from descriptions, not just signatures. Better metadata improves discovery, lowers confusion, and makes orchestration more reliable under scale and long contexts [3][4].

Here's what I noticed reading the MCP research: prompt engineering is moving one layer down into schema engineering.

A weak tool schema looks like this:

create_ticket(name, body, priority)

A stronger one looks like this:

Creates a support ticket for a customer-reported issue.
Use when the user wants to report, track, or escalate a product problem.
Required fields: short summary, detailed issue description.
Do not use for feature requests or billing refunds.
Returns ticket ID and status.
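Expressed as structured metadata, the stronger description might look like the sketch below. The `name`, `description`, and `inputSchema` fields follow MCP; the `returns` and `failure_modes` keys are illustrative conventions for encoding action boundaries and failure documentation, not part of the spec.

```python
# The "stronger" tool description above, as structured metadata a
# retriever and router can rank over. Extra keys beyond the MCP
# name/description/inputSchema triple are illustrative conventions.
create_ticket = {
    "name": "create_ticket",
    "description": (
        "Creates a support ticket for a customer-reported issue. "
        "Use when the user wants to report, track, or escalate a product problem. "
        "Do not use for feature requests or billing refunds."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "summary": {"type": "string", "description": "Short issue summary"},
            "body": {"type": "string", "description": "Detailed issue description"},
            "priority": {"type": "string", "enum": ["low", "normal", "high"]},
        },
        "required": ["summary", "body"],
    },
    "returns": "Ticket ID and status",
    "failure_modes": ["NOT_FOUND", "RATE_LIMITED", "AUTH_REQUIRED"],
}
```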

Before → after matters here just like it does in prompts.

Before              | After
--------------------|-------------------------------------------------------------------------
"search"            | "Search product docs for troubleshooting steps and API errors"
"create_ticket"     | "Create a customer support ticket for product bugs or account issues"
No failure guidance | "Returns NOT_FOUND, RATE_LIMITED, or AUTH_REQUIRED with next-step hints"
Flat catalog        | Category summary first, full schema later

The strongest ideas in the convergence paper are especially relevant: semantic completeness, explicit action boundaries, failure-mode documentation, progressive disclosure, and inter-tool relationship declaration [3]. That's not academic fluff. That's your production checklist.

If you write prompts often, this is exactly the kind of repetitive improvement work that Rephrase can speed up when you're drafting tool descriptions, internal prompts, or system instructions across apps. And if you want more articles on this kind of workflow design, the Rephrase blog is worth browsing.


What does the new agent architecture look like in practice?

The new agent architecture is layered: model, retriever, protocol client, tool servers, state manager, and safety controls all work together. A single reasoning loop is no longer enough because discovery, execution, error recovery, and long-horizon coordination must be handled as separate concerns [2][3][5].

One of the most useful recent findings comes from the DDL2PropBank benchmark: frameworks with native MCP support and unified tool registration reduce implementation complexity significantly [5]. That suggests architecture decisions are increasingly about developer ergonomics too, not just runtime behavior.

A practical stack now looks like this:

Discovery layer

A retrieval or routing component narrows the tool universe before the model sees full schemas. This reduces token bloat and improves relevance [3][4].

Execution layer

The model calls tools through MCP clients talking to MCP servers. This is where transport, session handling, and structured outputs matter [2][3].

State layer

Longer tasks need persistent context, checkpoints, and execution history. Stateless prompting breaks fast in multi-step work [3].

Safety layer

Approval flows, method-level authorization, tool allowlists, and metadata scrutiny protect against bad calls and tool poisoning [2][3].
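One way the allowlist and approval-flow ideas combine in code: a hypothetical guard wrapped around every tool call. The policy shape, tool names, and callback signatures are assumptions for illustration, not MCP features.

```python
# Hypothetical safety guard: allowlist first, then human/policy
# approval for side-effecting tools, then the real call.
ALLOWLIST = {"search_docs", "create_ticket"}
NEEDS_APPROVAL = {"create_ticket"}  # write tools require sign-off

def guarded_call(name: str, args: dict, call_tool, approve) -> str:
    """call_tool(name, args) performs the real tool call;
    approve(name, args) returns True only if a human or policy
    engine signs off on this specific invocation."""
    if name not in ALLOWLIST:
        raise PermissionError(f"tool '{name}' is not allowlisted")
    if name in NEEDS_APPROVAL and not approve(name, args):
        raise PermissionError(f"approval denied for '{name}'")
    return call_tool(name, args)

# A read-only tool passes without consulting the approver:
ok = guarded_call("search_docs", {"query": "auth error"},
                  call_tool=lambda n, a: "results",
                  approve=lambda n, a: False)
```

The useful property is that the guard sits below the model: even a confused or poisoned tool selection cannot reach execution without clearing the same checks.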

This is also why I don't think "just use a better model" is a serious architecture strategy anymore. Better models help. Better protocols and better search help more.


How should teams adapt their prompting and workflows?

Teams should stop treating tool use as a hidden implementation detail and start designing prompts, schemas, and retrieval as one system. The best results come when prompts define intent clearly, schemas encode action semantics, and discovery layers keep the model focused on a small, relevant candidate set [1][3][4].

If I were updating an agent stack today, I'd do three things first. I'd shorten the visible tool list. I'd rewrite every tool description for semantic clarity. And I'd add a retrieval layer before exposing full schemas.

That's also where tools like Rephrase fit naturally. When you're constantly rewriting system prompts, tool descriptions, and user-facing instructions, it helps to automate the "make this clearer and more structured" step instead of doing it manually every time.

The bigger pattern is simple: prompt engineering is becoming architecture engineering.


MCP gives agents a standard way to touch the outside world. Tool search decides whether they can do that intelligently. Together, they're pushing agent design away from handcrafted flows and toward searchable, protocol-driven systems.

That's a better direction. It's also less forgiving. If your schemas are vague, your routing is naive, or your safety model is thin, the system will fail in ways that look like "bad reasoning" but are really bad architecture.

References

Documentation & Research

  1. Introducing GPT-5.4 - OpenAI Blog (link)
  2. A gRPC transport for the Model Context Protocol - Google Cloud AI Blog (link)
  3. The Convergence of Schema-Guided Dialogue Systems and the Model Context Protocol - arXiv (link)
  4. HumanMCP: A Human-Like Query Dataset for Evaluating MCP Tool Retrieval Performance - arXiv (link)
  5. DDL2PropBank Agent: Benchmarking Multi-Agent Frameworks' Developer Experience Through a Novel Relational Schema Mapping Task - arXiv (link)

Community Examples

  6. ChatGPT apps are about to be the next big distribution channel: Here's how to build one - Lenny's Newsletter (link)

Ilia Ilinskii

Founder of Rephrase-it. Building tools to help humans communicate with AI.

Frequently Asked Questions

What is MCP?

MCP, or Model Context Protocol, is an open standard for connecting AI assistants to tools, resources, and prompts through a consistent client-server pattern. It reduces one-off integrations and makes tool use more portable across hosts and models.

Why does tool schema quality matter?

Because models choose tools from descriptions and input schemas, weak metadata leads to bad selection, wrong parameters, and fragile behavior. Clear descriptions, action boundaries, and failure modes improve reliability.
