The Agent Framework Squeeze: Who Owns Orchestration?

After publishing the Context Graph briefing, one question came up more than any other: can Anthropic, OpenAI, and Google just… replace the agent framework layer entirely? The trigger for this conversation was Anthropic’s Programmable Tool Calling, which went GA on February 17, 2026. But it’s not just Anthropic. All three are making the same bet: the platform around the model should own more of the orchestration stack.

It’s a fair question. If you look at the architecture stack from Part 1, agent frameworks sit at the very top — the Consumption layer — alongside MCP servers and search portals. They’re orchestration wrappers sitting between the model and the tools. And the model providers are increasingly saying: we can do that part ourselves.

The trigger for this conversation was Anthropic’s Programmable Tool Calling (PTC), which went GA on February 17, 2026. But it’s not just Anthropic. Google’s Vertex AI Agent Builder with ADK and the Agent2Agent protocol, OpenAI’s visual Agent Builder with AgentKit — all three are making the same bet: the platform around the model should own more of the orchestration stack.

That word — platform — turns out to be doing a lot of work. Let’s be precise about what’s actually happening.

02 What PTC Actually Changes

Programmable Tool Calling is worth understanding in detail because it’s the clearest example of the displacement mechanism. Here’s the core shift.

The Before & After

Before PTC: Claude needs to call five tools. Each call requires a full inference pass — the model reasons about what to call, emits a JSON request, waits for the result, ingests it into context, reasons again, and repeats. Five tools means five inference passes, five rounds of natural-language parsing, and all intermediate results accumulating in the context window. This is exactly the loop-and-route logic that LangChain chains and CrewAI crews were built to manage.

After PTC: Claude writes a Python script that calls all five tools, processes results, applies conditional logic, and returns only the final answer. One execution pass. The intermediate data never enters the context window. Token usage drops 37%. Latency drops by eliminating 4+ inference round-trips.

Anthropic’s Tool Search Tool compounds this further. Instead of loading every tool definition into the context upfront — which could consume 100K+ tokens before a conversation even starts — Claude searches a tool registry and loads only what’s relevant. The reported reduction: 85% fewer tokens on tool definitions alone.

85%

fewer tokens on tool definitions alone. PTC reduces token usage by 37% and eliminates 4+ inference round-trips per multi-tool workflow. Tool Search replaces upfront tool loading with runtime discovery — the model finds what it needs, when it needs it.

A Distinction That Will Matter for the Rest of This Series

It’s tempting to say “the model is absorbing orchestration.” That’s imprecise. The model is stateless. Every API call starts from zero — Claude doesn’t remember the last tool it called, the last result it parsed, or the last workflow it ran. What PTC actually does is give Claude the ability to write orchestration code. That code then executes in Anthropic’s sandbox — a platform-layer container that is separate from the model itself. The model contributes intelligence. The platform contributes execution. These are different things, owned by different layers, with different competitive implications.

If the model were absorbing orchestration into its own cognition — learning and retaining orchestration patterns — that would be a deep moat. No one could replicate what’s inside the model’s weights. But the model is writing code that runs outside itself in a sandbox. That means the competitive advantage lives in the platform (who has the best sandbox, the lowest latency, the deepest tool integration) rather than in the model’s intelligence per se. And platforms can be competed with.

Now read PTC through the lens of our Context Graph stack. The Consumption layer had three blocks: MCP Servers, Search & Discovery, and Agent Frameworks. PTC is Anthropic’s platform absorbing the third block’s core function — tool orchestration — by having the model write the orchestration code and executing it in a managed sandbox. Tool Search is the platform absorbing a piece of the second block — discovery — as well.

Google and OpenAI are approaching the same destination from different angles. Google’s ADK wraps LangChain, LangGraph, and CrewAI agents into a managed Vertex AI runtime, and uses the A2A protocol to make framework choice less significant. OpenAI’s Agent Builder moves orchestration to a visual canvas. Both strategies commoditize the framework: if you can swap between them without changing the hosting layer, no individual framework has pricing power.

In all three cases, the displacement comes from the platform infrastructure around the model — the sandboxes, the managed runtimes, the hosting layers — not from the model’s own intelligence. The model is the reason these platforms are valuable, but it’s the platform doing the absorbing.

03 What Gets Displaced vs. What Survives

This is where the conversation gets nuanced. “Agent framework” isn’t one thing — it’s a bundle of capabilities. The provider platforms are eating some of them and ignoring others.

Being Displaced

Sequential tool orchestration

Call tool A → parse result → decide → call tool B → loop. This was the original LangChain chain pattern. PTC does it in a single code block — the model writes the logic, the platform’s sandbox executes it — faster and with fewer tokens. The framework’s orchestration loop is replaced by generated code running in a managed container.

Being Displaced

Tool discovery & loading

Frameworks maintained registries of available tools and loaded their schemas into prompts. Tool Search replaces this with a searchable index. The model discovers tools at runtime rather than pre-loading everything. Again: the intelligence is the model’s, but the index and the retrieval infrastructure are the platform’s.

Still Has Runway

Stateful, long-running workflows

LangGraph’s key differentiator: an agent can pause mid-workflow, wait for human approval for days, and resume with full state preserved by the framework. This requires state to live outside the model — precisely because the model is stateless. Provider platforms are adding session persistence, but these aren’t workflow state machines with branching, checkpointing, and human-in-the-loop gates. Not yet.

Still Has Runway

Multi-agent coordination

CrewAI’s role-based delegation — one agent researches, another analyzes, a third writes — requires inter-agent communication and shared working context that PTC doesn’t address. PTC gives a single model instance the ability to write sophisticated orchestration code. Crews coordinate multiple model instances with distinct roles, passing structured context between them. These are structurally different patterns.

The pattern: anything that’s essentially “smart glue” between a model and its tools is getting absorbed into the provider’s platform. Anything that’s workflow state management, multi-agent coordination, or human-in-the-loop orchestration still requires an external framework — because it requires persistent state that the stateless model can’t hold.

This is where the model’s statelessness actually protects the surviving frameworks. A framework like LangGraph doesn’t compete with the model’s intelligence — it competes with the model provider’s platform infrastructure. And platform infrastructure is a more level playing field than model intelligence. You can build a workflow state machine without training a frontier model. You can’t write a better PTC without one.

That said, the boundary is moving. Provider platforms are steadily adding persistence primitives — persistent containers, session state, memory tools. The “still has runway” column is shrinking. The question is how fast, and whether the frameworks can move up the value chain faster than the platforms extend downward.

04 Reading the Stack Diagram Differently

Here’s where our Part 1 architecture becomes a useful strategic map. Let’s overlay the displacement dynamics onto the stack — with the precision that the displacement comes from provider platforms, not from the model itself.

Context Graph Stack — Displacement Pressure by Layer

Consumption

Agent Frameworks — under direct pressurePlatform sandboxes (PTC, Agent Builder) absorb core orchestration

MCP Servers + Discovery — co-optedProvider platforms adopt MCP; tool search built into platform layer

▼ pressure increasing downward ▼

Activation

Governance & Automation — untouched so farPolicy enforcement, access controls remain with metadata platforms

Decision Orchestration — still missing for everyoneCross-system routing based on context queries: nobody has this yet

▼ ▼ ▼

Context Store

Metadata Lakehouse + Glossary — safe for nowAtlan, Alation, Collibra territory; provider platforms don’t store enterprise metadata

Graph Model

Data Asset Graph + Lineage — safe for nowRequires deep connector coverage; not a provider platform competency

Key insight: the model is stateless. It can’t hold state across the layers above.
The displacement pressure comes from provider PLATFORMS — the infrastructure wrapped around the model.
This means the lower layers are safe not because providers don’t want them,
but because building persistent, cross-system infrastructure is a different business than running models.

Actively contested

Being co-opted by provider platforms

Safe from provider displacement

Unbuilt — up for grabs

The displacement pressure is top-down, and it’s driven by platform expansion, not model capability. The model providers are absorbing the Consumption layer first because that’s where their platform infrastructure naturally extends — running code in sandboxes, hosting tool registries, managing API routing. The model’s intelligence makes it possible; the platform’s infrastructure makes it real.

The deeper question — and the one we take up in Part 3 — is whether the platforms keep pushing downward. They’re already adding persistence primitives: OpenAI’s Conversations API with persistent containers, Anthropic’s Memory Tool with cross-session file storage, Google’s Sessions and Memory Bank. Each of these is a platform-layer capability that moves toward the Activation and Context Store layers.

But here’s the nuance that changes the competitive calculus: because the model is stateless, these persistence primitives are just infrastructure. They’re files written to storage and re-injected into context windows. And that kind of infrastructure can be built by anyone — not just the model provider. A metadata platform that sits in the MCP request path can capture the same information. A middleware layer between the enterprise and the API can intercept the same tool calls. The model doesn’t know or care where its context came from.

There’s a second nuance the framework displacement debate tends to obscure: the lower layers of the stack aren’t primarily about agent activity at all. The Operational Context layer — decision traces, institutional knowledge, exception precedent — is overwhelmingly about human decisions. The VP who approved the pricing exception did it in an email. The unwritten rule about how escalations actually get routed lives in a tenured employee’s head. Agents will generate clean, structured decision traces as a byproduct of execution, but today’s trillion-dollar gap exists because decades of human decisions were never captured. Ironically, LLMs themselves may be the best tool for closing that gap — they’re uniquely capable of extracting structured decision context from unstructured email threads, Slack messages, and approval workflows. But the model provider’s platform proximity doesn’t help here. The extraction capability is model-agnostic. The durable advantage belongs to whoever has the connectors to feed the model and the graph infrastructure to store what it extracts.

That’s a fundamentally different competitive dynamic than “the model absorbs your functionality.” It’s “the platform around the model is competing with your platform, and both platforms are providing context to the same stateless model.” The playing field is more level than it first appears — and it extends well beyond the agent orchestration layer this post has focused on.

05 What the Frameworks Are Doing About It

The surviving frameworks aren’t standing still. The responses map to a clear pattern: move up the value chain — toward state management and coordination — or get absorbed into the platform layer.

LangGraph

Pivoted from chains to workflow state machines. LangChain’s own team publicly recommends LangGraph for agents, acknowledging that simple chain orchestration is commoditized. LangGraph’s bet is that stateful, pausable, human-in-the-loop workflows are structurally different from what provider platforms offer — precisely because they require persistent state the model itself can’t hold. LinkedIn and Uber run it in production. The risk: provider platforms could add workflow state management to their persistence layers, narrowing LangGraph’s advantage to implementation maturity rather than structural differentiation.

CrewAI

Doubled down on multi-agent role coordination. CrewAI raised $18M and reports adoption across 60% of Fortune 500 companies. Their thesis: a single model instance writing orchestration code is fundamentally different from multiple model instances collaborating through structured role delegation with shared working context. The risk: as provider platforms get better at managing multi-session coordination (OpenAI’s Conversations API already supports durable threads), the line between “multi-agent framework” and “multi-session platform feature” blurs.

Microsoft

Merged everything into one framework. AutoGen and Semantic Kernel became the unified Microsoft Agent Framework, GA in Q1 2026. For Azure-heavy enterprises, this is the path. Microsoft’s structural advantage: they control the model partnership (via OpenAI), the hosting platform (Azure), the identity layer (Entra), and the enterprise applications (365, Dynamics). That’s the closest thing to a full vertical stack today — and notably, it’s a platform story, not a model story.

Google

Embraced and extended. Rather than compete with frameworks, Vertex AI Agent Builder hosts them — LangChain, LangGraph, CrewAI, AG2 all run on Vertex. Google’s play is to make the framework layer interchangeable while locking in the hosting and evaluation layer above it. The A2A protocol lets agents built on different frameworks communicate regardless of provider. Google is explicitly commoditizing frameworks to capture value at the platform layer — which is a clean read of where the real leverage lives.

Where This Leaves Us

Being Commoditized

Thin Orchestration Wrappers

Agent frameworks as thin orchestration wrappers are being commoditized. That’s not a prediction — it’s happening now, and PTC is the most concrete evidence. If your agent’s value comes from calling three tools in sequence and parsing the results, the model can write that logic and the provider’s sandbox can execute it — natively, faster, and cheaper.

Shrinking Runway

Still Differentiated

Workflow State & Multi-Agent Coordination

Agent frameworks as workflow state machines and multi-agent coordinators still have meaningful runway. The capabilities they provide — persistent workflow state, role-based delegation, human-in-the-loop checkpoints — require state that the model itself cannot hold. They’re competing with the provider’s platform, not the provider’s model, and that’s a more survivable fight.

Structural Advantage

The precision we’ve tried to maintain throughout this post — the model vs. the platform around the model — sets up the more interesting question we take on in Part 3: can provider platforms push all the way down into the Activation and Context Store layers?

On the surface, it looks like they’re assembling the pieces: persistent memory, persistent compute, session state. But there’s an architectural fact that reshapes the competitive dynamics entirely. The model is stateless. Every call starts from zero. The “memory” that providers are shipping is just files written to storage and re-injected into context windows — and that mechanism can be built by anyone who sits in the request path. The provider has proximity. But proximity is not a lock.

The Broader Point This Debate Tends to Miss

The framework squeeze is only about the agent-mediated slice of the context graph. Most organizational decisions today are still made by humans — in email threads, Slack channels, approval workflows, and hallway conversations. That context was never captured, never structured, and never made queryable. The trillion-dollar gap exists right now, before a single agent is deployed. The connectors to reach that human context already exist, and LLMs are capable of extracting structured decision records from the unstructured mess — but nobody has integrated the full pipeline into a product. The vendors best positioned to do so aren’t model providers or agent frameworks — they’re the platforms with connector depth, graph infrastructure, and the ability to use any LLM as the extraction engine. The framework displacement we’ve analyzed here is real and consequential. But it’s playing out in the smaller half of the opportunity.

That’s where Part 3 goes — and the implications for metadata platforms, for the VC thesis, and for who actually builds the context graph are more nuanced than either the bulls or the bears expected.

Next — Part 3

The Downward Push: Can Model Providers Become Context Platforms?

We examine the persistence primitives that Anthropic, OpenAI, and Google have already shipped — and confront two architectural realities. First, the model’s statelessness democratizes the context capture opportunity more than anyone anticipated. Second, the most valuable context to capture isn’t agent decision traces — it’s the vast backlog of human decisions that were never structured or stored. The twist: LLMs themselves may be the best extraction engine for that human context, even as no single model provider is positioned to own the full pipeline. What does this mean for metadata platforms, for the VC ecosystem funding the “context graph” category, and for who actually ends up owning the operational context half of the trillion-dollar gap?