177,000 Tools, Zero Badges

This companion post provides the ecosystem-scale empirical grounding for The Identity Gap (Series 9). It anchors to "How Are AI Agents Used? Evidence from 177,000 MCP Tools" (arXiv:2603.23802, Merlin Stein, UK AI Security Institute / University of Oxford, March 2026), the first systematic census of the MCP tool ecosystem. The companion does not retell Series 9. It provides the empirical reality the series argument lands against. Companion posts extend a series without belonging to it.

Series 9 opened with the Confused Deputy: a deployed agent provisioned with credentials and tool access, manipulated into exercising that authority on behalf of a source it cannot verify. The structural fix is privilege separation and verified workload identity. The question the series left open is one of scale: how much action space has accumulated on top of the identity layer that does not exist?

Merlin Stein’s answer, produced as part of an ongoing collaboration between the UK AI Security Institute and the Bank of England, is 177,436 tools across 19,388 verified MCP servers — tracked from November 2024 to February 2026, classified by direct impact, generality, task domain, and AI co-authorship. The paper does not propose a defense. It measures what is there. What is there is consequential enough to be worth reading carefully.

The Action Space, Measured

The starting point is the classification. Stein distinguishes three types of tools by direct impact: perception tools — those that access and read data; reasoning tools — those that analyse and plan; and action tools — those that directly modify external environments. File editing, sending emails, executing code, steering drones, transferring cryptocurrency: action tools. The distinction matters because an agent limited to perception and reasoning can deceive and mislead but cannot act. An agent with action tools can act.

27% → 65%

Action tools as a share of total monthly MCP server downloads, November 2024 to February 2026. The shift is not gradual — it accelerates. For tools released by registered commercial entities, the move is sharper: 21% to 71% over the same 16-month window. The driver: adoption of general-purpose computer-use and browser-automation tools that operate in unconstrained environments.

The generality dimension compounds this. Stein distinguishes narrow-purpose tools — those that interact with a specific constrained environment like a defined API — from general-purpose tools that operate in unconstrained environments like the open web or a local file system. The critical finding: 94% of general-purpose server downloads involve action capabilities. The tools operating in the least controlled environments are precisely the tools most likely to modify those environments directly.
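Both headline figures are download-weighted shares. The sketch below computes them from per-server monthly records. It is an illustration under stated assumptions, not Stein's pipeline: the `ServerMonth` record shape, the rule that a server counts as "action" if any of its tools is an action tool, and the toy download counts are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    """The three direct-impact categories described in the paper."""
    PERCEPTION = "perception"  # accesses and reads data
    REASONING = "reasoning"    # analyzes and plans
    ACTION = "action"          # directly modifies an external environment


@dataclass(frozen=True)
class ServerMonth:
    """One server's downloads in one month (hypothetical record shape)."""
    server: str
    impacts: frozenset      # Impact values exposed by the server's tools
    general_purpose: bool   # True for unconstrained environments (open web, file system)
    downloads: int


def action_share(rows, general_only=False):
    """Download-weighted share of servers exposing at least one action tool.

    Assumption: a server counts as 'action' if any of its tools is an action tool.
    """
    pool = [r for r in rows if not general_only or r.general_purpose]
    total = sum(r.downloads for r in pool)
    if total == 0:
        return 0.0
    acting = sum(r.downloads for r in pool if Impact.ACTION in r.impacts)
    return acting / total


# Toy month of data; the numbers are invented for illustration.
rows = [
    ServerMonth("browser-automation", frozenset({Impact.PERCEPTION, Impact.ACTION}), True, 600),
    ServerMonth("web-search", frozenset({Impact.PERCEPTION}), True, 100),
    ServerMonth("weather-api", frozenset({Impact.PERCEPTION}), False, 200),
    ServerMonth("trip-planner", frozenset({Impact.REASONING}), False, 100),
]
print(f"overall action share:         {action_share(rows):.0%}")
print(f"general-purpose action share: {action_share(rows, general_only=True):.0%}")
```

In the toy month, one general-purpose action server carries most of the downloads, so restricting the pool to general-purpose servers pushes the action share up. That is the shape of the pattern the paper reports, reproduced here on invented numbers.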

This is the action space Series 9 addresses architecturally. The confused-deputy problem is not theoretical at 177,000 tools. It is the baseline state of the ecosystem. An agent with a browser automation tool and no verified identity can be manipulated into exercising its provisioned authority across an unconstrained environment — and the attacker does not need to compromise anything to make that happen. The tool is already there. The authorization is already provisioned. The identity layer is not.

The Delegation Chain Has No Receipts at Genesis Either

Series 9 Post 2 documented the delegation verification gap: when an orchestrator delegates to a specialist that calls a tool, the tool cannot verify the authorization chain that produced the invocation. The AIP paper (arXiv:2603.24775) established that 100% of approximately 2,000 surveyed internet-exposed MCP servers lacked authentication. Stein's paper adds a dimension that neither Post 2 nor the AIP paper addresses: the tools themselves are increasingly being built by agents.

6% → 62%

Share of newly created MCP servers with detected first-month AI assistance, January 2025 to February 2026. Claude dominates at 69% of AI-coauthored servers, followed by Cursor (9.2%), Copilot (9.1%), and Codex (6.0%). Across the full dataset: 28% of servers and 36% of tools show AI assistance footprints in commit history.
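The paper reports AI-assistance footprints in commit history; its exact detection rules are not reproduced here. As a hedged illustration of what a "first-month" footprint check could look like, the sketch below scans commits made within 30 days of a repository's first commit for co-author trailers. The marker strings, the 30-day window, and the `Commit` record are assumptions; real assistants differ in whether and how they sign commits.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical trailer markers, for illustration only. These are not the
# paper's detection rules, and not every assistant signs its commits.
AI_MARKERS = {
    "claude": ("co-authored-by: claude",),
    "copilot": ("co-authored-by: copilot",),
}


@dataclass(frozen=True)
class Commit:
    timestamp: datetime
    message: str


def assistants_in(message):
    """Names of assistants whose marker appears in a commit message (case-insensitive)."""
    text = message.lower()
    return {name for name, markers in AI_MARKERS.items() if any(m in text for m in markers)}


def first_month_assistance(commits, window=timedelta(days=30)):
    """Assistants detected in commits made within `window` of the repository's first commit."""
    if not commits:
        return set()
    start = min(c.timestamp for c in commits)
    found = set()
    for c in commits:
        if c.timestamp - start <= window:
            found |= assistants_in(c.message)
    return found


# Toy history: only the Claude-tagged commit falls inside the first 30 days.
history = [
    Commit(datetime(2026, 1, 3), "Initial MCP server scaffold"),
    Commit(datetime(2026, 1, 10), "Add send_email tool\n\nCo-Authored-By: Claude"),
    Commit(datetime(2026, 3, 1), "Bump dependencies\n\nCo-authored-by: Copilot"),
]
print(first_month_assistance(history))  # {'claude'}
```

In this sketch, a server would count toward the first-month AI-assisted share when the returned set is non-empty, and per-assistant attribution would come from counting which names appear.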

The structural implication Stein’s discussion section names explicitly: when agents build the tools other agents call, tool creation may scale beyond human oversight. The paper calls this recursive self-improvement — a dynamic where the action space expands without requiring human developer effort at each step. Series 9 Post 2 argued that the delegation chain has no receipts at the runtime invocation layer. Stein’s finding adds the creation layer: the chain has no receipts at tool genesis either.

The invocation gap (Series 9 P2)

No Receipts at Runtime

When an orchestrator delegates to a specialist that calls a tool, the tool receives an OAuth-authenticated request. It cannot verify: who authorized this invocation, through which chain, with what scope constraints, or what the outcome should be recorded as.

AIP’s Invocation-Bound Capability Tokens are designed to close this gap. IETF draft stage. Alpha PyPI implementation available. Not yet in production deployments at scale.

Runtime · Authorization Layer
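What a runtime "receipt" would have to carry can be sketched concretely. The example below is a minimal, hypothetical chained-receipt scheme, not AIP's Invocation-Bound Capability Token format. Each hop MACs the delegator, the delegatee, a scope, and the previous receipt's MAC, and the tool accepts a call only if an unbroken, scope-narrowing chain runs from a trusted human root to the caller. The shared HMAC keys in `KEYS`, the principal names, and the scope strings are assumptions; a production design would use asymmetric signatures bound to verified workload identities.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, replace

# Hypothetical shared keys per principal. HMAC keeps the sketch stdlib-only; a real
# design would use asymmetric signatures bound to verified workload identities.
KEYS = {"alice": b"alice-key", "orchestrator": b"orch-key", "specialist": b"spec-key"}


@dataclass(frozen=True)
class Receipt:
    issuer: str       # principal delegating authority
    subject: str      # principal receiving it
    scope: frozenset  # actions the subject may perform
    prev_mac: str     # MAC of the previous receipt ("" at the root)
    mac: str = ""

    def payload(self):
        # Canonical bytes covered by the MAC (every field except the MAC itself).
        return json.dumps([self.issuer, self.subject, sorted(self.scope), self.prev_mac]).encode()


def _mac(issuer, payload):
    return hmac.new(KEYS[issuer], payload, hashlib.sha256).hexdigest()


def issue(issuer, subject, scope, prev_mac=""):
    r = Receipt(issuer, subject, frozenset(scope), prev_mac)
    return replace(r, mac=_mac(issuer, r.payload()))


def verify_chain(chain, caller, action, root="alice"):
    """True only if an unbroken, scope-narrowing chain links `root` to `caller` for `action`."""
    if not chain or chain[0].issuer != root:
        return False
    prev_subject, prev_mac, prev_scope = root, "", None
    for r in chain:
        if r.issuer not in KEYS or not hmac.compare_digest(_mac(r.issuer, r.payload()), r.mac):
            return False  # unknown issuer, or a forged or tampered receipt
        if r.issuer != prev_subject or r.prev_mac != prev_mac:
            return False  # broken link in the chain
        if prev_scope is not None and not r.scope <= prev_scope:
            return False  # a hop tried to widen its authority
        prev_subject, prev_mac, prev_scope = r.subject, r.mac, r.scope
    return prev_subject == caller and action in prev_scope


r1 = issue("alice", "orchestrator", {"read_calendar", "send_email"})
r2 = issue("orchestrator", "specialist", {"send_email"}, r1.mac)
print(verify_chain([r1, r2], "specialist", "send_email"))     # True
print(verify_chain([r1, r2], "specialist", "read_calendar"))  # False: scope narrowed away
print(verify_chain([r2], "specialist", "send_email"))         # False: no human root
```

The design choice worth noting is that every hop can only narrow scope, so the tool's question "who authorized this, through which chain, with what constraints" has a checkable answer at the point of invocation.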

The creation gap (Stein, arXiv:2603.23802)

No Receipts at Genesis

When agents build the tools other agents call, tool creation no longer bottlenecks on human developer effort. 62% of newly created MCP servers showed AI assistance in February 2026. The tools entering the ecosystem are themselves the products of agent action.

No protocol currently verifies that a tool was created with human authorization or under what scope constraints. The delegation chain is missing a layer that predates any invocation.

Creation · Authorization Layer
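Because no such protocol exists, any example of the missing creation layer is speculative. The sketch below shows one hypothetical shape a creation-time check could take: a `CreationRecord` binds the digest of a published tool manifest to the human who approved it and to the impact categories they approved. The record format, the manifest fields, and the trust list are invented for illustration, and in practice the record itself would need to be signed; this sketch omits that step.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CreationRecord:
    """Hypothetical creation-time provenance record (no such MCP field exists today).

    In practice the record itself would have to be signed; that step is omitted here.
    """
    manifest_sha256: str        # digest of the manifest as approved
    authorized_by: str          # human principal who approved creation
    allowed_impacts: frozenset  # impact categories the approver allowed


def manifest_digest(manifest):
    """SHA-256 over a canonical JSON encoding, so key order does not change the digest."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()


def creation_ok(manifest, record: Optional[CreationRecord], trusted_humans):
    """Accept a tool only if a trusted human approved exactly this manifest and its impact scope."""
    if record is None or record.authorized_by not in trusted_humans:
        return False
    if manifest_digest(manifest) != record.manifest_sha256:
        return False  # manifest changed after approval
    declared = {tool["impact"] for tool in manifest["tools"]}
    return declared <= record.allowed_impacts


approved = {"name": "calendar-helper", "tools": [{"name": "list_events", "impact": "perception"}]}
record = CreationRecord(manifest_digest(approved), "alice", frozenset({"perception"}))

# An agent later extends the server with an action tool without new approval.
extended = {"name": "calendar-helper",
            "tools": approved["tools"] + [{"name": "delete_event", "impact": "action"}]}

print(creation_ok(approved, record, {"alice"}))  # True
print(creation_ok(extended, record, {"alice"}))  # False: digest no longer matches
```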

Payment Infrastructure: The High-Stakes Illustration

Most action tools in the MCP ecosystem support medium-stakes occupations — computer systems administration, software development, file management. Stein’s consequentiality analysis, using the O*NET occupational impact scale, finds this is the norm. But finance is the exception. High-stakes financial occupations have disproportionately more action tools than the overall cross-occupation distribution predicts. The outlier is measurable and growing.

33×

Growth in MCP servers with payment execution capabilities: 47 servers in January 2025 to 1,578 in February 2026, a 13-month expansion documented as part of an active monitoring collaboration between the UK AI Security Institute and the Bank of England. Direct payment execution tools, those that transact without requiring external approval, make up a meaningful and growing share of these servers.
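The headline multiple can be checked from the two endpoints and converted into an implied compound monthly growth rate g. The conversion assumes smooth exponential growth across the 13 months, a simplification the source does not make.

```latex
\frac{1578}{47} \approx 33.6
\qquad\Longrightarrow\qquad
g = \left(\frac{1578}{47}\right)^{1/13} - 1 = e^{\ln(33.57)/13} - 1 \approx e^{0.270} - 1 \approx 0.31
```

Under that assumption, the payment-server count grew by roughly 31% per month.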

The Bank of England collaboration is not a footnote. It is an external validation of the monitoring methodology and a signal that financial regulators are treating this as an active concern, not a theoretical one. The paper was explicitly commissioned as part of ongoing governmental foresight work on agentic transaction monitoring. The 1,578 servers represent the publicly visible lower bound. Stein’s paper notes that higher-stakes tools likely exist in private deployments not captured by the MCP registry scan.

The identity architecture argument from Series 9 Post 3 lands here with particular force. Memory isolation assumes the agent crossing a namespace boundary can be identified as unauthorized. Provenance attestation assumes the signing key is bound to a verified identity. Policy enforcement assumes it knows who is making the request. For agents operating payment execution tools in production financial environments, none of those assumptions hold without a verified identity layer. The 1,578 payment servers are operating in the same identity-less ecosystem as the 177,000 tools that surround them.
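The policy-enforcement assumption can be made concrete with a payment gate. In the confused-deputy case the agent's own credential is valid, so a gate that checks only the credential approves whatever an injected instruction asks for. The sketch below refuses a payment unless a verified principal is attached and that principal holds payment authority. The `Request` fields, the `LIMITS` table, and the premise that an identity layer fills in `verified_principal` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    amount: float
    agent_credential_valid: bool       # ambient authority provisioned to the agent
    verified_principal: Optional[str]  # None when no identity layer vouches for the requester


# Hypothetical per-principal payment limits.
LIMITS = {"treasury-ops": 10_000.0}


def authorize_payment(req):
    """Return (allowed, reason). A valid agent credential alone is never sufficient."""
    if not req.agent_credential_valid:
        return False, "invalid agent credential"
    if req.verified_principal is None:
        # The confused-deputy case: the deputy is authenticated, the requester is not.
        return False, "no verified identity for the requester"
    limit = LIMITS.get(req.verified_principal)
    if limit is None:
        return False, "principal has no payment authority"
    if req.amount > limit:
        return False, "amount exceeds the principal's limit"
    return True, "ok"


# An injected instruction arrives through a valid agent session with no verified requester.
print(authorize_payment(Request(500.0, True, None)))
# The same payment requested by a verified principal with sufficient authority.
print(authorize_payment(Request(500.0, True, "treasury-ops")))
```

Without the `verified_principal` field, the only check left is the agent credential, which is exactly the check the confused deputy passes.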

The Bridge: AIP and Stein in the Same Frame

These two papers — arXiv:2603.24775 and arXiv:2603.23802 — answer different questions, and together they close a frame that neither closes alone.

AIP measured how completely the identity layer is missing. Stein measured how much action space sits on top of it.

— Luminity Digital synthesis, Series 9 Companion

AIP’s ecosystem scan found 100% of approximately 2,000 surveyed MCP servers lacking authentication — a complete absence of the identity mechanism that delegation verification requires. Stein’s census established that 177,000 tools exist on top of that identity-less protocol, with action tools now representing 65% of monthly downloads, 94% of general-purpose server downloads, and 1,578 servers capable of executing payment transactions directly.

The identity gap is not a gap in a small or experimental ecosystem. It is a gap in a production ecosystem that has grown by two orders of magnitude in usage over 16 months, where agents are increasingly operating in unconstrained environments, modifying external systems with real-world consequences, and building the next generation of tools that other agents will call. The Series 9 argument — that identity is the structural foundation every other control assumes — is not an abstract architectural claim. It is a response to a gap that Stein’s census makes concrete.

What the Monitoring Methodology Provides

Beyond the specific findings, Stein’s paper contributes a methodology: systematic monitoring of public MCP repositories as an early-warning layer for agentic deployment patterns. The paper demonstrates that MCP monitoring captured the unofficial Google Calendar MCP server on GitHub in December 2024 — months before Anthropic and OpenAI added official integrations in April and August 2025. The tool registry is an early signal of where agent capabilities are heading before production deployments make it visible in usage data.
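The early-warning idea reduces to a first-sighting query over periodic registry snapshots. The sketch below finds the first snapshot in which a server matching a keyword appears and measures the lead time to an official launch. The snapshot contents and server names are hypothetical, and because the source gives only the month of the official integration, the launch date in the usage line is a placeholder.

```python
from datetime import date

# Hypothetical monthly registry snapshots: scan date -> server names observed.
snapshots = {
    date(2024, 11, 1): {"filesystem", "github"},
    date(2024, 12, 1): {"filesystem", "github", "google-calendar-unofficial"},
    date(2025, 1, 1): {"filesystem", "github", "google-calendar-unofficial", "slack"},
}


def first_seen(snapshots, keyword):
    """Earliest snapshot date containing a server whose name includes `keyword`, else None."""
    for day in sorted(snapshots):
        if any(keyword in name for name in snapshots[day]):
            return day
    return None


def lead_time_days(snapshots, keyword, official_launch):
    """Days from first registry sighting to an official launch, or None if never sighted."""
    seen = first_seen(snapshots, keyword)
    return None if seen is None else (official_launch - seen).days


# Placeholder launch date: the source gives only the month (April 2025).
print(first_seen(snapshots, "calendar"))                        # 2024-12-01
print(lead_time_days(snapshots, "calendar", date(2025, 4, 1)))  # 121
```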

The Monitoring Methodology Limit

Stein’s paper is explicit about what the method cannot do. It captures publicly registered tools on developer platforms — it misses private deployments, internal enterprise tooling, and tools built and used on-the-fly by capable agents without being registered anywhere. Download counts proxy developer installation events, not runtime execution. The paper notes this limitation directly: “This method will cease to be useful if AI agents are building their own tools as they need them.” That threshold is not yet reached, but the 6% → 62% first-month AI-assistance trajectory has a clear direction.

The monitoring gap and the identity gap reinforce each other. Tools created by agents for immediate use leave no registry footprint. Actions taken by agents in unconstrained environments leave no authorization record. Both gaps are structural consequences of the same missing layer: a protocol-level mechanism for verified identity and traceable authorization at every step of the agent action chain.

This Post Extends · The Identity Gap — Series 9

Post 1 · Live · The Confused Deputy Has No Badge

Post 2 · Live · The Delegation Chain Has No Receipts

Post 3 · Live · Identity Is the Foundation

177,436 tools · Across 19,388 verified MCP servers, Nov 2024 to Feb 2026
27% → 65% · Action tool share of monthly downloads over 16 months
21% → 71% · Action tool share for registered commercial entities
94% · General-purpose server downloads involving action capabilities
47 → 1,578 · Payment execution MCP servers, Jan 2025 to Feb 2026
6% → 62% · First-month AI-assisted server creation rate, Jan 2025 to Feb 2026
69% · Of AI-coauthored servers attributed to Claude
79.3% · Of NPM downloads covered by the top 1% of servers (13 servers)
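The concentration figure is a top-k share: sort servers by downloads, take the top 1%, and divide their downloads by the total. A minimal sketch follows; the rounding rule for k (ceiling, minimum one server) and the toy download counts are assumptions, not the paper's data.

```python
def top_share(downloads, percent=1):
    """Return (k, share): how many servers form the top `percent`, and their share of downloads.

    Assumption: k is rounded up, with a minimum of one server.
    """
    ranked = sorted(downloads, reverse=True)
    k = max(1, (len(ranked) * percent + 99) // 100)  # integer ceiling division
    total = sum(ranked)
    share = sum(ranked[:k]) / total if total else 0.0
    return k, share


# Toy registry: 200 servers, two of which dominate installs (invented numbers).
toy = [1000, 1000] + [5] * 198
k, share = top_share(toy)
print(f"top {k} servers carry {share:.1%} of downloads")
```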

Action Tool · A tool that directly modifies an external environment: file editing, sending emails, executing code, transferring funds. Distinct from perception tools (read) and reasoning tools (analyze). The category where agent action creates real-world consequence.
General-Purpose Tool · A tool operating in an unconstrained environment: browser automation, computer use, arbitrary code execution. 94% of general-purpose server downloads involve action capabilities. High action space + unconstrained environment = highest attack surface.
Recursive Self-Improvement · The dynamic where AI agents create tools for other AI agents without requiring human developer effort at each step, removing the human bottleneck on tool proliferation. 62% of newly created servers showed first-month AI assistance in February 2026.
Tool Registry as Early Warning · MCP monitoring captures tool availability months before production deployment becomes visible in usage data. Stein's monitoring detected the unofficial Google Calendar MCP server in December 2024, months before Anthropic and OpenAI shipped official integrations in April and August 2025.



The identity architecture question is on the near-term Luminity research agenda.
