When a user adds a skill, plugin, or MCP-connected tool to an AI agent, a quiet assumption takes hold: the description they read is a faithful representation of what the code will do. That assumption is wrong — structurally, systematically, and by design of current protocols. Skills execute with implicit user-level trust and no runtime capability confinement. Every adversary type exploiting agentic AI today is, at root, exploiting this same condition. This piece names that condition the Consent Gap, maps the adversary behaviors that exploit it, and anchors the analysis in current research and authoritative frameworks.
In traditional software, a permission prompt tells users what an application will access — the camera, the contacts list, the filesystem. The user reads the prompt and makes a decision. The operating system enforces a boundary between the described permission and the actual system call. Consent has a technical correlate: the permission model enforces it. In current agentic AI architectures, this correlation breaks down entirely.
What Consent Actually Means in Agentic Systems
LLM plugins are extensions that, when enabled, are called automatically by the model during user interactions — driven by the model, with no application control over execution. The “description” field of an MCP tool is not a security boundary. It is plain text metadata that an LLM incorporates into its reasoning context. There is no mechanism that validates whether the code behind a description matches what the description says.
There is no built-in mechanism for MCP Clients or users to cryptographically verify the true origin or authenticity of a tool. Tool names, descriptions, and even claimed provider names within the tool’s metadata can be easily spoofed. A tool can assert claims like “secure,” “official,” or “privacy-preserving” in its human-readable description without any underlying mechanism to validate these assertions.
The user consents to a description. The agent executes code. Nothing in between enforces their correspondence.
— The Consent Gap, defined structurallyThe Trust Model at Installation Time
When a user installs an MCP server or agentic skill, the security model in place today asks a deceptively simple question: does the description look legitimate? There is no sandboxing of skill capabilities. There is no static analysis of the code against its description. There is no cryptographic attestation that the description and the executable payload were authored together and have not since diverged.
MCP’s security model assumes that tool descriptions are trustworthy and benign. That assumption is not reinforced by any technical control. Once the user approves a skill, the agent treats its tool descriptions as ground-truth representations of what that skill does. When the agent retrieves and parses this metadata to inform its planning, it processes the entire tool description as a definitive representation of the tool’s capabilities — including any hidden instructions embedded within it.
Attack success rate achieved against the o1-mini model in the MCPTox benchmark — the first large-scale empirical evaluation of tool poisoning across 45 live MCP servers and 353 authentic tools. More capable models proved more susceptible, as the attack exploits their superior instruction-following abilities. The highest refusal rate observed was less than 3%, by Claude 3.7 Sonnet. — Wang et al., arXiv:2508.14925, August 2025
The user’s approval is, in effect, a blank check. The skill executes at the trust level of the user’s session, with access to whatever data and tools the agent environment exposes. Inadequate access control allows a plugin to blindly trust other plugins and assume that the end user provided the inputs — enabling malicious inputs to have harmful consequences ranging from data exfiltration, remote code execution, and privilege escalation.
The Adversary Taxonomy: Four Exploitation Paths, One Structural Condition
What follows is not four separate problems. It is four adversary strategies converging on the same architectural weakness. The Consent Gap is not exploited differently by different adversary types — it is exploited by the same mechanism, from different directions and with different timelines.
Malicious Skill Author
A threat actor crafts a skill whose public-facing description presents a plausible, useful function — a file organizer, a currency converter, a code linter. The underlying tool description, however, contains instructions invisible to human users but fully readable by the LLM. In a tool poisoning attack, the attacker hides malicious instructions within the metadata of the tool, making it invisible to the human user.
Invariant Labs, April 2025 · OWASP LLM06:2025Intent From the Start
Invariant Labs demonstrated that a malicious MCP server can instruct the agent to exfiltrate SSH keys and configuration files — including credentials for all other connected MCP servers — without the user observing anomalous behavior. The mcp.json file specifically stores credentials for other MCP servers and entire workflow platforms. Once exfiltrated, the attacker has lateral access to every connected service.
Invariant Labs Disclosure · invariantlabs.ai, April 2025Supply Chain Compromise
Not every malicious skill begins malicious. Supply chain compromise introduces a second pathway: a legitimate, popular skill is targeted after adoption. Users and organizations that rely on community reputation, download counts, or prior audits are not protected — the trust heuristics they rely on are defeated the moment the upstream package is compromised.
MITRE ATLAS AML.CS0048 · OpenClaw Investigation, February 2026OpenClaw / ClawdHub
MITRE ATLAS documented a proof-of-concept supply chain attack using a poisoned skill shared on ClawdHub, a package registry for skill distribution. The skill was poisoned through a malicious prompt hidden within the skill payload that, when executed, allowed for arbitrary code execution on the user’s system. MITRE assessed all identified techniques as mature — demonstrated or realized elsewhere in the wild.
MITRE ATLAS PR-26-00176-1 · February 2026The Rug Pull
The rug pull exploits the one-time nature of user consent. A skill behaves correctly for days, weeks, or months, building genuine trust and embedding itself in critical workflows. Then its behavior changes — silently, server-side, without triggering any re-authorization. The MCP specification permits servers to update tool definitions, but client implementations rarely surface these changes to users.
Invariant Labs · Elastic Security Labs, September 2025Silent Redefinition
Elastic Security Labs illustrated the financial dimension: a tool called daily_quote returning an inspirational message could, after a rug pull, embed a hidden instruction that any invocation of a co-installed transaction_processor tool silently routes a percentage fee to an attacker-controlled account — with no logging and no user notification. A tool doesn’t need to be called to affect another tool’s behavior.
Elastic Security Labs · elastic.co, September 2025Cross-Tool Shadowing
The most architecturally subtle path does not require a malicious tool to be called. It requires only that a malicious tool be present. Because the LLM uses all available tool descriptions as part of its reasoning, a single poisoned description can alter how the agent interprets and invokes every other tool in the ecosystem. This is a qualitative departure from conventional malware threat models.
Invariant Labs · CyberArk Research, December 2025Pollution Without Execution
A malicious server can poison tool descriptions to exfiltrate data accessible through other trusted servers — making authentication hijacking possible. Combined with a rug pull, a malicious server can hijack an agent without ever appearing in the user-facing interaction log. CyberArk research extended this: every field in the tool schema — not just the description — is processed by the LLM as part of its reasoning loop, making the entire schema a potential injection point.
CyberArk Threat Research · cyberark.com, December 2025Research Foundations
The research community has moved quickly to characterize and measure these risks. The following works represent the current evidentiary baseline — moving the field from theoretical concern to empirical measurement and proposed architectural remediation.
MCPTox Benchmark — Wang et al. (arXiv:2508.14925, August 2025)
The first large-scale empirical evaluation of tool poisoning vulnerability across real-world MCP infrastructure. Evaluated 20 prominent LLM agents against 1,312 attack cases derived from 45 live MCP servers and 353 authentic tools. Key finding: attack success rates exceed 40% for multiple frontier models, and the highest-capability models are frequently most vulnerable because the attack exploits instruction-following fidelity. Agents rarely refuse these attacks; the highest refusal rate was less than 3%.
Available: arxiv.org/abs/2508.14925
Enhanced Tool Definition Interface / ETDI (arXiv:2506.01333, June 2025)
Proposes a formal security extension to MCP incorporating cryptographic identity verification, immutable versioned tool definitions, and OAuth 2.0-based explicit permission management. Introduces policy-based access control where tool capabilities are dynamically evaluated against declared policies at runtime — directly addressing the structural conditions underlying both tool poisoning and rug pull attacks. The most architecturally complete response to the Consent Gap currently in the literature.
Available: arxiv.org/abs/2506.01333
MITRE ATLAS OpenClaw Investigation (PR-26-00176-1, February 2026)
The first MITRE ATLAS case study series focused specifically on agentic skill frameworks at scale. Identified seven novel techniques unique to OpenClaw, including supply chain compromise via poisoned skills (AML.CS0048), one-click RCE via crafted URLs (AML.CS0050 / CVE-2026-25253), and privilege escalation through agent configuration modification. All identified techniques assessed as mature — demonstrated or realized elsewhere in the wild. Establishes that skill-based attack paths are no longer theoretical.
Available: atlas.mitre.org / PR-26-00176-1
Systematic Analysis of MCP Security (arXiv:2508.12538, August 2025)
Provides a comprehensive taxonomy of MCP-specific attack classes including direct and indirect prompt injection, tool poisoning, rug pull, cross-server shadowing, remote listener attacks, and server spoofing. Establishes formal definitions grounding the field’s emerging vocabulary, and traces each attack class to the structural conditions in the MCP specification that enable it.
Available: arxiv.org/abs/2508.12538
Additional Research Foundations
Invariant Labs Tool Poisoning Disclosure (April 2025) — Original public disclosure coining “tool poisoning,” demonstrating proof-of-concept exfiltration of SSH keys, mcp.json credentials, and multi-server authentication hijacking. Introduced the formal distinction between tool poisoning and rug pull as separate but combinable attack classes.
MCP Safety Audit (Radosevich & Halloran, arXiv:2504.03767, 2025) — Independent audit of real-world MCP deployments demonstrating that across common LLM-MCP configurations, the combination of pre-authorized tools and aggressive instruction-following produces reliably exploitable conditions.
OWASP Top 10 for LLM Applications 2025 (LLM01 — Prompt Injection; LLM06 — Excessive Agency) — The authoritative industry classification framework. Excessive Agency formally addresses conditions where LLMs are granted more functionality, permissions, or autonomy than their tasks require. Insecure Plugin Design has been consolidated into this category in the 2025 edition, reflecting its status as a systemic condition rather than an isolated vulnerability class.
MITRE ATLAS Framework Update (October 2025, with Zenity Labs) — Added 14 new agentic AI attack techniques and sub-techniques, including AML.T0062 (Exfiltration via AI Agent Tool Invocation). As of October 2025, the framework contains 15 tactics, 66 techniques, 46 sub-techniques, and 33 real-world case studies specifically targeting AI and machine learning systems.
Why the Gap Persists
The Consent Gap is not an oversight waiting to be patched. It reflects a design inheritance from traditional software plugin ecosystems that was never appropriate for LLM-native architectures. In a traditional plugin model, the plugin executes its own code in a constrained environment. The capability boundary is enforced by the operating system, not by the plugin’s documentation.
In an MCP-native agentic architecture, the skill does not operate in a sandbox. Its tool descriptions flow directly into the LLM’s reasoning context, where they are processed with the same epistemic weight as user instructions and system prompts. Prompt injection ranks as the number one vulnerability in the OWASP Top 10 for LLM Applications 2025. Even the official MCP specification acknowledges the risk, noting that there should always be a human in the loop with the ability to deny tool invocations — language that is advisory, not enforceable.
Share of organizations with complete visibility across agent behaviors, permissions, tool usage, and data access — described by security leaders as a deeply concerning blind spot. Only 38% monitor agent decisions end-to-end when instructions chain across systems. One in five organizations acknowledge they have deployed agents with no guardrails or monitoring at all. — Akto State of Agentic AI Security 2025
OS-Enforced Capability Boundary
The plugin executes its own code in a constrained environment. The operating system enforces separation between the described permission and the actual system call. A camera permission grant is technically enforced — the OS mediates all camera access and can deny it.
The description is informational. The boundary is architectural. The user’s consent decision has a direct technical correlate in system behavior.
Boundary Enforced by Host OSDescription-Driven Implicit Trust
The skill’s tool description flows directly into the LLM’s reasoning context. It is processed with the same epistemic weight as user instructions and system prompts. There is no OS-layer enforcement of what the description claims the skill will do.
The description is executable context. The user’s consent decision has no technical correlate that validates code-description correspondence. The agent acts on description; the description can lie.
Description ≠ Enforced BoundaryWhat Closing the Gap Requires
Closing the Consent Gap is not primarily an end-user education problem. It is an architectural problem. The controls that would address it operate at a layer that individual users cannot see or influence. Five technical and governance controls define the remediation architecture.
Cryptographic Attestation of Tool Identity
The ETDI framework proposes that tool definitions be cryptographically signed so that clients can verify a tool’s origin and detect any post-publication modification. This converts rug pull detection from a behavioral inference problem into a verification problem. Without cryptographic attestation, any claims in a tool’s metadata — including authorship, capabilities, and scope — are unverifiable assertions.
Runtime Capability Confinement
Skills should not inherit the full trust context of the user session. Capability declarations in tool metadata should constrain what a skill can access at runtime — enforced by the host, not stated by the skill author. This is the architectural analog of OS-level permission enforcement: the description becomes a binding contract, not an advisory label.
Continuous Behavioral Monitoring
Rug pull attacks break the model’s assumption that tool behavior is consistent and predictable, enabling silent execution of malicious logic. Effective defense requires continuously monitoring tool behavior for consistency — tracking each MCP server’s response patterns across sessions and flagging any drift from declared capabilities or schema. A tool approved once must remain verified continuously.
Mandatory Re-Authorization on Definition Change
Description mutations should trigger a re-consent flow, not silent absorption into the agent’s context. The one-time approval model is structurally incompatible with post-installation attack surfaces. Version-pinning, hash verification, and description-change alerting are the minimum viable controls for any organization operating skills at production scale.
Human-in-the-Loop for Consequential Actions
This is the explicit guidance of the MCP specification itself, and the control most consistently absent in enterprise deployments. The principle of least privilege — requiring human approval for sensitive operations — limits the data available to exfiltration and constrains the blast radius of a compromised skill. Autonomy without consequence boundaries is not a feature; it is an unmanaged attack surface.
The enterprise deployment picture makes the urgency clear. Sixty percent of organizations have not conducted a formal agentic or AI security risk assessment in the last twelve months. Only a small minority report having a fully maintained, up-to-date inventory of AI agents and MCP connections, and the majority lack governance policy for agent permissions and monitoring. Agentic AI is being adopted far faster than security teams can assess or secure the risks.
If your agentic system has no cryptographic verification of skill identity, no runtime capability confinement, no behavioral monitoring for post-installation drift, and no re-authorization trigger on tool definition changes — you have not consented to what your skills do. You have consented to what they said they would do, at a point in time that may no longer be relevant. The Consent Gap is not a gap in user awareness. It is a gap in architecture. Closing it requires controls that operate below the layer of user decisions.
Conclusion
The Consent Gap is the defining structural vulnerability of current agentic AI skill ecosystems. It is not a bug in any single product. It is a condition created by the architectural inheritance of pre-LLM plugin models applied to systems where text is both the interface and the attack surface. Every adversary type operating in this space — malicious skill authors, supply chain attackers, rug pull operators, cross-tool shadowing campaigns — is exploiting the same thing.
The research record is now sufficient to characterize the risk quantitatively, model the adversary behavior systematically, and describe the technical controls that would close the gap. The MCPTox benchmark has quantified vulnerability at scale. The ETDI paper has proposed the cryptographic architecture to remediate it. MITRE ATLAS has documented real-world skill framework attack chains. OWASP has classified the underlying vulnerability conditions. What remains is deployment — of frameworks, of tooling, and of governance postures that treat skill installation as a security decision, not a convenience operation.
