The Confused Deputy Has No Badge — Luminity Digital
Series 9 · The Identity Gap · Post 1 of 3
Agentic AI Security  ·  Identity & Trust

The Confused Deputy Has No Badge

Agents act with authority they cannot prove they hold. When any component in the architecture can impersonate any other — and no mechanism exists to verify the claim — authorization is a text string, not an enforced boundary. The confused deputy attack is not an edge case. It is the default state of most enterprise agentic deployments.

April 2026 Tom M. Gomez Luminity Digital 13 Min Read
Series 8 closed with a precise limitation: the provenance architecture that closes the supply chain gap depends on identity verification to anchor it. An adaptive attacker with a valid signing key can forge attestation chains. The key insight from RAGShield (arXiv:2604.00387) is that provenance and identity are not redundant controls — they are complementary ones, and neither alone is sufficient. Series 9 begins here. The anchor paper for this post is Agent Privilege Separation in OpenClaw: A Structural Defense Against Prompt Injection (arXiv:2603.13424, Cheng and Tsao, March 2026), which establishes the architectural mechanism that addresses the trust assumption at its root: separate the component that processes untrusted content from the component that takes action, and the impersonation surface collapses.

The confused deputy problem is one of the oldest problems in computer security. It was named in 1988 by Norm Hardy, who described a compiler program that had been granted privileges its users had not — meaning that when a user persuaded the compiler to do something privileged, the compiler was acting as a “confused deputy,” using its elevated authority on behalf of a request it had no valid authorization to honor. The fix, Hardy argued, was not better detection of malicious requests. It was separating the privileges that different components hold, so that no component can be manipulated into exercising authority beyond what its specific role requires.

In agentic AI systems, the confused deputy problem has re-emerged with structural force. An agent is provisioned with credentials, tool access, and permissions that its users did not individually authorize for each action the agent takes. When that agent processes an untrusted document, tool output, or memory entry — and that content instructs the agent to perform an action using its provisioned authority — the agent is acting as a confused deputy: exercising privileges on behalf of an instruction source that was never authorized to invoke those privileges.

The problem is compounded by an absence that makes the classic confused deputy scenario look tractable by comparison. In traditional software security, at least the program’s identity was not in question — the compiler was unambiguously the compiler. In agentic systems, agent identity itself is unverified. When an orchestrator agent delegates a sub-task to a sub-agent, the sub-agent cannot verify that the delegation originated from a legitimate orchestrator acting on valid user intent. When a tool receives an invocation, it cannot verify that the invoking agent is authorized to make that specific request in that specific context. Identity is asserted in text. Text can be forged.

The Authentication Gap in Numbers

The absence of cryptographic identity is not a future design gap — it is a present-state reality documented across multiple independent audits of the MCP ecosystem in early 2026.

~38%

Of internet-exposed MCP servers lack authentication at the protocol level, per Adversa AI’s scan of 500+ servers. A separate scan of approximately 2,000 internet-exposed MCP servers found all lacked authentication — a finding corroborated by independent reporting in Dark Reading, which identified 1,862 exposed servers and found none requiring authentication for a basic tool enumeration request. The gap between deployment velocity and identity infrastructure is measurable and growing.

100%

Inter-agent trust exploitation success rate documented in agentic system security research. Models that successfully resist direct prompt injection — refusing identical requests when they come from human users — immediately capitulate when the same requests originate from a peer agent. The architecture implicitly encodes an elevated trust level for agent-to-agent communication that has no cryptographic basis. Any component claiming to be a trusted orchestrator is treated as one.

The 100% inter-agent success rate is the most operationally significant finding in this space. It establishes that the safety training that gives frontier models meaningful resistance to direct prompt injection does not extend to agent-sourced instructions. This is not a failure of safety training design for human-AI interactions — it is a gap in the threat model. Safety training addresses the human-AI trust boundary. It does not address the AI-AI trust boundary, because that boundary was not the primary design concern when most current models were trained.

The Privilege Escalation Consequence

Research specifically examining privilege usage in agentic systems with real-world MCP tools (GrantBox, arXiv:2603.28166) found that ReAct agents exhibited an average attack success rate of 90.55% under sophisticated manipulation — and Plan-and-Execute agents showed 79.05% ASR. Current LLMs possess only foundational security awareness. They can occasionally identify overtly illegal requests. They remain highly vulnerable to sophisticated privilege abuse that works through the normal task execution flow rather than obvious policy violations. The privilege model that makes agents productive is the same model that makes privilege escalation structurally accessible.

The Structural Diagnosis: Why the Trust Boundary Is Absent

The Luminity corpus’s central taxonomic distinction — deterministic architectural enforcement vs. probabilistic behavioral controls — applies with particular force to the identity problem. Every approach that addresses agent trust boundaries through behavioral means — safety training, guardrails, content filtering for suspicious delegation claims — is probabilistic. It works until an attacker finds the phrasing or context that bypasses it. The 100% inter-agent exploitation success rate is the empirical proof that the current generation of probabilistic controls does not hold at the agent-to-agent boundary.

The architectural root cause is the command-data boundary collapse that Series 1 of this publication established as the foundational failure mode of agentic AI. In traditional software, instructions and data occupy structurally separate spaces — the processor’s instruction pointer cannot be redirected by data unless an exploit has been deployed. In LLM agents, instructions and data flow through the same token stream, processed by the same attention mechanism, evaluated for action potential by the same inference pass. There is no structural mechanism at the model level that makes a delegation claim from a peer agent verifiable vs. forgeable — because the verification would require the model to inspect metadata that is not present in the token stream.

What Privilege Separation Provides

Cheng and Tsao’s Agent Privilege Separation research (arXiv:2603.13424) establishes the architectural mechanism that addresses this at the structural layer. The approach is a privilege-separated two-agent pipeline with tool partitioning, implemented in the OpenClaw agent platform and evaluated against Microsoft’s LLMail-Inject benchmark — 649 attacks that had succeeded against a single-agent baseline.

Privilege-Separated Two-Agent Pipeline — Architecture

The pipeline separates agent operation into two distinct roles with distinct privilege sets. The quarantine agent processes all untrusted content — retrieved documents, tool outputs, external data, user-provided materials. It operates with restricted privileges: it can process and summarize content, but it cannot invoke high-privilege tools or take external actions. The action agent selects and executes actions based on structured outputs from the quarantine agent. It holds the privileged tool access but receives only structured, formatted outputs from the quarantine agent — not raw untrusted content.

The key architectural property: untrusted content never reaches the component with privileged tool access. An injection payload embedded in a retrieved document can only influence the quarantine agent’s outputs, which are stripped of persuasive framing before the action agent processes them. The path from untrusted content to privileged action requires passing through a structural boundary that the injection payload cannot cross.

0%

Attack success rate for the full privilege-separated pipeline against 649 attacks that had succeeded against the single-agent baseline on the LLMail-Inject benchmark. Agent isolation alone achieves 0.31% ASR — 323 times lower than the baseline. JSON structured output formatting alone achieves 14.18% ASR. Ablation confirms agent isolation is the dominant mechanism. Structural privilege separation, not content filtering, is what closes the attack surface.

The 323× improvement from agent isolation alone is the result that matters most for the enterprise practitioner. It is not a 10% improvement or a 50% improvement — it is a near-complete collapse of the attack surface. The residual 0.31% represents edge cases where the structured output boundary is insufficiently enforced. Adding JSON formatting to strip persuasive framing from quarantine agent outputs closes that residual to 0%.

What Privilege Separation Does Not Solve

The privilege-separated pipeline is a structural defense against a specific attack class: injection through untrusted content that reaches a privileged agent. It addresses the case where an external document or tool output contains a malicious instruction that attempts to redirect the action agent’s behavior. This is a major attack surface and the research demonstrates that it is closable through architectural means.

What the Flat Single-Agent Architecture Exposes

All Content Reaches the Privileged Layer

In a single-agent architecture, the agent processes retrieved documents, tool outputs, user inputs, and peer agent messages in the same context window and evaluates them all for action potential using the same inference pass. An injection payload in any of these sources can potentially redirect tool selection and action execution.

The 100% inter-agent exploitation success rate applies here: requests from peer agents claiming orchestrator authority are indistinguishable from legitimate orchestrator requests at the model layer. Safety training provides partial protection against obvious violations; it provides no protection against well-framed impersonation.

All Trust Levels — Same Layer
What Privilege Separation Provides

Structural Trust Tier Enforcement

The privilege-separated pipeline enforces a structural boundary: untrusted content is processed by the quarantine agent with restricted privileges, and only structured outputs cross the boundary to the action agent. An injection payload in retrieved content cannot reach the action agent because the boundary is architectural, not probabilistic.

What it does not close: the inter-agent trust boundary. An orchestrator agent that issues delegation instructions to a sub-agent is still asserting its identity in text. The sub-agent cannot cryptographically verify that the delegating agent is a legitimate orchestrator acting on valid user authority. Privilege separation structures the processing pipeline — it does not verify who is issuing instructions to whom.

Processing-Layer · Trust-Tier Enforced

The right column’s limitation is the bridge to Posts 2 and 3 of this series. Privilege separation is a necessary architectural control for the injection attack surface. It is not sufficient for the delegation verification problem — the question of whether an agent issuing an instruction is cryptographically authorized to do so. That requires a different mechanism: verifiable delegation chains that propagate user authorization through the task graph with receipts at each step. The AIP research and the SPIFFE workload identity standards address exactly this gap.

Privilege separation structures the processing pipeline. It does not verify who is at the other end of the pipeline. Those are different problems — and the second one has no architectural solution deployed at scale anywhere in the MCP ecosystem today.

— Luminity Digital synthesis from arXiv:2603.13424, Cheng and Tsao, March 2026
The Central Insight

The confused deputy attack in agentic AI is not a failure of safety training — it is a consequence of deploying agents without the identity infrastructure that makes authorization meaningful. Privilege separation closes the injection attack surface: 0% ASR with a two-agent pipeline against 649 attacks. It does not close the delegation verification gap. Every other structural control in this corpus — memory isolation, provenance attestation, behavioral monitoring — assumes that agents are who they claim to be. Post 2 examines what it takes to make that assumption true.

Up Next: Post 2 — The Delegation Chain Has No Receipts

When a human authorizes an agent to act, that authorization must propagate through the task graph with cryptographic proof at each step. Current systems do not do this. Post 2 examines what the Agent Identity Protocol, SPIFFE workload identity, and verifiable delegation chains provide — and why OAuth alone does not solve the problem.

The Identity Gap  ·  Three-Part Series
Post 1 · Now Reading The Confused Deputy Has No Badge
References & Sources

Share this:

Like this:

Like Loading...