Post 1 in this series established what the 2026 research corpus documents about probabilistic defense failure: a 17% baseline defense rate against production attacks, automated attack generation outpacing defensive hardening, and a fundamental trilemma that no probabilistic defense simultaneously resolves. That post — and the entire foundation for this series — draws directly on the five attack surfaces documented in Series 1: the alignment gap at the tool-call layer, the MCP monoculture vulnerability, active supply chain compromise, the Viral Agent Loop, and multi-agent data exposure. This post examines the alternative — deterministic architectural enforcement — and what it actually delivers when implemented correctly. The primary sources are SEAgent (arXiv:2601.11893), Authenticated Workflows (arXiv:2602.10465), Trustworthy Agentic AI Requires Deterministic Architectural Boundaries (arXiv:2602.09947), Policy Compiler for Secure Agentic Systems (arXiv:2602.16708), and Governance Architecture for Autonomous Agent Systems (arXiv:2603.07191).
The numbers reported in this post are not aspirational improvement targets. They are empirical results from peer-reviewed research published in Q1 2026. SEAgent achieves a 0% attack success rate against five categories of privilege escalation in LLM-based agent systems. Authenticated Workflows achieves 100% recall with zero false positives across 174 test cases. The Policy Compiler for Secure Agentic Systems raises policy compliance from 48% to 93% with zero violations in instrumented runs. These figures sit so far outside the performance envelope of the probabilistic defenses reviewed in Post 1 that they require a different explanatory frame.
The frame is architectural. What these systems share is not a more sophisticated classifier or a more carefully curated training set. They share a structural commitment: the security property they enforce does not depend on the model’s judgment about whether an input is adversarial. It depends on a policy check that runs independently of the model, against a property that either holds or does not. The model cannot reason its way around the check because the check does not reason — it enforces.
0%
Attack success rate achieved by SEAgent across five categories of privilege escalation attack in LLM-based agent systems — direct prompt injection, indirect prompt injection, RAG poisoning, untrusted agent instructions, and confused deputy attacks — under mandatory access control enforcement. This is not a reduction from a prior baseline. It is elimination. (SEAgent, arXiv:2601.11893)
What Deterministic Enforcement Actually Means
The term is precise and the precision matters. A deterministic defense enforces a security property through a mechanism that produces the same outcome regardless of the adversary’s input. It does not evaluate the input’s content. It checks a binary condition — is this operation permitted under the current policy, given the current principal’s credentials and the current context? — and either permits or denies.
This is how security has worked in every domain where it has been made reliable. A Unix file permission does not evaluate whether the program requesting access seems legitimate. A network access control list does not judge whether a packet’s payload looks benign. A cryptographic signature check does not assess whether the signer’s intentions are honest. Each enforces a structural property through a mechanism that the adversary’s cleverness cannot influence — because the mechanism does not evaluate cleverness.
LLM-based agents have historically lacked this layer. Everything that happens in an agent’s context window is evaluated by the model — including the security-relevant decisions about what actions to take and what instructions to follow. A model that has been trained to refuse harmful instructions will usually refuse them. An adversary with enough computational budget to find the input that bypasses the training will find it. The training provides no structural guarantee because it is a learned tendency, not an enforced rule.
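The binary check described above can be reduced to a minimal sketch. All names here (the policy table, `is_permitted`) are hypothetical illustrations, not any paper's API; the point is that the decision depends only on the (principal, action, resource) tuple and the policy, never on the content of any model input:

```python
# Minimal sketch of a deterministic policy check (illustrative names only).
# The decision depends solely on the (principal, action, resource) tuple
# and the policy table -- never on how the request was worded.

POLICY = {
    ("support-agent", "read", "tickets"),
    ("support-agent", "write", "ticket-replies"),
    ("billing-agent", "read", "invoices"),
}

def is_permitted(principal: str, action: str, resource: str) -> bool:
    """Binary condition: the tuple is in the policy or it is not."""
    return (principal, action, resource) in POLICY

# The same request yields the same answer no matter how it was prompted.
assert is_permitted("support-agent", "read", "tickets") is True
assert is_permitted("support-agent", "delete", "tickets") is False
```

There is nothing for an adversary's cleverness to act on: the function never inspects the instruction that motivated the request, only the structural tuple it resolves to.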
The Trinity Defense Framework
The Trustworthy Agentic AI Requires Deterministic Architectural Boundaries paper (arXiv:2602.09947) provides the most complete theoretical foundation for deterministic enforcement in the corpus. It proposes the Trinity Defense — three architectural layers that together eliminate the classes of vulnerability that probabilistic defenses leave open.
Action Governance interposes a policy enforcement point between the model’s output and the system’s action layer. No tool call executes until the governance layer validates it against a defined policy. The model does not decide whether an action is permitted — the governance layer does, through a check the model cannot influence.
Information Flow Control enforces data classification labels across the agent’s context window, preventing sensitive data from flowing into outputs or tool calls that are not authorized to receive it. The model does not decide whether data is too sensitive to share — the IFC layer does, structurally.
Privilege Separation partitions agent capabilities into least-privilege compartments, ensuring that a compromise of one capability domain cannot propagate to others. The model does not manage its own privilege boundaries — the architecture does.
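The Information Flow Control layer can be illustrated with a simple label-dominance check. This is my own sketch assuming a three-level classification lattice, not the paper's implementation; the labels and sinks are hypothetical:

```python
# Illustrative information flow control: data carries a classification
# label, and a flow is permitted only if the sink's clearance dominates
# the data's label. Levels and names here are hypothetical.

LEVELS = {"public": 0, "internal": 1, "secret": 2}

def flow_allowed(data_label: str, sink_clearance: str) -> bool:
    """Structural check: sensitive data cannot reach a lower-cleared sink."""
    return LEVELS[sink_clearance] >= LEVELS[data_label]

# A "secret" record may flow to a secret-cleared audit sink...
assert flow_allowed("secret", "secret")
# ...but never into a public-facing tool call, however it is phrased.
assert not flow_allowed("secret", "public")
```

The model can be persuaded to attempt the flow; the lattice comparison cannot be persuaded to permit it.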
SEAgent: Five Attack Categories, Zero Successes
The SEAgent paper (arXiv:2601.11893) takes the most systematic approach to privilege escalation in the corpus. It formalizes five distinct attack vectors — direct prompt injection, indirect prompt injection, RAG poisoning, untrusted agent instructions, and confused deputy attacks — under a unified privilege escalation framework, then applies mandatory access control enforcement to all five simultaneously.
The MAC enforcement operates at the action layer: before any tool invocation executes, the system checks whether the current execution context has the privileges required for that invocation under the current policy. The check is structural. It does not ask the model whether the invocation seems appropriate. It verifies that the combination of principal, action, and resource satisfies the policy — and denies any combination that does not.
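A sketch of that enforcement point may help. The API below (`execute_tool`, `PermissionDenied`, the policy table) is my own illustration and will differ from SEAgent's actual interface, but it shows the shape: the gate runs before every tool invocation and denies structurally, without inspecting how the instruction was worded:

```python
# Sketch of a mandatory access control gate at the action layer.
# Names (execute_tool, PermissionDenied, MAC_POLICY) are illustrative.

class PermissionDenied(Exception):
    pass

MAC_POLICY = {
    # execution context -> set of tools that context may invoke
    "retrieval-context": {"search_docs"},
    "user-session": {"search_docs", "send_email"},
}

TOOLS = {
    "search_docs": lambda q: f"results for {q!r}",
    "send_email": lambda body: "sent",
}

def execute_tool(context: str, tool: str, arg: str) -> str:
    # The check precedes execution and ignores the argument's content:
    # an injected instruction cannot widen the context's privileges.
    if tool not in MAC_POLICY.get(context, set()):
        raise PermissionDenied(f"{context} may not invoke {tool}")
    return TOOLS[tool](arg)

# An instruction smuggled in via a retrieved document runs in the
# retrieval context, where send_email is structurally impermissible.
try:
    execute_tool("retrieval-context", "send_email", "exfiltrated data")
except PermissionDenied:
    pass  # denied regardless of how the injected instruction was framed
```

Note what the gate does not do: it never classifies the argument as malicious or benign. Confused deputy attempts fail for the same reason — the retrieval context simply does not hold the privilege it is trying to borrow.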
Against all five attack categories, including indirect prompt injection through retrieved documents and confused deputy attacks where one component attempts to leverage another’s privileges, SEAgent records 0% attack success. Not a low success rate. Zero. The attacks do not fail because the model recognizes them as attacks. They fail because the action they require is structurally impermissible under the policy — regardless of how the instruction prompting that action was framed.
What Zero Percent Actually Means
A 0% attack success rate in an adversarial security evaluation is categorically different from a low attack success rate. A low rate means the defense is hard to bypass. A zero rate means the defense cannot be bypassed through the attack vector in question — because the enforcement mechanism does not evaluate the attack. It evaluates a policy condition that the attack cannot satisfy by construction. The distinction matters for how the defense is reasoned about: a low rate can be improved by a sufficiently sophisticated adversary; a zero rate requires changing the architecture, not the attack.
Authenticated Workflows: 100% Recall, Zero False Positives
The Authenticated Workflows paper (arXiv:2602.10465) addresses the trust layer above individual tool invocations — the question of how an enterprise agentic system establishes and maintains trust across an entire workflow involving multiple agents, tools, and data sources. Its approach is to treat the problem as a cryptographic one rather than a semantic one.
The paper introduces MAPL, an AI-native policy language, and a universal security runtime that integrates nine major agent frameworks — MCP, A2A, OpenAI, Claude, LangChain, CrewAI, AutoGen, LlamaIndex, and Haystack — under a common trust model. Each element of a workflow — prompts, tools, data sources, and context — is treated as a distinct trust boundary. Trust is not inferred from content. It is established through cryptographic authentication and enforced through the runtime’s policy engine.
Across 174 test cases spanning threat categories including prompt injection, tool poisoning, data exfiltration, and cross-agent authority escalation, the system achieves 100% recall with zero false positives. Every attack is detected. No legitimate workflow is incorrectly flagged. The performance is possible because the system is not classifying inputs as malicious or benign — it is checking whether authenticated principals are authorized for requested operations. An unauthorized operation is unauthorized regardless of how it is framed.
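The trust-boundary idea can be sketched with a standard HMAC check. This is my own illustration using Python's standard library, not MAPL's actual mechanism, and the key handling and step descriptor are deliberately simplified: a workflow step executes only if its descriptor carries a valid tag from a provisioned signing key.

```python
# Sketch: authenticate a workflow step before executing it.
# The runtime verifies a signature over the step descriptor; an attacker
# who injects or tampers with a step cannot produce a valid tag.
# Key handling and the descriptor format are simplified illustrations.

import hashlib
import hmac
import json

SIGNING_KEY = b"per-principal secret provisioned out of band"

def sign_step(step: dict) -> str:
    payload = json.dumps(step, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()

def verify_step(step: dict, tag: str) -> bool:
    return hmac.compare_digest(sign_step(step), tag)

step = {"agent": "planner", "tool": "search_docs", "args": {"q": "report"}}
tag = sign_step(step)

assert verify_step(step, tag)        # authentic step passes
step["tool"] = "send_email"          # tampering after signing...
assert not verify_step(step, tag)    # ...fails verification
```

Again the decisive property: verification asks whether the principal authenticated, not whether the step's content looks benign. A forged or modified step is rejected no matter how plausible it reads.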
Deterministic enforcement does not produce better results because it is smarter than probabilistic defense. It produces better results because it is asking a different question — one whose answer does not depend on the adversary’s ability to craft a convincing input.
— Synthesis from SEAgent (arXiv:2601.11893) and Authenticated Workflows (arXiv:2602.10465)
Policy Compliance at Scale: The PCAS Result
The Policy Compiler for Secure Agentic Systems paper (arXiv:2602.16708) addresses a challenge that SEAgent and Authenticated Workflows leave open: how to express and enforce security policies across complex agentic systems where the execution path is not predetermined. Its approach models agentic system state as dependency graphs that capture causal relationships between operations, with policies expressed in a Datalog-derived language and enforced by a deterministic reference monitor.
The reference monitor operates at runtime, checking each operation against the current dependency graph state before permitting execution. Because the check is deterministic — it evaluates a logical formula against a defined state, not a classifier output against an input distribution — it produces consistent enforcement regardless of input framing.
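A toy reference monitor can illustrate the idea. This sketch is far simpler than PCAS's Datalog engine, and every name in it is hypothetical: each node in the dependency graph records which sources its value causally depends on, and a policy rule denies any operation whose input is transitively tainted by an untrusted source.

```python
# Toy reference monitor over a dependency graph (illustrative only).
# Each node records the sources its value causally depends on; a policy
# rule forbids externally derived data from reaching a sensitive sink.

deps: dict[str, set[str]] = {}  # node -> transitive source taints

def record(node: str, sources: set[str], parents: tuple[str, ...] = ()) -> None:
    taint = set(sources)
    for p in parents:
        taint |= deps[p]  # taints propagate along causal edges
    deps[node] = taint

def monitor_allows(operation: str, input_node: str) -> bool:
    # Deterministic rule: send_email must not consume web-derived data.
    if operation == "send_email" and "web" in deps[input_node]:
        return False
    return True

record("user_msg", {"user"})
record("web_page", {"web"})
record("summary", set(), parents=("user_msg", "web_page"))  # mixes taints

assert monitor_allows("send_email", "user_msg")
assert not monitor_allows("send_email", "summary")  # transitively web-tainted
```

The point of the causal graph is that laundering data through intermediate steps does not clear it: the summary inherits the web taint, so the rule fires however many hops the adversary inserts.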
In testing across frontier models including GPT-4o, Claude Opus, and Gemini Ultra, PCAS improves policy compliance from 48% to 93% with zero violations in fully instrumented runs. The 48% baseline is itself instructive — it represents how often production frontier models comply with security policies when compliance is evaluated through their own judgment. The gap between 48% and 93% is the gap between probabilistic and deterministic enforcement.
Better Behavior, Not Guaranteed Behavior
Safety alignment and trained guardrails measurably improve model behavior across the distribution of inputs they were trained on. In production, against the majority of non-adversarial inputs, they perform well. Against adaptive adversaries with automated attack generation, they fail at documented rates above 60% for frontier models.
The 48% policy compliance baseline from PCAS is the honest figure for what frontier models achieve when left to enforce security policies through their own judgment — even with careful prompting and alignment.
48% Unaided Compliance
Enforced Behavior, Input-Independent
SEAgent: 0% attack success against five privilege escalation categories. Authenticated Workflows: 100% recall, zero false positives across 174 test cases. PCAS: 93% policy compliance with zero instrumented violations. Governance Architecture: 93%+ interception across three threat classes.
These results are not produced by better training. They are produced by removing the model’s judgment from the security enforcement path for operations where structural enforcement is possible.
93% Enforced Compliance
What Deterministic Enforcement Cannot Do
Intellectual honesty requires naming the limits. Deterministic enforcement is powerful within a defined policy scope, and its limits are also structural — meaning they can be known and planned around, rather than being probabilistic and unknowable.
First, policy definition requires human judgment. A mandatory access control policy that incorrectly specifies permitted operations will enforce incorrect permissions. The enforcement is deterministic; the correctness of the policy is not automatically guaranteed. Writing accurate policies for complex agentic systems with dynamic execution paths is hard, and the PCAS paper’s 93% figure reflects that even carefully instrumented deterministic enforcement leaves a 7% gap — in that case, attributable to policy coverage limitations rather than enforcement failures.
Second, deterministic enforcement cannot address what it does not cover. SEAgent’s 0% figure holds against the five attack categories the MAC policy was designed to address. Novel attack categories outside the policy’s scope are not covered. This is why the research community, including the papers surveyed here, consistently frames deterministic enforcement as a complement to probabilistic defenses rather than a replacement. Probabilistic defenses provide broad coverage across unknown attack patterns. Deterministic enforcement provides structural guarantees against the specific threat categories the policy covers.
Third, architectural enforcement adds operational complexity. Designing, deploying, and maintaining a policy enforcement layer that stays current with a rapidly evolving agent capability set is a non-trivial engineering investment. Organizations need to weigh that cost against the security value — which, given the empirical results in this post, is substantial.
The 2026 research corpus converges on a two-layer architecture: probabilistic defenses as a broad first pass across the full input distribution, and deterministic enforcement as a structural backstop for the specific threat categories where policy-based control is achievable. Neither layer is sufficient alone. Together, they produce the security posture the research demonstrates is achievable — and that organizations deploying agentic systems in production need to be building toward now, not treating as a future-state aspiration.
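The two-layer composition can be sketched as a pipeline. Every component below is a stand-in of my own (the keyword filter is a caricature of a real probabilistic defense): the probabilistic layer screens the broad input distribution first, and the deterministic gate has the final say on every action.

```python
# Sketch of the two-layer architecture: a probabilistic first pass plus
# a deterministic backstop. The classifier and policy are stand-ins.

def probabilistic_filter(prompt: str) -> bool:
    """Broad, best-effort screen; may miss novel attack framings."""
    suspicious = ("ignore previous instructions", "reveal the system prompt")
    return not any(s in prompt.lower() for s in suspicious)

ALLOWED_ACTIONS = {("agent", "read", "docs")}

def deterministic_gate(principal: str, action: str, resource: str) -> bool:
    """Structural final check; independent of how the input was framed."""
    return (principal, action, resource) in ALLOWED_ACTIONS

def handle(prompt: str, principal: str, action: str, resource: str) -> bool:
    if not probabilistic_filter(prompt):
        return False  # layer 1: broad coverage across the input distribution
    return deterministic_gate(principal, action, resource)  # layer 2: guarantee

# A novel injection that slips past layer 1 still cannot authorize
# an action outside the policy at layer 2.
assert handle("summarize the docs", "agent", "read", "docs") is True
assert handle("summarize the docs", "agent", "write", "docs") is False
```

The layers fail independently: a bypass of the filter leaves the gate intact, and a policy coverage gap in the gate is still screened by the filter. That independence is what the corpus means by defense in depth.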
