The Policy Authorship Problem: Who Writes the Rules, and How — Luminity Digital
The Policy Layer  ·  Series 5  ·  Post 2 of 4
Agentic AI Security  ·  Research Synthesis

The Policy Authorship Problem: Who Writes the Rules, and How

Knowing that your agentic system needs deterministic policy enforcement is the easy part. The hard part is specifying what to enforce — in what language, at what level of abstraction, against what threat model — and doing it before the system ships, at a cost your security team can sustain.

April 2026 · Tom M. Gomez · 15 Min Read

This is Post 2 of The Policy Layer, our fifth series on agentic AI security. Post 1 established the foundational problem: linear message history destroys causal provenance, making real authorization logically impossible without a different state representation. The dependency graph provides that representation. This post addresses what comes next — the engineering problem that no state representation solves on its own: someone has to write the policies. This post maps the current research landscape on that problem and gives practitioners a framework for choosing an approach that fits their deployment context.

Post 1 of this series closed with a concrete number: frontier LLM agents operating without explicit policy enforcement comply with organizational authorization requirements 48% of the time across enterprise task domains. That figure, drawn from PCAS (arXiv:2602.16708), is not a measurement of adversarial attack success. It is a measurement of routine operational failure — the rate at which capable, well-aligned models make incorrect authorization decisions during ordinary task execution, simply because they have no structural mechanism for reasoning about whether their actions are sanctioned. More than half of actions taken in a typical unguarded agentic deployment violate at least one of the organization’s authorization policies. The model is not being attacked. It is operating normally. Normal is the problem.

The dependency graph and reference monitor described in Post 1 are the correct architectural answer to this problem. A system that is policy-compliant by construction — instrumented before deployment to enforce declarative authorization rules over a causal state graph — does not rely on the model to make authorization decisions correctly. The reference monitor makes them, deterministically, outside the model’s reasoning loop. PCAS demonstrates this: 93% compliance and zero policy violations in instrumented runs, against a 48% unguarded baseline.

But there is a step between “we need a reference monitor enforcing declarative policies over a dependency graph” and “our system is actually doing that.” That step is policy authorship: the work of translating an organization’s authorization requirements into machine-readable specifications that the enforcement mechanism can evaluate. It is where most real deployments will struggle. The research community has begun addressing it. No one has solved it. This post maps what has been built and where the gaps remain.

The Policy Authorship Gap

Policy authorship is the distance between knowing what you want to enforce and having a specification that enforces it. In conventional software security, this gap is bridged by access control lists, role-based permissions, network firewall rules, and database privilege grants — mechanisms with decades of tooling, operational experience, and practitioner familiarity. A security engineer asked to specify who can read what data has established frameworks, known patterns, and organizational processes for doing so.

In agentic systems, the problem is structurally more demanding. Authorization requirements in agentic deployments are not primarily about identity and resource access — they are about causal information flow across dynamic, multi-agent execution traces. The relevant question is not “is this agent permitted to read this file?” It is “is the action this agent is about to take causally traceable, through any number of intermediate agents and tool invocations, to an information source it was authorized to act on?” These are different questions. The second requires a policy language capable of recursive reasoning over causal graphs, authored by people who understand both the threat model and the formal semantics of information flow. That combination does not exist at scale in most organizations today.

The Policy Authorship Gap

The policy authorship gap is the distance between knowing what security property an organization wants to enforce and having a machine-readable specification that enforces it correctly, completely, and without conflict across the full range of states an agentic system can reach.

The gap has three components. The expressiveness problem: the policy language must be capable of encoding the required security properties, including transitive information flow constraints that require recursive query support. The completeness problem: a partial policy — one that covers the anticipated threat scenarios but not unanticipated edge cases — will enforce correctly in testing and fail in production at the boundary of its coverage. The sustainability problem: policies must be maintained as agent capabilities, tool integrations, and organizational requirements change, by teams that may not have the formal methods background that the initial specification required.

Four Approaches to the Same Problem

The PCAS research neighborhood has produced four distinct approaches to the policy authorship problem. Each answers a different version of the question. Each has a different profile of expressiveness, authorship burden, formal verifiability, and operational fit. None is universally optimal. The practitioner’s task is matching approach to deployment context.

Approach 1: Declarative Specification in Datalog

PCAS uses a policy language derived from Datalog — a subset of first-order logic optimized for recursive queries over relational data. The choice is deliberate. Datalog’s recursive query support enables policies that express transitive closure over the dependency graph: “no action may be taken whose causal history includes a path, through any number of hops, from an untrusted external source.” This is the class of policy that linear message history cannot evaluate and that most existing policy languages cannot express. It is also the class of policy that the most dangerous multi-agent attack vectors require.
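To make the transitive-closure idea concrete, here is a minimal sketch in Python, not PCAS's actual policy language: a toy dependency graph and a reachability check that answers "does this action's causal history reach an untrusted source through any number of hops?" All node names and the `untrusted` label set are illustrative assumptions, not PCAS syntax.

```python
from collections import deque

# Toy dependency graph: node -> set of nodes it causally depends on.
# Node names are invented for illustration.
deps = {
    "send_email": {"draft"},
    "draft": {"summary"},
    "summary": {"web_page"},   # content fetched from an external site
}
untrusted = {"web_page"}

def tainted(action: str) -> bool:
    """True if the action's causal history reaches an untrusted source
    through any number of hops (transitive closure over `deps`)."""
    queue, seen = deque([action]), set()
    while queue:
        node = queue.popleft()
        if node in untrusted:
            return True
        if node in seen:
            continue
        seen.add(node)
        queue.extend(deps.get(node, ()))
    return False

# A Datalog-style policy expressing the same closure would look roughly like:
#   tainted(X) :- untrusted(X).
#   tainted(X) :- depends_on(X, Y), tainted(Y).
#   deny(A)    :- action(A), tainted(A).
print(tainted("send_email"))  # True: three hops back to web_page
```

The recursion in the second Datalog rule is exactly what linear message history cannot support: the check must follow causal edges, not conversational order.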

The expressiveness comes at a cost. Writing Datalog-derived policies requires understanding the formal semantics of the dependency graph, the threat model against which the policies are designed, and the operational context of the deployment. This is policy engineering work. It is not accessible to security teams without formal methods background, and it does not scale through the kind of trial-and-error iteration that most operational security work depends on. PCAS’s evaluation demonstrates that the resulting policies work — zero violations in instrumented runs across three case study domains — but does not address whether organizations can author those policies at the pace their deployments require.

Approach 2: Domain-Specific Language Rule Sets

AgentSpec (arXiv:2503.18666, Wang et al., March 2025, accepted ICSE 2026) takes a more accessible approach: a lightweight domain-specific language where policies take the form of structured rules combining a trigger, a predicate, and an enforcement action. A rule might read: “when the agent attempts a file write, check that the target path is within the authorized workspace; if not, block and log.” The rules are interpretable, modular, and writable by security engineers without formal methods training.
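The trigger-predicate-enforcement shape can be sketched in a few lines of Python. This is not AgentSpec's concrete syntax; the field names, the `"file_write"` trigger, and the `check` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    trigger: str                       # event type that activates the rule
    predicate: Callable[[dict], bool]  # True means the action is allowed
    enforce: str                       # decision on predicate failure

# Illustrative rule: block file writes outside the authorized workspace.
workspace_rule = Rule(
    trigger="file_write",
    predicate=lambda action: action["path"].startswith("/workspace/"),
    enforce="block_and_log",
)

def check(action: dict, rules: list[Rule]) -> str:
    """Return the enforcement decision for a proposed action."""
    for rule in rules:
        if action["type"] == rule.trigger and not rule.predicate(action):
            return rule.enforce
    return "allow"

print(check({"type": "file_write", "path": "/etc/passwd"}, [workspace_rule]))
# block_and_log
```

Note that each rule inspects only the single proposed action. Nothing in this shape carries causal history, which is precisely the structural limitation discussed below.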

90%+

Unsafe execution prevention rate for AgentSpec across code agent tasks, with 100% compliance enforcement in autonomous vehicle scenarios — achieved at millisecond-level computational overhead. For policies expressible as trigger-predicate-enforcement rules, AgentSpec demonstrates that deterministic runtime enforcement is computationally cheap and immediately practical.

AgentSpec addresses part of the authorship problem by making rules writable by humans. It addresses another part through a finding that is both encouraging and cautionary: LLM-generated AgentSpec rules, produced by OpenAI o1 from risk descriptions, achieve 95.56% precision and 70.96% recall for embodied agent tasks, and successfully identify 87.26% of risky code scenarios. Rule generation can be partially automated. But 70.96% recall means that roughly 29% of risky behaviors are not caught by the LLM-generated rules — a gap that is acceptable for some deployments and unacceptable for others. It also means that automation shifts the authorship burden rather than eliminating it: a human must still evaluate the generated rules for completeness, audit their coverage, and decide whether the recall gap is within tolerance for the specific deployment.

AgentSpec’s acknowledged limitation is structural: it operates reactively on individual tool invocations without causal history. It can enforce “this agent may not send to external addresses” but cannot enforce “no action may be causally influenced by untrusted content retrieved in a previous session.” The reactive, non-transitive architecture that makes AgentSpec lightweight and accessible is the same architecture that makes it insufficient for the most demanding cross-agent provenance policies.

Approach 3: STPA Hazard Analysis Feeding Information Flow Constraints

The most rigorous approach in the neighborhood comes from safety engineering rather than security research. The STPA+IFC paper (arXiv:2601.08012, accepted ICSE 2026) proposes applying System-Theoretic Process Analysis — a hazard analysis methodology developed for safety-critical systems — to derive information flow constraints for agentic deployments. The workflow begins with a structured hazard analysis: identify what unsafe outcomes the system could produce, trace the control actions that could lead to those outcomes, derive safety requirements from the analysis, and formalize those requirements as enforceable IFC constraints.

The resulting policies inherit the rigor of the safety engineering process. Rather than asking a security engineer to specify policies from first principles — which requires knowing in advance what attack scenarios to defend against — STPA derives policy requirements from a systematic analysis of what the system could do wrong. The policies are grounded in explicit hazard models. Their coverage is limited by the completeness of the hazard analysis, which is itself a tractable engineering task with established methodology.
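The traceability structure of that workflow, hazard to unsafe control action to enforceable constraint, can be sketched as data. This is a toy representation under our own assumptions, not the paper's formalism; the real derivation step is a human analysis activity, and the class and field names here are invented.

```python
from dataclasses import dataclass

@dataclass
class Hazard:
    id: str
    description: str

@dataclass
class UnsafeControlAction:
    hazard: Hazard
    action: str      # the control action that can lead to the hazard
    condition: str   # the context in which the action becomes unsafe

@dataclass
class IFCConstraint:
    source_label: str  # data carrying this label must not reach the sink
    sink: str          # the action or channel being constrained

def derive_constraint(uca: UnsafeControlAction) -> IFCConstraint:
    """Toy derivation: each unsafe control action becomes a flow
    constraint forbidding hazard-labeled data from reaching the
    corresponding sink. Shows the traceability chain only."""
    return IFCConstraint(source_label=uca.condition, sink=uca.action)

h1 = Hazard("H-1", "Confidential patient data leaves the organization")
uca = UnsafeControlAction(h1, "external_send", "label:confidential")
print(derive_constraint(uca))
```

The value of the structure is auditability: every enforced constraint points back to the hazard that motivated it, so a coverage review can ask "which hazards have no constraint?" rather than "did we think of everything?"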

The authorship burden is high but differently distributed. STPA requires safety engineers or engineers trained in the methodology. The resulting constraints are more likely to be complete because they derive from a structured analysis rather than from an individual’s intuition about threat scenarios. But the process is time-intensive, requires domain expertise, and is better suited to high-stakes regulated deployments — pharmacovigilance, financial services, critical infrastructure — than to the iterative development cycles of typical enterprise agentic applications.

What distinguishes this approach from the others is its target layer. Where PCAS and AgentSpec enforce policies at the agent-framework application layer, the STPA+IFC paper proposes enforcement at the protocol layer — specifically, extending MCP itself to carry structured labels on capabilities, confidentiality levels, and trust tiers. This is the only approach in the neighborhood that directly addresses the structural MCP gap that the Luminity corpus has documented. Its protocol-layer ambition and its engineering rigor make it the most consequential approach for organizations operating in regulated sectors, and the one with the longest path from research to production deployment.

Approach 4: Behavioral Learning — Sidestepping Authorship Entirely

The fourth approach treats the authorship problem as the wrong problem to solve. If writing correct, complete, maintainable policy specifications is intractably difficult at scale, why not learn safe behavioral bounds from observed execution rather than specifying them? Agent-Sentry (arXiv:2603.22868, Sequeira et al., March 2026) constructs functionality graphs from prior benign and adversarial executions, capturing recurring execution flows — their control-flow structure and data provenance. At runtime, it compares proposed actions and their preceding execution context against these learned graphs to detect behaviors outside the envelope of previously observed legitimate use.

Pro2Guard (arXiv:2508.00500, Wang et al., August 2025) takes a different learning approach: it abstracts agent behaviors into symbolic states and learns a Discrete-Time Markov Chain from execution traces, enabling proactive probabilistic prediction of future safety violations before they occur. This reduces unnecessary LLM queries by 12.05% through early rejection of unsafe trajectories — a meaningful operational efficiency alongside the safety benefit.
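The DTMC idea can be sketched in a few lines: count transitions between symbolic states across traces to estimate probabilities, then compute the probability of reaching an unsafe state within a bounded horizon. This is a minimal illustration under our own assumptions, not Pro2Guard's implementation; the state names and the `UNSAFE` sentinel are invented.

```python
from collections import defaultdict

# Execution traces abstracted into symbolic states (names illustrative).
traces = [
    ["start", "read", "summarize", "done"],
    ["start", "read", "send_external", "UNSAFE"],
    ["start", "read", "summarize", "done"],
    ["start", "read", "summarize", "done"],
]

# Estimate DTMC transition probabilities by counting observed transitions.
counts = defaultdict(lambda: defaultdict(int))
for trace in traces:
    for a, b in zip(trace, trace[1:]):
        counts[a][b] += 1
P = {s: {t: n / sum(nxt.values()) for t, n in nxt.items()}
     for s, nxt in counts.items()}

def p_unsafe(state: str, horizon: int) -> float:
    """Probability of reaching UNSAFE within `horizon` steps."""
    if state == "UNSAFE":
        return 1.0
    if horizon == 0:
        return 0.0
    return sum(p * p_unsafe(nxt, horizon - 1)
               for nxt, p in P.get(state, {}).items())

print(p_unsafe("read", 2))  # 0.25: one of four observed reads led to UNSAFE
```

A runtime monitor built on this would reject a trajectory early when `p_unsafe` for the current abstract state crosses a threshold, which is what saves the downstream LLM queries.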

Human-Authored Declarative Policies

Specified Before Deployment

Policy requirements are translated into formal specifications — Datalog, DSL rules, or IFC constraints derived from hazard analysis — before the system is deployed. The reference monitor evaluates actions against these specifications deterministically.

Correct policies produce provably strong guarantees. The guarantee is bounded by the completeness of the specification: policies that don’t cover an attack scenario don’t block it. Authorship burden is high. Maintenance burden grows with deployment complexity. Best suited to deployments where threat scenarios are well-understood and requirements are stable.

Formally Verifiable · Authorship-Intensive
Learned Behavioral Bounds

Inferred From Observed Execution

Safe behavioral bounds are inferred from prior benign and adversarial execution traces rather than authored by human policy engineers. The enforcement mechanism detects deviations from learned patterns or predicts unsafe trajectory probabilities.

Eliminates the formal specification authorship burden. Bounds degrade under distribution shift — behaviors outside the training distribution are not covered. Provides no formal verifiability guarantees. Best suited to deployments where the threat landscape is dynamic and complete advance specification is infeasible. Complements but does not replace formal specification for high-stakes authorization requirements.

No Authorship Burden · Distribution-Dependent

The behavioral learning approaches are genuinely valuable and address a real operational problem. But they carry a limitation that practitioners must understand precisely: learned bounds degrade under distribution shift. Behaviors outside the training distribution — including novel attack vectors that the system was not exposed to during the learning phase — are not covered. An adversary who understands that the enforcement mechanism is learned, not specified, can probe for gaps at the boundary of the training distribution. This is a different failure mode than formal specification incompleteness, and in some ways a more dangerous one: specification gaps are discoverable through coverage analysis, while distribution boundary gaps are not.

A Practitioner Decision Framework

The four approaches are not competitors. They are answers to different versions of the policy authorship question, suited to different deployment contexts. The following framework is intended as a starting point for organizations thinking through which approach or combination of approaches their deployment requires.

Datalog / PCAS

Regulated environments, cross-agent provenance requirements

Use when: authorization requirements include transitive information flow constraints across agent boundaries. When: the threat model includes multi-hop indirect prompt injection or chain laundering. When: formal verifiability of the enforcement guarantee is required for compliance or audit purposes.

Authorship requirement: formal methods or policy engineering expertise. Not suitable for rapid iteration cycles without dedicated policy engineering support.

Zero violations in instrumented runs · PCAS
AgentSpec / DSL Rules

General enterprise deployments, iterative development

Use when: authorization requirements are expressible as trigger-predicate-enforcement rules without transitive provenance requirements. When: operational cadence requires rapid policy iteration. When: LLM-assisted rule generation is acceptable with human review of recall gaps.

Authorship requirement: security engineering familiarity. Does not require formal methods background. Scale through LLM-generated rules with human audit.

90%+ unsafe execution prevention · AgentSpec
STPA + IFC + MCP

High-stakes, regulated, protocol-layer ambition

Use when: deployment is in a regulated sector where hazard analysis methodology is already part of the engineering process. When: the organization is positioned to influence or adopt MCP protocol extensions that carry structured trust labels. When: completeness of policy coverage is more important than speed of authorship.

Authorship requirement: safety engineering expertise plus security engineering. Highest rigor, longest authorship timeline.

Protocol-layer enforcement · arXiv:2601.08012
Agent-Sentry / Pro2Guard

Dynamic threat landscapes, complement to formal specification

Use when: the threat landscape is sufficiently dynamic that advance specification of all relevant policies is infeasible. When: formal specification covers the known high-priority threats and behavioral bounding supplements coverage at the distribution boundary. When: operational efficiency (fewer unnecessary LLM queries through early trajectory rejection) is a deployment concern.

Never as a replacement for formal specification in authorization-critical contexts. Distribution shift degrades coverage in ways that coverage analysis cannot fully detect in advance.

12.05% fewer unnecessary LLM queries · Pro2Guard

The Honest Accounting

No approach in the current research landscape fully solves the policy authorship problem at the scale and velocity that enterprise agentic deployment requires. Datalog is powerful but demands expertise that most organizations do not have. AgentSpec is practical but leaves a 29% recall gap in automated rule generation and cannot enforce transitive provenance policies. STPA is rigorous but slow. Behavioral learning sidesteps authorship but degrades under distribution shift in ways that are difficult to bound in advance.

The current state of the art is a layered combination: organizations deploying agentic systems in authorization-critical contexts should apply formal specification to their highest-priority policies — specifically any policy requiring transitive information flow reasoning across agent boundaries — add DSL-rule enforcement for the broader set of operational constraints, and use behavioral bounding as a supplementary detection layer for novel deviations. This is not a complete solution. It is the best currently available combination, and it requires security engineering investment that most agentic AI deployments are not currently making.
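That layering implies an ordering: deterministic, specified checks first, learned detection last and advisory. A minimal sketch of the composition, with all function bodies as stubs and all field names invented for illustration:

```python
def formal_policy_check(action: dict) -> bool:
    """Layer 1: deterministic specified policy (e.g. a Datalog verdict
    over the dependency graph). Stub: deny provenance-tainted actions."""
    return not action.get("tainted", False)

def dsl_rule_check(action: dict) -> bool:
    """Layer 2: operational DSL rules (trigger/predicate style). Stub:
    restrict writes to the workspace."""
    return action.get("path", "/workspace/x").startswith("/workspace/")

def behavioral_check(action: dict) -> bool:
    """Layer 3: learned behavioral bound. Advisory only: deviations are
    flagged for review, never used to override layers 1 and 2."""
    return action.get("in_distribution", True)

def decide(action: dict) -> str:
    if not formal_policy_check(action):
        return "deny:formal"
    if not dsl_rule_check(action):
        return "deny:rule"
    if not behavioral_check(action):
        return "flag:anomaly"   # escalate for review, not a hard block
    return "allow"

print(decide({"tainted": True}))           # deny:formal
print(decide({"path": "/etc/passwd"}))     # deny:rule
print(decide({"in_distribution": False}))  # flag:anomaly
```

The ordering matters: the formally specified layers carry the hard guarantees, so the distribution-dependent layer can only add detections, never widen what the specified layers permit.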

The 48% unguarded compliance baseline is not a measurement of what bad actors can achieve against frontier models. It is a measurement of what frontier models do on their own, without any adversarial pressure, in ordinary enterprise workflows. The policy authorship problem is the gap between that number and the compliance rate an organization’s risk tolerance requires. Closing it is an engineering task. It is not a model capability task, a governance task, or a prompt engineering task. It requires policy specification infrastructure that most organizations have not yet built.

— Synthesis from PCAS (arXiv:2602.16708); AgentSpec (arXiv:2503.18666); STPA+IFC (arXiv:2601.08012)

The MCP Protocol-Layer Implication

A thread that runs through the policy authorship landscape deserves explicit attention before this post closes. Three of the four approaches — Datalog, AgentSpec, and behavioral learning — enforce policies at the agent-framework application layer. The STPA+IFC approach is the exception: it proposes enforcement at the protocol layer, specifically by extending MCP to carry structured labels encoding capability scope, confidentiality level, and trust tier directly in the transport.

This matters for a reason that connects directly to the core Luminity thesis. Application-layer enforcement, however well-specified and rigorously authored, operates on top of a protocol that does not itself carry trust metadata. The reference monitor can block an action based on the dependency graph’s provenance record — but only after that action has been proposed by an agent that received its instructions through a protocol that carries no structural distinction between trusted instructions and untrusted content. The policy layer sits above a protocol gap it cannot close from above.

The STPA+IFC paper’s protocol-layer ambition is the research community’s first serious attempt to address that gap from the correct level. Post 3 of this series addresses the enforcement lifecycle axis in full — mapping compile-time, runtime, and protocol-layer enforcement against each other and arguing for the minimum viable stack that covers the gaps none of them closes alone.

The Recall Gap Is Not a Rounding Error

AgentSpec’s LLM-generated rules achieve 70.96% recall. In a security context, 29.04% of risky behaviors going undetected is not a minor shortcoming — it is the attack surface that an adversary optimizing against a known enforcement mechanism will target first. Organizations using automated rule generation must audit generated rules for recall gaps against their specific threat model, not against the benchmark scenarios used in the AgentSpec evaluation. The benchmark recall figure is a starting point for understanding the approach’s capability. It is not a deployment-ready compliance guarantee.

The Central Insight of This Post

The policy authorship problem is the unsolved engineering challenge that sits between the correct architectural answer — a dependency graph and reference monitor — and a system that actually enforces organizational authorization requirements. Four approaches exist, each suited to a different deployment context, none individually sufficient at enterprise scale. The combination of Datalog for transitive provenance-critical policies, DSL rules for operational constraints, STPA for regulated high-stakes deployments, and behavioral bounding as a supplementary layer represents the current state of the art. It is not the final word. It is what responsible deployment looks like while the field continues to develop the tooling, automation, and operational frameworks that will eventually close the gap between architectural correctness and engineering practicality.

Up Next: Post 3 — Compile-Time, Runtime, or Protocol

Where in the system lifecycle does enforcement occur — and what does that location determine about what the enforcement can and cannot guarantee? Post 3 introduces the enforcement lifecycle axis and maps it against the deterministic/probabilistic taxonomy established in Series 2.

The Policy Layer: From Architectural Enforcement to Operational Reality  ·  Four-Part Series
