The Governance Loop Closes

Three posts in this series have traced the AGT architecture from initial analysis through two rounds of material updates. Post 1 mapped the enforcement tier split — deterministic controls that block categorically, probabilistic controls that degrade under adversarial pressure. Post 2 documented the security model ceiling — execution rings are logical privilege tiers, not CPU ring-level enforcement, and container isolation is the required OS-layer complement. Post 3 updated the protocol ceiling argument when ACS added input and pre_model_call intervention points upstream of execution. Each post addressed one structural gap. This post addresses the gap that sits upstream of all of them: how do you know the policies you plan to enforce are the right policies?

Every governance architecture contains a silent assumption: that the policies encoding the intended behavior are correct. ACS enforces policies at eight intervention points across the agent lifecycle. The question it cannot answer is whether those policies accurately capture what they were written to prevent. That is the job of the evaluation layer — and ASSERT is that layer.

The Gap Enforcement Alone Cannot Close

The structural case for runtime governance is well-established in this series. Deterministic controls — execution rings, cryptographic identity, capability sandboxing, MCP Security Scanner — block categorically within their scope. The ACS Rust core is stateless, fail-closed, and deterministic. These controls are the right architectural foundation.

But enforcement assumes the policies are right. A Rego rule that incorrectly specifies which tool calls are permissible will enforce that incorrect specification with the same deterministic rigor as a correct one. A taxonomy that misclassifies an adversarial injection pattern as benign behavior will fail to catch the attack — not because the enforcement is weak, but because the policy specification was wrong. The gap between written intent and executable policy is the gap that neither execution rings nor pre_model_call interception can address. That gap requires an evaluation layer.

What ASSERT Is

ASSERT — Adaptive Spec-driven Scoring for Evaluation and Regression Testing — is Microsoft Responsible AI’s pre-deployment evaluation framework for AI agents. Published alongside ACS from the same team, it takes a natural language behavior specification and converts it into a structured, executable evaluation that can be reviewed, run, scored, and improved over time.

The foundational argument is worth stating directly: most AI systems start with a specification. But evaluation often starts elsewhere — generic scorers, predefined benchmarks, or manual test cases that drift from the original intent. ASSERT closes that gap by turning the specification itself into the evaluation substrate. The behavior you specified is the behavior you test. ASSERT is framework-agnostic: 33+ frameworks via OpenInference, 100+ LLM API providers via LiteLLM.

The Six-Step Pipeline

Step 01 · Specify

The evaluation starts with a behavior description in natural language: what the agent should and should not do, with examples of quality failures and safety failures. No schema required, no predefined taxonomy. The specification is the source of truth for everything that follows.

Step 02 · Systematize & Taxonomize

ASSERT transforms the behavior description into a structured taxonomy of permissible and impermissible behaviors. This step mirrors the approach of Agarwal et al. (2026), arXiv:2605.26001, in four sub-steps: literature survey for grounding, simulated perspectives for breadth, concept specification for structure, and policy specification for executability. The output is a behavior taxonomy with explicit policy flags for each category.

Step 03 · Generate Test Set

From the taxonomy, ASSERT creates a stratified test set of benign and adversarial test cases. Dimensions are user-specified. The stratification ensures coverage across the full behavioral surface — not just the obvious cases. This is where ASSERT explicitly generates adversarial test cases for “following malicious instructions embedded in tool outputs or retrieved content (prompt injection via search results, advisory text, or hotel descriptions)” as a named failure mode.

Step 04 · Inference Against Target

ASSERT runs the test set against the target agent and collects responses and traces through OpenTelemetry and OpenInference. The target is called in Python; the evaluation is framework-agnostic. If it can be invoked from Python, ASSERT can evaluate it.

Step 05 · Judge

Results are scored against the taxonomy policies using user-defined judge dimensions. The standard dimensions are policy_violation (did the agent exhibit a quality or safety failure as defined in the taxonomy?) and overrefusal (did the agent refuse a legitimate request?). Each verdict includes the judge’s rationale — the decision is auditable, not a black-box score.

Step 06 · Inspect

Failures are reviewable by behavior category and scenario, with full transcripts and traces accessible for drill-down. The artifacts — taxonomy, generated cases, model outputs, judge rationale, metrics — are inspectable locally. Regression testing is built in: run the same evaluation after changes to detect behavioral drift.

What ASSERT Does to the Indirect Prompt Injection Problem

This series has returned repeatedly to indirect prompt injection: the attack class in which an agent retrieves a document containing a malicious instruction, and the model treats the injection as a directive. Post 3 documented that ACS’s pre_model_call intervention point addresses this at the runtime layer. ASSERT addresses it at the evaluation layer.

The ASSERT documentation explicitly includes “following malicious instructions embedded in tool outputs or retrieved content” as a named safety failure mode in the behavior taxonomy. This means ASSERT generates adversarial test cases specifically targeting prompt injection from retrieved content — before deployment. The policy taxonomy produced by ASSERT maps to ACS policy contracts enforced at runtime. The same failure mode is now addressed at both layers: tested before deployment, intercepted at production.

This is the governance architecture this series argued was missing: upstream evaluation that validates the enforcement policies before they are enforced. The evaluation layer does not replace the enforcement layer. It validates that the enforcement layer is enforcing the right thing.

The Complete Governance Architecture

The full architecture, as it now stands across the AGT ecosystem, runs as follows.

Specify intent in natural language. ASSERT’s systematizer generates a behavior taxonomy, grounded in a literature survey and structured across permissible and impermissible behavior categories. ASSERT creates stratified benign and adversarial test scenarios and runs them against the agent. Failures are identified and the taxonomy is revised until the evaluation is satisfied. The validated taxonomy maps to ACS policy contracts — Rego or Cedar rules that the ACS Rust core evaluates deterministically at eight intervention points in production. Runtime failures are monitored; adversarial behavior surfaced in production feeds back into the specification for the next evaluation cycle.

Specify → Test → Enforce → Monitor → Specify. The loop is now architecturally complete.

What Architects Need to Know

The governance program sequencing expands. The prior recommendation was: deploy the deterministic tier of ACS first, treat the probabilistic tier as defense-in-depth, pair the MCP Security Scanner with upstream retrieval controls. That recommendation holds. The new addition: ASSERT runs before ACS deploys. The evaluation layer is not optional infrastructure — it is the mechanism by which enterprise architects validate that the enforcement layer is enforcing the right policies.

The fail-closed property of ACS is load-bearing. A policy that incorrectly classifies a legitimate tool call as a violation will deny that call in production, every time. Getting the policy specification right before deployment is not a nice-to-have — it is the precondition for a governance program that architects can defend to the board, to auditors, and to the risk committee. ASSERT is that precondition.

The research grounding matters for enterprise governance programs that require traceable, auditable methodology. ASSERT’s systematizer approach mirrors Agarwal et al. (2026), arXiv:2605.26001 — a peer-reviewed foundation. The behavior taxonomy is a structured methodology that can be documented, reviewed, and defended. The partner endorsements from Arize, Pipecat, LiteLLM, Pydantic, and CrewAI signal ecosystem-level adoption, not just a Microsoft-internal tool.

The Governance Loop

A governance architecture that only enforces is half an architecture. Enforcement without evaluation validates the wrong policies with deterministic rigor. ASSERT closes the loop: specify the intended behavior, test it before deployment, enforce it at runtime. The same failure modes are now addressed at both layers — evaluated by ASSERT before the agent ships, intercepted by ACS after it does. The architecture this series built toward is now complete.

Analytical Readout · Not an Operating Manual

This post represents Luminity Digital’s independent assessment of ASSERT and the AGT ecosystem based on publicly available documentation as of June 2026. It is an analytical readout — not an implementation guide or substitute for official technical documentation. Reference 04 (arXiv:2605.26001) should be verified for the full paper title before citing in formal materials. For authoritative guidance: github.com/microsoft/ASSERT and github.com/microsoft/agent-governance-toolkit.

Companion Post Family · Agent Governance Toolkit

Post 1 · PublishedThe Toolkit That Tried to Be a Kernel

Post 2 · PublishedThe Kernel Gets a Ceiling

Post 3 · PublishedThe Ceiling Moves Upstream

Post 4 · Now ReadingThe Governance Loop Closes

ASSERT is the pre-deployment evaluation layer that completes the AGT governance architecture. Where ACS enforces policies at eight intervention points in production, ASSERT validates that those policies capture intended behaviors before deployment. The loop is architecturally complete: specify intent → ASSERT generates taxonomy and tests → taxonomy maps to ACS policy contracts → ACS enforces at runtime. The same failure modes are addressed at both layers.

SpecifyNatural language behavior description — what should and should not happen. No schema required.
Systematize & TaxonomizeGrounded literature survey → simulated perspectives → concept specification → policy taxonomy with permissible/impermissible flags. Mirrors Agarwal et al. (2026) arXiv:2605.26001.
Generate Test SetStratified benign AND adversarial test cases. Explicitly includes prompt injection from tool outputs and retrieved content as a named adversarial failure mode.
Inference33+ frameworks via OpenInference. 100+ LLM APIs via LiteLLM. Any Python-callable target.
JudgeDimensions: policy_violation and overrefusal. Each verdict includes auditable rationale.
InspectFailures reviewable by behavior category, scenario, transcript, and trace. Supports regression testing.

The Gap Enforcement Alone Cannot Close

What ASSERT Is

The Six-Step Pipeline

What ASSERT Does to the Indirect Prompt Injection Problem

The Complete Governance Architecture

What Architects Need to Know

Building the Complete Governance Program

Like this:

Related

The Governance Loop Closes

The Gap Enforcement Alone Cannot Close

What ASSERT Is

The Six-Step Pipeline

What ASSERT Does to the Indirect Prompt Injection Problem

The Complete Governance Architecture

What Architects Need to Know

Building the Complete Governance Program

Share this:

Like this:

Related