The Spec’d Layer: Stratified Controls Across the Agentic AI Standards Ecosystem — Luminity Digital
Series 11 · The Standards Layer · Post 02 · May 2026

Post 1 named the convergence at the architectural level. This post walks each spec at the level that makes the architecture operational. AARM specifies what a runtime security system must do. ATF specifies governance progression. MAESTRO specifies threat decomposition. Five Eyes specifies the institutional baseline. NIST CAISI provides the workflow vocabulary. AAGATE demonstrates the composition. Knowing the architecture is not the same as knowing the requirements.

May 2026 · Tom M. Gomez · Luminity Digital · 13 Min Read
Post 1 named the convergence — the phenomenon, between February 2025 and May 2026, of five standards bodies independently publishing frameworks that describe the same stratified architecture for agentic AI security. This post is the depth pass. The goal is not to replace the specs (they live where they live) but to walk each one at the level where it becomes actionable, and to show explicitly where the specs connect. This is a practitioner reference. Return to it when a specific requirement needs to be cited or a specific surface needs to be understood. Posts 3 and 4 read forward from here — Post 3 names what the convergence does not reach, Post 4 reads the trajectory across the rest of 2026. Post 02 of four.

The convergence is real, but reading at the convergence level alone does not help a practitioner make a procurement decision, an architecture decision, or a posture decision. For those, you need to know what AARM R4 actually says about authorization decisions, what ATF promotion from Junior to Senior actually requires, what MAESTRO Layer 5 actually catalogs as threats. The spec layer is where the convergence becomes operational. This post walks the layer.

Standards bodies make this work readable but not always integrated: each spec is published by its own body, in its own format, with its own conformance vocabulary. The practitioner-side question is how the specs compose. Answering that requires the depth pass, and answering it explicitly is the contribution this post is trying to make. The canonical specs keep their full normative text and their own working groups; this post does not replace them.

One organizing note before the depth pass begins. Each section that follows treats one body’s contribution at one architectural surface, in the order: what the spec specifies, what the spec does not specify, and what the adoption signals say. The “does not specify” framing matters as much as the “does specify” framing — knowing where a spec stops is what makes it composable with the others.

AARM: The Runtime Enforcement Spec

The first depth pass. AARM (Autonomous Action Runtime Management) is the most concrete of the five frameworks because it specifies behavior at the action layer, with named conformance requirements that vendor implementations attest against. The spec originated with Herman Errico at Vanta in February 2026 (arXiv:2602.09433) and was contributed to the CSAI Foundation under stewardship transfer on April 29, 2026.

What AARM Specifies

The Core conformance section of AARM defines six MUST requirements that constitute the runtime enforcement vocabulary:

R1 — Pre-execution interception. Every action the agent attempts must be intercepted before execution. This is the architectural primitive on which everything else rests: the agent cannot reach the external system without crossing a checkpoint. R1 is not about logging after the fact; it is about a structural choke point that makes the rest of the spec enforceable.

R2 — Context accumulation. Prior actions and accumulated state must be available at decision time. The runtime cannot evaluate an action in isolation; it must evaluate the action against the trajectory of prior actions and the policy state that has accumulated. This is what makes AARM-conformant enforcement distinguishable from per-call policy evaluation in traditional API gateways.

R3 — Policy evaluation with intent alignment. Policy decisions must consider the agent’s stated intent, not merely the action signature. Two agents calling the same tool with the same parameters can be evaluated differently if their stated objectives diverge. R3 is the requirement that makes intent a first-class input to authorization, rather than treating intent as an out-of-band concern handled at design time.

R4 — Authorization decisions (including STEP-UP). Every action receives an explicit allow, deny, or step-up decision. The step-up path matters: it makes human-in-the-loop a structural primitive of the runtime, not an out-of-band exception. A deferral to human approval is part of the spec’s decision vocabulary, not a workaround.

R5 — Tamper-evident receipts. Decisions and actions are logged in an integrity-protected form. The receipt is not a log entry; it is an attestable record that downstream auditors and the agent’s own program-level governance can rely on. R5 is what makes the runtime decisions reviewable after the fact.

R6 — Identity binding. Every action is bound to a verified agent identity. The action is not anonymous; it is attributable to a specific agent operating under a specific governance posture. R6 is the requirement on which downstream IAM and ATF-level governance composition depends.

Extended conformance (R7–R9, SHOULD) covers least-privilege enforcement, capability-scoped credentials, and cross-agent coordination — important but not minimum-bar. Vendor implementations are now publishing conformance attestations against the Core and Extended sections, and the spec working group at the CSAI Foundation maintains an attestation registry for tracking which implementations claim conformance against which requirements.
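The six Core requirements are easier to hold in mind as one control path. What follows is a minimal sketch under stated assumptions, not AARM's normative design: the class and method names, the tool names, and the policy rules are all hypothetical, and the hash chain is just one common way to make a record tamper-evident (the spec text summarized here does not say which mechanism to use).

```python
import hashlib
import json
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    STEP_UP = "step_up"  # R4: defer to a human approver


@dataclass
class Action:
    agent_id: str        # R6: every action is bound to an agent identity
    tool: str
    params: dict
    stated_intent: str   # R3: intent is an input to the policy decision


@dataclass
class Checkpoint:
    verified_agents: set
    history: list = field(default_factory=list)    # R2: accumulated context
    receipts: list = field(default_factory=list)   # R5: hash-chained records

    def evaluate(self, action: Action) -> Decision:
        # Hypothetical policy rules, for illustration only.
        if action.agent_id not in self.verified_agents:
            return Decision.DENY
        if action.tool == "delete_records":
            return Decision.STEP_UP
        # R2 + R3: the same call is judged against trajectory and stated intent.
        read_pii = any(a.tool == "read_customer_pii" for a in self.history)
        if (action.tool == "send_email" and read_pii
                and "support" not in action.stated_intent):
            return Decision.DENY
        return Decision.ALLOW

    def intercept(self, action: Action) -> Decision:
        # R1: called before the tool runs; the caller executes only on ALLOW.
        decision = self.evaluate(action)
        self._append_receipt(action, decision)
        if decision is Decision.ALLOW:
            self.history.append(action)
        return decision

    @staticmethod
    def _digest(body: dict) -> str:
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def _append_receipt(self, action: Action, decision: Decision) -> None:
        # Each receipt commits to the previous one, so edits break the chain.
        prev = self.receipts[-1]["hash"] if self.receipts else "0" * 64
        body = {
            "agent": action.agent_id,
            "tool": action.tool,
            "params": action.params,
            "intent": action.stated_intent,
            "decision": decision.value,
            "prev": prev,
        }
        self.receipts.append({**body, "hash": self._digest(body)})

    def verify_receipts(self) -> bool:
        prev = "0" * 64
        for receipt in self.receipts:
            body = {k: v for k, v in receipt.items() if k != "hash"}
            if body["prev"] != prev or self._digest(body) != receipt["hash"]:
                return False
            prev = receipt["hash"]
        return True
```

A production system would also need to anchor the head of the chain somewhere the runtime cannot rewrite, and to verify identity cryptographically rather than by set membership. The sketch only shows how the six requirements sit on a single choke point.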

What AARM Does Not Specify

AARM is deliberately narrow, and it is explicit about what it leaves out. It does not specify implementation language, deployment topology, or service mesh choice; those are properly the implementer’s concern, and constraining them at the spec level would prematurely close the design space. It does not specify threat-model coverage; that surface belongs to MAESTRO and the OWASP Top 10 for Agentic Applications. It does not specify governance progression of the agent itself across its lifecycle; that is ATF’s surface. And it does not specify empirical efficacy under adversarial conditions. No published red-team data exists yet for AARM-conformant systems at scale. Conformance is a declaration of architectural shape, not a measurement of defensive strength.

Adoption Signals

The CSAI Foundation announcement on April 29, 2026 stated that fifty-plus companies were implementing AARM-conformant runtime systems. Vendor conformance attestations are beginning to publish against the Core and Extended sections. The independent research literature converged on patterns that AARM consolidates: VeriGuard from Google Cloud AI (October 2025), CABP from Salesforce (March 2026), GuardAgent, and Winston SMT each demonstrate, in controlled research conditions, the runtime patterns that AARM specifies at the production-conformance level. The academic-side operationalization closes the loop: AAGATE demonstrates that AARM-pattern enforcement can run inside a Kubernetes-native NIST AI RMF control plane.

ATF: The Governance Progression Spec

The second depth pass. ATF (the Agentic Trust Framework) operates at a different layer than AARM: per-agent across the deployment lifecycle, not per-action at runtime. The spec originated with Josh Woodruff at MassiveScale.AI in February 2026, carries a foreword by John Kindervag (the originator of Zero Trust), is licensed CC BY 4.0, and was contributed to the CSAI Foundation under stewardship transfer on April 29, 2026.

What ATF Specifies

The framework defines four maturity levels with explicit promotion criteria, demotion triggers, and minimum residency periods. The levels are not vague stages — they are an articulated progression with named gates between them.

Intern. The entry tier. Agents at Intern operate under supervised execution, with mandatory human approval required for all consequential actions. Capability portfolio is narrow. Approval scope is broad. The agent acts as an apprentice, with its actions reviewed before consequence.

Junior. Promotion from Intern to Junior requires demonstrated time-in-level, evidence of consistent performance against expected actions, evidence of no significant harm during the residency period, and policy-validation gate satisfaction. At Junior, approval scope narrows and capability portfolio widens. Specific action categories may be granted autonomous execution authority while others remain gated.

Senior. Promotion to Senior is the threshold at which the agent operates autonomously within its scope, with audit and rollback infrastructure as the safety mechanism rather than approval gates. The agent acts on its own; the system retains the ability to review and reverse. This is the level at which most production agents will operate.

Principal. The senior-most tier. Full autonomous execution across the agent’s scope, with continuous attestation and policy-bound capability. Principal does not mean unrestricted; it means the restrictions are policy-encoded rather than gate-enforced. Promotion to Principal requires the strongest evidence-of-performance and evidence-of-no-harm thresholds in the framework.

Each promotion is gated by four criteria: time-in-level, evidence of performance against expected actions, evidence of no significant harm during the residency period, and policy-validation satisfaction. Demotion triggers are also explicit — defined conditions under which an agent moves backward in maturity, including policy violations, unexpected behavior patterns, and changes in deployment context that invalidate the original promotion evidence.
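The four promotion gates and the demotion triggers can be read as a small state machine. The sketch below is illustrative only: the thresholds, the metric names, and the rule that a trigger demotes by exactly one level are assumptions, since ATF as described here leaves the evidence format to implementers and sector regulators.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    INTERN = 0
    JUNIOR = 1
    SENIOR = 2
    PRINCIPAL = 3


@dataclass
class Evidence:
    days_in_level: int
    performance_score: float       # 0.0 to 1.0, hypothetical metric
    harm_incidents: int
    policy_validation_passed: bool


# Hypothetical thresholds per target level: (minimum days, minimum performance).
THRESHOLDS = {
    Level.JUNIOR: (30, 0.90),
    Level.SENIOR: (90, 0.95),
    Level.PRINCIPAL: (180, 0.99),
}

# Trigger names paraphrase the demotion conditions listed in the text.
DEMOTION_TRIGGERS = {
    "policy_violation",
    "unexpected_behavior",
    "deployment_context_change",
}


def failed_gates(current: Level, ev: Evidence) -> list:
    """Return the names of the promotion gates that are not yet satisfied."""
    if current is Level.PRINCIPAL:
        return ["no_higher_level"]
    min_days, min_perf = THRESHOLDS[Level(current + 1)]
    checks = {
        "time_in_level": ev.days_in_level >= min_days,
        "performance": ev.performance_score >= min_perf,
        "no_harm": ev.harm_incidents == 0,
        "policy_validation": ev.policy_validation_passed,
    }
    return [name for name, ok in checks.items() if not ok]


def next_level(current: Level, ev: Evidence, events=frozenset()) -> Level:
    """Demote on any trigger; promote only when all four gates pass."""
    if DEMOTION_TRIGGERS & set(events):
        return Level(max(current - 1, Level.INTERN))
    if not failed_gates(current, ev):
        return Level(current + 1)
    return current
```

Returning the list of failed gates, rather than a bare yes or no, keeps the promotion decision reviewable: a governance program can show which piece of evidence was missing.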

What ATF Does Not Specify

ATF is deliberately not a runtime spec. It does not specify per-action behavior; that surface belongs to AARM. It does not specify the format that evidence-of-performance and evidence-of-no-harm must take; those are left to implementers and sector regulators. It does not specify industry-specific policy overlays — banking, healthcare, and critical infrastructure overlays are properly the work of sector-specific working groups, and the framework’s structure is intended to accept those overlays without modification.

Adoption Signals

Microsoft is named in the spec materials as an early adopter. The framework provides a 1:1 cross-walk to AWS’s Agentic AI Security Scoping Matrix, which makes ATF-tier evidence directly usable in AWS-native governance programs. The Kindervag foreword positions ATF as the Zero Trust extension for non-human identities — a significant intellectual lineage that locates the framework inside the established Zero Trust discipline rather than treating it as a parallel construct. The adoption pattern across early enterprise deployments is mapping existing agent inventories to ATF levels as part of governance program updates.

MAESTRO: The Threat-Modeling Spec

The third depth pass. MAESTRO (Multi-Agent Environment, Security, Threat, Risk, and Outcome) predates AARM and ATF in publication timing. It originated as a Cloud Security Alliance blog post by Ken Huang in February 2025, and was subsequently formalized as the OWASP GenAI Multi-Agentic System Threat Modelling Guide v1.0. The dual-body lineage is intentional: Huang holds standing across CSA, OWASP Top 10 for LLMs, and NIST GenAI, and the framework’s cross-publication reflects a deliberate decision to make MAESTRO a shared analytical resource rather than a body-specific artifact.

What MAESTRO Specifies

The framework decomposes agentic AI risk across seven architectural layers, each with its own threat categories:

L1 — Foundation Models. Model-layer threats: prompt injection, jailbreaks, alignment drift, model-level capability misuse. The risks that exist at the layer where the language model itself reasons and generates.

L2 — Data Operations. Training-data poisoning, retrieval-augmented generation attacks, vector store integrity threats, and the broader category of risks that emerge from the data the model consumes, during operation as well as during training.

L3 — Agent Frameworks. Orchestration-layer threats: prompt-template injection, framework-level escalation, attacks against the agent’s internal scaffolding rather than the foundation model itself.

L4 — Deployment Infrastructure. Sandbox escape, container or VM compromise, lateral movement, and the traditional infrastructure threats reframed in the context of agentic workloads. SandboxEscapeBench demonstrates that frontier models exhibit log-linear scaling in escape success against this layer.

L5 — Evaluation and Observability. Gaming the evaluation, telemetry manipulation, audit-log tampering. The threats that target the systems supposed to detect threats — the meta-layer at which observability itself becomes the target.

L6 — Security and Compliance. Policy-layer threats: compliance theater, audit-trail gaps, the failure modes that emerge when security and compliance functions are present as artifacts rather than operative controls.

L7 — Agent Ecosystem. Multi-agent emergent threats: viral propagation across agents, swarm coordination, cross-agent influence, and the threat categories that exist only when multiple agents interact at scale.

The framework’s distinctive contribution beyond the seven-layer decomposition is its treatment of cross-layer cascading dynamics. A compromise at L1 (foundation model) propagates upward through L3 (agent framework) and L7 (ecosystem). A weakness at L2 (data operations) shapes what agents at L3 can be made to do. A telemetry gap at L5 makes a defense at L6 ineffective. The cascading is what single-layer frameworks miss, and what MAESTRO is specifically designed to surface.
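The cascading dynamics lend themselves to a directed graph. The sketch below encodes only the three cascades named in the paragraph above and computes which layers a compromise can reach. The edge list is deliberately incomplete, and treating cascades as transitive (L2 reaches L7 by way of L3) is an assumption of this sketch, not a claim taken from MAESTRO.

```python
from collections import deque

LAYERS = {
    1: "Foundation Models",
    2: "Data Operations",
    3: "Agent Frameworks",
    4: "Deployment Infrastructure",
    5: "Evaluation and Observability",
    6: "Security and Compliance",
    7: "Agent Ecosystem",
}

# Only the cascades the text names: L1 -> L3 -> L7, L2 -> L3, L5 -> L6.
CASCADES = {1: [3], 3: [7], 2: [3], 5: [6]}


def blast_radius(start: int) -> list:
    """Breadth-first walk: every layer reachable from a compromised layer."""
    seen = {start}
    queue = deque([start])
    while queue:
        layer = queue.popleft()
        for downstream in CASCADES.get(layer, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return sorted(seen - {start})


def describe(start: int) -> str:
    reached = ", ".join(f"L{n} {LAYERS[n]}" for n in blast_radius(start))
    return f"L{start} {LAYERS[start]} -> {reached or 'no named cascade'}"
```

A real threat model would populate the edges from the framework's full catalog; the point is that a single-layer view returns nothing for the questions this traversal answers.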

What MAESTRO Does Not Specify

MAESTRO is a threat-modeling framework. It does not specify defenses — the next layer of work falls to AARM, ATF, and the implementations that realize them. It does not specify quantitative risk scoring; that surface belongs to OWASP AIVSS and SEI SSVC. It does not specify implementation patterns. The framework’s job ends at “here are the threats, decomposed by layer, with cross-layer dynamics named.”

Adoption Signals

AAGATE (Agentic AI Governance Assurance & Trust Engine) uses MAESTRO to fill the Map function inside its Kubernetes-native control plane. MAESTRO is referenced as threat-modeling vocabulary in OWASP, NIST, and CSA publications across the convergence window. Vendor threat-model documents in 2026 are using MAESTRO’s seven-layer decomposition as the analytical framework against which their defenses are organized, a pattern that did not exist before February 2025.

Five Eyes Joint Guidance: The Institutional Baseline

The fourth depth pass. The Five Eyes joint guidance on agentic AI security, published May 1, 2026, operates at the political and procurement layer. Six national cybersecurity and signals intelligence agencies issued coordinated binding-intent guidance: CISA and NSA in the United States, NCSC in the United Kingdom, the Canadian Centre for Cyber Security, the Australian Cyber Security Centre, and the New Zealand National Cyber Security Centre. The guidance is read by Gartner as the new procurement baseline for critical infrastructure deployments.

What the Joint Guidance Specifies

The guidance organizes around two complementary structures. The five-domain risk taxonomy names what categories of risk must be evaluated: Privilege (what the agent can do), Design and Configuration (how the agent is constructed and deployed), Behavioral (how the agent acts under operational conditions), Structural (how the agent’s architecture creates or constrains risk), and Accountability (how the agent’s actions are attributable and reviewable). The four-domain technical baseline specifies what must be in place: Identity and Authentication, Least-Privilege Access, Human Oversight and Approval Gates, and Logging and Behavioral Monitoring.

The Mapping onto AARM

The Five Eyes technical baseline maps cleanly onto AARM conformance requirements: three domains land in Core and one in Extended. Identity and Authentication corresponds to AARM R6 (Identity Binding). Least-Privilege Access corresponds to AARM R9 (Least Privilege Enforcement), which sits in Extended rather than Core. Human Oversight and Approval Gates corresponds to AARM R4 (Authorization Decisions, including STEP-UP). Logging and Behavioral Monitoring corresponds to AARM R5 (Tamper-Evident Receipts). A Core-only attestation therefore does not, by itself, cover the least-privilege domain. This mapping is what makes the convergence operational at the spec level: Five Eyes states the requirement at the political baseline, AARM states the enforcement vocabulary at the runtime layer, and the two surfaces compose.
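The mapping is small enough to keep as a lookup table and check against a vendor's attested requirement set. This is a sketch under the assumption that an attestation can be reduced to a set of requirement IDs; the function name and that representation are hypothetical, and the requirement numbers are the ones given in this section.

```python
# Five Eyes technical baseline domain -> AARM requirement, as stated above.
BASELINE_TO_AARM = {
    "Identity and Authentication": "R6",
    "Least-Privilege Access": "R9",          # Extended, not Core
    "Human Oversight and Approval Gates": "R4",
    "Logging and Behavioral Monitoring": "R5",
}

AARM_CORE = {"R1", "R2", "R3", "R4", "R5", "R6"}


def baseline_gaps(attested: set) -> dict:
    """Return baseline domains whose mapped requirement is not attested."""
    return {
        domain: requirement
        for domain, requirement in BASELINE_TO_AARM.items()
        if requirement not in attested
    }
```

Run against a Core-only attestation, the check returns the least-privilege domain, because its mapped requirement is R9 in Extended. That is the kind of question a procurement review would want answered before treating "AARM-conformant" as equivalent to meeting the baseline.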

What Five Eyes Does Not Specify

The joint guidance does not specify implementation. It does not specify the format of conformance attestation. It does not specify sector-specific overlays beyond the critical infrastructure framing. The guidance establishes the floor below which deployments will not pass nation-state-level procurement scrutiny; the work of meeting that floor in a specific environment is left to the deploying organization.

How the Specs Compose: NIST CAISI and AAGATE

This is where the depth pass earns its integration. The four specs walked above each define what must happen at a single surface. The question that remains is how they compose into a working program. Two artifacts answer that question.

NIST AI RMF + CAISI as the Workflow Layer

The NIST AI Risk Management Framework defines four functions every responsible AI program executes: Govern, Map, Measure, Manage. Govern is the organizational and policy-setting function. Map is the threat-and-risk identification function. Measure is the risk-quantification function. Manage is the risk-treatment function. The four functions are the program-level vocabulary against which the spec-level requirements are realized.

The institutional infrastructure for executing this work at scale was established when NIST launched the AI Agent Standards Initiative within its Center for AI Standards and Innovation (CAISI) in February 2026. CAISI’s subsequent partnership with Gray Swan AI and the UK AI Security Institute produced the red-teaming baseline that demonstrates the threats are empirically real: more than 250,000 attack attempts from 400+ participants against 13 frontier models, with at least one successful attack identified against every target. The workflow specifies what the program does over time; the other specs specify what the system does at runtime; CAISI provides the institutional infrastructure that holds the workflow together.

AAGATE as the Composition Demonstration

AAGATE (arXiv:2510.25863) is the academic-side demonstration that the specs compose. Published by Huang et al. in November 2025, the paper operationalizes the integration explicitly: the Map function is filled by MAESTRO’s seven-layer decomposition; the Measure function is filled by a hybrid of OWASP AIVSS and SEI SSVC for risk quantification; the Manage function is filled by the CSA Agentic AI Red Teaming Guide. All four functions run inside a Kubernetes-native control plane aligned to the NIST AI RMF.

AAGATE does not implement AARM directly — the paper operates one layer up at the workflow level. But its policy enforcement points map onto AARM-pattern runtime requirements, and the architecture demonstrates that the workflow and runtime surfaces can be coordinated inside a single control plane. This is what composition looks like at the academic-operationalization layer: not five separate frameworks, but five surfaces of the same control plane.
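The composition described above can be written down as a lookup from NIST AI RMF function to the artifact that fills it. This is a sketch of the mapping as this post summarizes it, not of the paper's implementation. The post names fillers for Map, Measure, and Manage only, so Govern is left empty here rather than guessed.

```python
RMF_FUNCTIONS = ("Govern", "Map", "Measure", "Manage")

# Fillers as named in this post's summary of AAGATE.
AAGATE_FILLERS = {
    "Map": ["MAESTRO seven-layer decomposition"],
    "Measure": ["OWASP AIVSS", "SEI SSVC"],
    "Manage": ["CSA Agentic AI Red Teaming Guide"],
}


def unfilled(functions, fillers) -> list:
    """Return the RMF functions that have no named artifact behind them."""
    return [name for name in functions if not fillers.get(name)]
```

The same check works for an organization's own program: replace the filler table with the artifacts actually in place, and any function it returns is a workflow surface with nothing behind it.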

Specs as isolated artifacts. Read each body’s publication in isolation and the standards layer looks like a collection: each body publishes independently in its own format, with no explicit cross-spec composition vocabulary; the practitioner reads each spec in isolation; integration falls entirely to the implementer; conformance is per-spec, not architectural; and coverage gaps remain invisible without manual cross-mapping. This is the reading of the standards layer that was correct before the convergence took shape.

Specs as composed architecture. The reading the convergence has earned looks different: each spec covers one architectural surface; the surfaces are explicitly named and bounded by charter; cross-spec mapping (Five Eyes → AARM, MAESTRO → AAGATE Map, ATF → AWS Scoping Matrix) is made explicit in the artifacts themselves; AAGATE demonstrates the composition academically; NIST AI RMF provides the workflow scaffolding; and coverage gaps become visible at the architectural level rather than buried beneath body-by-body publication. The depth pass this post performs is what makes the composed reading legible.

Reading the Spec’d Layer as a Composed System

The depth pass closes by returning to the four-surface architecture Post 1 named, with the spec-level depth now in hand. At the threat-modeling surface, MAESTRO catalogs threats across seven layers with cross-layer cascading dynamics; OWASP Top 10 categorizes attacks at the agentic application level; the Five Eyes five-domain taxonomy classifies risk at the institutional layer. Three frameworks, one surface.

At the runtime enforcement surface, AARM Core R1–R6 specifies the conformance vocabulary; AAGATE demonstrates the architecture in academic operationalization; the empirical research literature (VeriGuard, CABP, GuardAgent, Winston SMT) demonstrates the patterns in research conditions; vendor attestations begin to operationalize the spec at production scale.

At the governance progression surface, ATF’s four-level maturity model with promotion criteria and demotion triggers provides the framework; the AWS Agentic AI Security Scoping Matrix provides the 1:1 cross-walk to a major cloud governance program; Microsoft’s adoption signals the direction.

At the risk-management workflow surface, the NIST AI RMF four functions provide the program-level vocabulary; CAISI provides the institutional infrastructure; the CSA Agentic AI Red Teaming Guide provides the Manage function vocabulary; AAGATE demonstrates the integration as a working control plane.

At the institutional baseline, the Five Eyes joint guidance establishes the procurement floor below which deployments do not pass nation-state-level scrutiny; Gartner treats it as the new baseline for critical infrastructure procurement.

The Spec’d Layer, Stated Plainly

The spec’d layer is what makes the convergence operational. The architectural map Post 1 named is composed of specifications, each one defining what must happen at a single surface, each one bounded by its body’s charter, each one designed to compose with the others. Knowing the architecture is not the same as knowing the requirements. Post 3 reads what the spec’d layer does not reach.

If the spec’d layer raises specific questions for your deployment

The depth pass is the same depth a practitioner does when reading the specs against a specific environment. If thinking through what AARM Core or ATF maturity progression means for your architecture is useful, the conversation starts here.
