The four-surface architecture the convergence landed on
Between February 2025 and May 2026, five standards bodies independently published frameworks that converge on the same architecture. Each spec answers one architectural question. Each is bounded by its body’s charter. The surfaces compose; they do not compete.
Tabs 05 (Composition), 06 (River’s Edge), and 07 (Trajectory) close the readout by showing how the surfaces wire together, where the spec layer ends, and where the institutional architecture is heading.
The runtime enforcement spec, requirement by requirement
Originated by Herman Errico at Vanta (Feb 2026, arXiv:2602.09433). Stewardship transferred to the CSAI Foundation on April 29, 2026. Core (R1–R6) is the MUST-bar; Extended (R7–R9) is the SHOULD-bar. Vendor conformance attestations are made against these requirements.
R1 · Pre-execution interception
Every action the agent attempts is intercepted before execution. The structural choke point on which everything else rests — the agent cannot reach the external system without crossing it.
R2 · Context accumulation
Prior actions and accumulated state are available at decision time. The runtime evaluates against trajectory and accumulated policy state, not in isolation.
R3 · Policy evaluation with intent alignment
Decisions consider the agent’s stated intent, not just the action signature. Intent becomes a first-class authorization input.
R4 · Authorization decisions (including step-up)
Every action receives an explicit allow, deny, or step-up. Human-in-the-loop is a structural primitive of the runtime, not an out-of-band exception.
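A minimal Python sketch of how the first four requirements (pre-execution interception, context accumulation, intent-aligned policy evaluation, and allow/deny/step-up decisions) might fit together. Everything here is hypothetical and illustrative, not drawn from the AARM spec: the `Session` and `Action` types, the `INTENT_ALLOWS` table, tool names such as `payments.transfer`, and the cumulative `STEP_UP_THRESHOLD`.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    STEP_UP = "step_up"  # route to a human approver before execution


@dataclass
class Action:
    tool: str            # hypothetical tool name, e.g. "payments.transfer"
    amount: float = 0.0


@dataclass
class Session:
    stated_intent: str                           # the agent's declared goal
    history: list = field(default_factory=list)  # accumulated prior actions


# Hypothetical policy: which tools each stated intent may use.
INTENT_ALLOWS = {"pay_invoice": {"invoices.read", "payments.transfer"}}
STEP_UP_THRESHOLD = 1_000.0  # assumed cap on cumulative transfers per session


def evaluate(session: Session, action: Action) -> Decision:
    """Policy decision for one intercepted action, made before execution."""
    # Intent alignment: the action must serve the declared intent.
    if action.tool not in INTENT_ALLOWS.get(session.stated_intent, set()):
        return Decision.DENY
    # Context accumulation: judge against the trajectory, not the action alone.
    spent = sum(a.amount for a in session.history if a.tool == "payments.transfer")
    if action.tool == "payments.transfer" and spent + action.amount > STEP_UP_THRESHOLD:
        return Decision.STEP_UP
    return Decision.ALLOW


def intercept(session: Session, action: Action, execute) -> Decision:
    """Choke point: the tool call runs only when the decision is ALLOW."""
    decision = evaluate(session, action)
    if decision is Decision.ALLOW:
        execute(action)
        session.history.append(action)
    return decision
```

A step-up result leaves the action unexecuted; in a real deployment it would be routed to an approver and executed only on approval.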
R5 · Tamper-evident receipts
Decisions and actions are logged in integrity-protected form — an attestable record downstream auditors and ATF-level governance can rely on.
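One common way to make a log tamper-evident is a hash chain: each receipt’s hash covers the previous hash, so editing any earlier record breaks verification. The sketch below assumes SHA-256 over canonical JSON and a hypothetical `ReceiptLog` class; it does not reflect any receipt format defined by AARM.

```python
import hashlib
import json


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class ReceiptLog:
    """Append-only log in which each receipt commits to the one before it."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []  # list of (record, hash) pairs

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        h = _digest(prev, record)
        self.entries.append((record, h))
        return h

    def verify(self) -> bool:
        # Recompute the chain from the start; any edited record breaks it.
        prev = self.GENESIS
        for record, h in self.entries:
            if _digest(prev, record) != h:
                return False
            prev = h
        return True
```

A bare chain detects in-place edits but not truncation of the newest entries or a full rewrite by whoever holds the log; production designs typically sign receipts or anchor the latest hash externally.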
R6 · Identity binding
Every action is bound to a verified agent identity. Actions are not anonymous; they are attributable to a specific agent under a specific governance posture.
R7 · Capability-scoped credentials
Credentials issued to the agent are scoped to the capabilities the agent needs — narrower than a human-equivalent identity would carry.
R8 · Cross-agent coordination
Coordination patterns between agents are specified: how agent A’s receipts inform agent B’s authorization, and how trust propagates across the ecosystem.
R9 · Least-privilege enforcement
The structural minimum: the agent has access only to what it needs. Maps onto the Five Eyes Least-Privilege Access baseline domain.
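A hedged sketch of capability scoping and least privilege at credential issuance: the agent receives only the scopes that are both requested for the task and held by the sponsoring human, with a short lifetime. `ScopedCredential`, `issue`, `permits`, the scope strings, and the 15-minute default TTL are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ScopedCredential:
    agent_id: str
    scopes: frozenset      # hypothetical scope strings, e.g. "invoices:read"
    expires_at: datetime


def issue(agent_id, requested, human_scopes, ttl_minutes=15):
    """Grant only scopes that are requested for the task AND held by the human.

    Anything outside that intersection is dropped, so the agent never carries
    more than the human-equivalent identity and usually carries less.
    """
    granted = frozenset(requested) & frozenset(human_scopes)
    expires = datetime.now(timezone.utc) + timedelta(minutes=ttl_minutes)
    return ScopedCredential(agent_id, granted, expires)


def permits(cred, scope, now=None):
    """True only if the credential is unexpired and carries the scope."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < cred.expires_at and scope in cred.scopes
```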
What AARM deliberately does not specify
Implementation language, deployment topology, or service mesh choice — those are the implementer’s concern. Threat-model coverage — that surface belongs to MAESTRO. Governance progression of the agent itself — that is ATF’s surface. Empirical efficacy under adversarial conditions — no published red-team data yet for AARM-conformant systems at scale. Conformance is a declaration of architectural shape, not a measurement of defensive strength.
The governance progression spec, level by level
Originated by Josh Woodruff at MassiveScale.AI (Feb 2026), with a foreword by John Kindervag (originator of Zero Trust). Licensed CC BY 4.0. Stewardship transferred to the CSAI Foundation on April 29, 2026. ATF operates per agent across the deployment lifecycle, not per action at runtime: AARM governs each action; ATF governs each agent. The surfaces compose.
Intern
- Mandatory human approval for all consequential actions
- Narrow capability portfolio
- Broad approval scope
- Actions reviewed before consequence
Junior
- Approval scope narrows; capability widens
- Some action categories operate autonomously
- Others remain gated
- Time-in-level evidence required
Senior
- Operates autonomously within scope
- Audit and rollback as safety mechanism
- Most production agents will sit here
- Demotion triggers in force throughout
Principal
- Full autonomous execution across scope
- Continuous attestation, policy-bound capability
- Restrictions are policy-encoded
- Strongest evidence required
Each promotion is gated by four criteria
Demotion triggers · Explicit conditions under which an agent moves backward — policy violations, unexpected behavior patterns, changes in deployment context that invalidate the original promotion evidence.
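The level structure above can be read as a small state machine: promotion moves up one level only when every gating criterion is satisfied, and a demotion trigger moves the agent back. This sketch assumes one-level moves and caller-supplied criterion names; the test uses time-in-level, evidence-of-performance, and evidence-of-no-harm as examples and does not reproduce ATF’s full set of four criteria. The trigger names are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    INTERN = 0
    JUNIOR = 1
    SENIOR = 2
    PRINCIPAL = 3


# Hypothetical trigger names, modeled on the examples given above.
DEMOTION_TRIGGERS = {"policy_violation", "unexpected_behavior", "deployment_context_changed"}


def promote(level, evidence):
    """Advance one level only if every supplied criterion is satisfied.

    `evidence` maps criterion name -> bool. Empty evidence never promotes.
    """
    if level is Level.PRINCIPAL or not evidence or not all(evidence.values()):
        return level
    return Level(level + 1)


def on_event(level, event):
    """Demote one level when a demotion trigger fires; otherwise no change."""
    if event in DEMOTION_TRIGGERS and level > Level.INTERN:
        return Level(level - 1)
    return level
```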
What ATF deliberately does not specify
ATF is not a runtime spec. It does not specify per-action behavior — that surface belongs to AARM. It does not specify the format that evidence-of-performance and evidence-of-no-harm must take — those are left to implementers and sector regulators. It does not specify industry-specific policy overlays — banking, healthcare, and critical infrastructure overlays are the work of sector-specific working groups.
The threat-modeling spec, seven layers with cascade
Originated by Ken Huang as a CSA blog post (Feb 6, 2025) and formalized as the OWASP GenAI Multi-Agentic System Threat Modelling Guide v1.0, giving it a dual-body lineage. MAESTRO decomposes agentic risk across seven architectural layers and names the cross-layer cascading dynamics that single-layer frameworks miss.
Layer 7 · Agent Ecosystem
Multi-agent emergent threats: viral propagation, swarm coordination, cross-agent influence. Threats that exist only when multiple agents interact at scale.
Layer 6 · Security and Compliance
Policy-layer threats. Compliance theater, audit-trail gaps — failure modes that emerge when security and compliance are present as artifacts rather than operative controls.
Layer 5 · Evaluation and Observability
Gaming the evaluation, telemetry manipulation, audit-log tampering. Threats that target the systems supposed to detect threats — the meta-layer becomes the target.
Layer 4 · Deployment Infrastructure
Sandbox escape, container or VM compromise, lateral movement — traditional infrastructure threats reframed for agentic workloads. Log-linear scaling in escape success demonstrated empirically.
Layer 3 · Agent Frameworks
Orchestration-layer threats: prompt-template injection, framework-level escalation. Attacks against the agent’s internal scaffolding rather than the foundation model.
Layer 2 · Data Operations
Data poisoning, retrieval-augmented generation (RAG) attacks, vector store integrity. The emphasis is on the data the agent consumes during operation, not only on what the model saw during training.
Layer 1 · Foundation Models
Model-layer threats: prompt injection, jailbreaks, alignment drift, model-level capability misuse. The risks at the layer where the LLM itself reasons and generates.
The cross-layer cascading dynamic
MAESTRO’s distinctive contribution beyond the seven-layer decomposition. A compromise at L1 (foundation model) propagates upward through L3 (agent framework) and L7 (ecosystem). A weakness at L2 (data operations) shapes what agents at L3 can be made to do. A telemetry gap at L5 makes a defense at L6 ineffective. The cascading is what single-layer frameworks miss — and what MAESTRO is specifically designed to surface.
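The cascade can be modeled as reachability over a directed graph of layers. The edges below encode only the examples in the preceding paragraph (L1 to L3, L3 to L7, L2 to L3, L5 to L6); they are illustrative rather than MAESTRO’s published edge set, and `blast_radius` is a hypothetical helper.

```python
from collections import deque

LAYERS = {
    1: "Foundation Models", 2: "Data Operations", 3: "Agent Frameworks",
    4: "Deployment Infrastructure", 5: "Evaluation and Observability",
    6: "Security and Compliance", 7: "Agent Ecosystem",
}

# Directed "a weakness here degrades that layer" edges, taken only from the
# examples in the text; a real threat model would enumerate many more.
CASCADES = {1: {3}, 3: {7}, 2: {3}, 5: {6}}


def blast_radius(start):
    """Every layer reachable from a compromise at `start` (breadth-first)."""
    seen, queue = {start}, deque([start])
    while queue:
        layer = queue.popleft()
        for nxt in CASCADES.get(layer, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)
```

A single-layer review of L1 would report only `[1]`; the graph view also surfaces L3 and L7 (`blast_radius(1)` returns `[1, 3, 7]`).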
What MAESTRO deliberately does not specify
MAESTRO is a threat-modeling framework. It does not specify defenses — that work falls to AARM, ATF, and the implementations that realize them. It does not specify quantitative risk scoring — that surface belongs to OWASP AIVSS and SEI SSVC. It does not specify implementation patterns. The framework’s job ends at “here are the threats, decomposed by layer, with cross-layer dynamics named.”
The institutional baseline, two taxonomies and one cross-walk
Careful Adoption of Agentic AI Services, May 1, 2026. Coordinated guidance from six agencies: CISA, NSA (US); NCSC (UK); CCCS (Canada); ASD ACSC (Australia); NCSC-NZ (New Zealand). Gartner reads it as the new procurement baseline for critical infrastructure.
Five-Domain Risk Taxonomy
- 01 · Privilege: what the agent can do
- 02 · Design & Configuration: how the agent is constructed and deployed
- 03 · Behavioral: how the agent acts under operational conditions
- 04 · Structural: how the agent’s architecture creates or constrains risk
- 05 · Accountability: how the agent’s actions are attributable and reviewable
Four-Domain Technical Baseline
- 01 · Identity & Authentication: verified, attributable agent identities
- 02 · Least-Privilege Access: narrowest credential scope sufficient for purpose
- 03 · Human Oversight & Approval Gates: step-up paths for high-consequence actions
- 04 · Logging & Behavioral Monitoring: tamper-evident, attestable record
The 1:1 mapping onto AARM Core
Five Eyes states the requirement at the institutional baseline; AARM states the enforcement vocabulary at the runtime layer. The two surfaces compose.
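One reading of the cross-walk, assuming the AARM requirements are numbered R1–R9 in the order listed earlier (consistent with the later references to R5 receipts, R6 identity, and R9 policy-as-code). The table is pieced together from the requirement descriptions, not copied from the guidance or the AARM spec, and `baseline_gaps` is a hypothetical helper for checking an attestation against the four baseline domains.

```python
# Assumed mapping from Five Eyes baseline domains to AARM requirements.
BASELINE_TO_AARM = {
    "Identity & Authentication": {"R6"},          # identity binding
    "Least-Privilege Access": {"R9"},             # least-privilege enforcement
    "Human Oversight & Approval Gates": {"R4"},   # allow / deny / step-up
    "Logging & Behavioral Monitoring": {"R5"},    # tamper-evident receipts
}


def baseline_gaps(attested):
    """Baseline domains whose mapped requirements are not all attested."""
    covered = set(attested)
    return sorted(d for d, reqs in BASELINE_TO_AARM.items() if not reqs <= covered)
```

Under this particular reading, the Least-Privilege Access domain lands on R9, which the AARM section classes as Extended, so a Core-only (R1–R6) attestation reports it as a gap.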
What the joint guidance deliberately does not specify
The guidance does not specify implementation. It does not specify the format of conformance attestation. It does not specify sector-specific overlays beyond the critical infrastructure framing. The guidance establishes the floor below which deployments will not pass nation-state-level procurement scrutiny; the work of meeting that floor in a specific environment falls below the river’s edge.
NIST AI RMF as workflow, AAGATE as the working demonstration
Five specs define what must happen at five surfaces. The question that remains is how they compose into a working program. Two artifacts answer — the NIST AI Risk Management Framework provides the program-level workflow vocabulary; AAGATE (arXiv:2510.25863) demonstrates the composition as a Kubernetes-native control plane.
NIST AI RMF · the four functions
The program-level vocabulary every responsible AI program executes. AAGATE shows which spec fills each function.
Govern
Filled by: ATF maturity. Organizational and policy-setting function. ATF’s per-agent governance progression operates here; the maturity levels and demotion triggers are how Govern is realized at the agent level.
Map
Filled by: MAESTRO 7-layer. Threat-and-risk identification. MAESTRO’s seven-layer decomposition with cross-layer cascading dynamics fills this function in AAGATE’s reference architecture.
Measure
Filled by: AIVSS + SSVC. Risk quantification. AAGATE composes OWASP AIVSS with SEI SSVC for the scoring vocabulary: AIVSS scores agentic-specific risk; SSVC delivers an action band.
Manage
Filled by: CSA Red Team Guide. Risk treatment. The CSA Agentic AI Red Teaming Guide provides the activity vocabulary AAGATE binds into the Manage function: selection of threats, scoring, response.
AAGATE (Agentic AI Governance Assurance & Trust Engine)
Huang et al., November 2025. Operationalizes the integration inside a Kubernetes-native control plane aligned to the NIST AI RMF. Map filled by MAESTRO; Measure by AIVSS + SSVC; Manage by the CSA Red Teaming Guide. Govern threads through via ATF maturity levels.
AAGATE does not implement AARM directly — the paper operates one layer up at the workflow level. But its policy enforcement points map onto AARM-pattern runtime requirements, and the architecture demonstrates that workflow and runtime surfaces can be coordinated inside a single control plane. This is what composition looks like at the academic-operationalization layer: not five separate frameworks, but five surfaces of the same control plane.
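To make the Measure-to-Manage hand-off concrete, here is a toy Measure step that combines a 0–10 severity (a stand-in for an AIVSS score, however it is computed) with two SSVC-style decision points, exploitation status and automatability, to produce an action band. The band labels borrow CISA’s SSVC vocabulary, but the thresholds and branching are invented for illustration; this is neither the SSVC decision tree nor AAGATE’s implementation.

```python
from enum import Enum


class Band(Enum):
    # Labels borrowed from CISA's SSVC vocabulary; the logic below is invented.
    TRACK = "Track"
    ATTEND = "Attend"
    ACT = "Act"


def action_band(severity, exploitation, automatable):
    """Toy Measure step: severity plus decision points -> band for Manage.

    severity: 0-10 stand-in for an AIVSS-style score (assumed scale)
    exploitation: "none" | "poc" | "active"
    automatable: whether exploitation can be automated at scale
    """
    if exploitation == "active" and (automatable or severity >= 7.0):
        return Band.ACT
    if exploitation != "none" or severity >= 7.0:
        return Band.ATTEND
    return Band.TRACK
```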
The flow at runtime
What composes with what, and where each spec lands in the program’s operating loop.
How spec-level composition becomes program-level practice
Each node names one place where a body’s contribution feeds another. The composition is not a single direction — it is a control plane.
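One composition this flow implies: ATF’s per-agent level parameterizes AARM’s per-action decision. This sketch encodes the level descriptions from the governance section (Intern gates every consequential action, Junior runs some categories autonomously, Senior and Principal act autonomously within scope). The `runtime_decision` function and the category names in `JUNIOR_AUTONOMOUS` are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    INTERN = 0
    JUNIOR = 1
    SENIOR = 2
    PRINCIPAL = 3


# Hypothetical categories an organization lets Junior agents run unattended.
JUNIOR_AUTONOMOUS = {"ticket_update", "email_draft"}


def runtime_decision(level, category, consequential, in_scope):
    """Per-action allow / deny / step_up, parameterized by the ATF level."""
    if not in_scope:
        return "deny"               # outside the agent's scope at any level
    if not consequential:
        return "allow"
    if level == Level.INTERN:
        return "step_up"            # every consequential action needs approval
    if level == Level.JUNIOR:
        return "allow" if category in JUNIOR_AUTONOMOUS else "step_up"
    return "allow"                  # Senior / Principal: autonomous within scope,
                                    # with audit and rollback as the safety net
```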
Where the spec layer ends and the practitioner layer begins
The spec layer raises the floor. The ceiling — the working production agent at 2 AM, integrated with a particular stack, under particular regulators — is not the spec layer’s surface. It is the practitioner’s. P3 reads that surface: five categories of practitioner-side work, three disciplines, two composing charters. None of it is a gap in standards-body coverage. All of it is structural.
The five categories of practitioner-side work
Where the spec layer terminates and practitioner judgment, environment-specific knowledge, and operational reality begin.
Environment-Specific Integration
The spec layer defines what runtime enforcement must do. It does not specify how to wire R1 into a particular service mesh, map R6 onto a particular workload identity provider, or translate R9 into a particular policy-as-code configuration. Integration is the architect’s.
Empirical Efficacy Measurement
Architectural conformance and measured defensive strength are two different surfaces. The spec layer addresses the first. Red-teaming the actual deployment against the threats most relevant to its specific environment is the practitioner’s program to build and run.
Sector-Specific Overlay Construction
The spec layer is sector-neutral by design. Sector overlays — HIPAA audit-trail mapping, FFIEC examination alignment, SEC cybersecurity disclosure coordination — are the work of sector regulators and industry consortia operating with the spec layer rather than inside it.
Cross-Spec Composition in Production
The bodies specify what composes; the operational details of composition in a specific environment are the practitioner’s. How AARM R5 receipts flow into ATF promotion-review systems, how attestations get coordinated across federated identity domains — this is environment-specific work.
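A sketch of one such flow: summarizing decision receipts inside a review window into figures a promotion review might weigh. The receipt fields (`ts`, `decision`, `violation`) and the output keys are assumptions and do not represent AARM’s receipt schema; ATF leaves the format of evidence to implementers.

```python
from collections import Counter


def promotion_evidence(receipts, window_start, window_end):
    """Summarize receipts with window_start <= ts < window_end.

    Each receipt is a dict with hypothetical fields "ts" (epoch seconds),
    "decision" ("allow" | "deny" | "step_up"), and an optional "violation"
    flag set by the runtime.
    """
    in_window = [r for r in receipts if window_start <= r["ts"] < window_end]
    counts = Counter(r["decision"] for r in in_window)
    violations = sum(1 for r in in_window if r.get("violation"))
    total = len(in_window)
    return {
        "actions": total,
        "deny_rate": counts["deny"] / total if total else 0.0,
        "step_up_rate": counts["step_up"] / total if total else 0.0,
        "violations": violations,
    }
```

In practice the receipt chain’s integrity would be verified before its contents are counted as evidence.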
Multi-Agent Ecosystem Dynamics
MAESTRO Layer 7 (Agent Ecosystem) names the threat surface, but spec-level guidance on ecosystem dynamics is not yet mature and the research literature is still consolidating. Until that work matures into spec-level guidance, the layer’s operational treatment falls to practitioners.
Three practitioner-side disciplines
The five categories describe where the work happens. The three disciplines describe what the work is.
Architecture
Designing the deployment to satisfy AARM Core conformance, ATF maturity progression, and the Five Eyes technical baseline simultaneously in a specific environment. Sidecar vs. inline gateway; co-located vs. remote PDPs; centralized vs. federated identity.
Operations
Running the deployment — monitoring it, responding to incidents, maintaining conformance evidence, updating for adversary evolution, feeding operational reality back into governance evaluation. The CSA Red Teaming Guide provides the Manage-function vocabulary; the operational program is the practitioner’s.
Governance
Aligning ATF maturity progression with the organization’s risk tolerance, audit expectations, regulatory exposure, and the governance bodies it answers to. ATF specifies the criteria categorically; what constitutes adequate evidence in a regulated environment is the practitioner’s translation.
The two charters compose
The standards-body charter and the practitioner charter cover different surfaces of the same working ecosystem. Vendors sit between. Luminity reads the composition.
STANDARDS-BODY CHARTER
In scope, by charter
- Publishing requirements that hold across the design space
- Defining conformance vocabulary at each architectural surface
- Maintaining stewardship of open specifications
- Establishing institutional infrastructure for ongoing development
- Coordinating across bodies on composition vocabulary
- Hosting attestation registries
PRACTITIONER CHARTER
In scope, by charter
- Designing the deployment to satisfy spec conformance in the specific environment
- Implementing environment-specific integration
- Running empirical efficacy measurement against the actual threat surface
- Coordinating with sector regulators on sector overlays
- Operating cross-spec composition in production
- Monitoring multi-agent ecosystem dynamics until spec-level guidance matures
Four institutional structures, and the gravity each one creates
The convergence established institutional architecture for ongoing standards development that did not exist eighteen months ago. Four structures now operate where none operated before. Each one is trajectory infrastructure. Each one creates conditions for specific patterns of further maturation across 2026. This is reading, not predicting.
CSAI Foundation
Hosts AARM and ATF under formal stewardship. Maintains the conformance attestation registry. Coordinates working groups across the CSA infrastructure. Provides the institutional venue where vendor implementations and practitioner adopters interact with the specifications as they evolve.
- Additional spec stewardship as independent open specifications mature
- Cross-spec composition vocabulary — formal binding between AARM receipts and ATF promotion evidence
Not a prediction of which specifications migrate or on what timeline. Not a forecast that the Foundation dominates institutional space — OWASP, CSA working groups, and NIST workstreams continue to operate.
NIST CAISI
Houses the AI Agent Standards Initiative. Coordinates partnerships with research-side institutions (Gray Swan AI, UK AISI) that produce the empirical evidence the standards layer rests on. Operates within NIST, whose AI RMF provides the program-level vocabulary CAISI’s initiatives populate.
- Empirical baseline maintenance — the 250,000-attack floor as living dataset
- Workflow-level standards development at additional layers
- International coordination — the UK AISI pattern is replicable
Not a prediction of which initiatives or publications CAISI produces. Not a forecast of which nations adopt the CAISI model. Not a claim that CAISI displaces NIST’s broader AI work — CAISI is a center within NIST.
Five Eyes Coordination
A coordinated political baseline across six agencies that historically publish independently. Five-domain risk taxonomy + four-domain technical baseline. Shared vocabulary subsequent agency-specific guidance can rest on. Read by Gartner as the procurement floor.
- Agency-specific elaboration — CISA sector overlays, ASD ACSC supplements
- Procurement integration — GSA schedules, UK framework agreements
- Follow-on coordination on specific adversary patterns
Not a prediction of when specific agencies publish elaboration. Not a forecast of which procurement vehicles adopt first. Five Eyes coordination is subject to political variables — bilateral relationships, agency leadership transitions, budget cycles.
Vendor Conformance Attestation
The practitioner-vendor bridge the spec layer enables. Practitioners select implementations against published conformance attestations; vendors compete on breadth and depth of conformance claims; the CSAI attestation registry keeps the conversation coherent across vendors and time.
- Procurement standardization around AARM-conformance attestations
- Vendor differentiation shifting toward what is above the baseline rather than at the baseline
Not a prediction of which vendors lead or lag. Not a forecast of which procurement bodies adopt requirements first. Not a claim that conformance attestation guarantees defensive efficacy — P3’s distinction between conformance and measured strength remains.
The four structures reinforce each other
The trajectory is not four independent venues moving in parallel. Each one creates institutional gravity for the others.
