Blast Radius Is Not a Post-Incident Metric — Luminity Digital
Fault Lines  ·  Series 4  ·  Post 3 of 3
Agentic AI Security  ·  Practitioner Framework

Blast Radius Is Not a Post-Incident Metric

If you cannot read your blast radius off an architecture diagram before deployment, you need a different framework. The four-factor assessment, circuit breaker design, and identity-as-containment — building the case for treating damage scope as a first-class design discipline in agentic AI.

April 2026 · Tom M. Gomez · 16 min read

Post 1 of this series established why agentic AI attack surfaces compound rather than add, and introduced the concept of dynamic blast radius. Post 2 mapped all six attack surface dimensions with full empirical rigor, including the honest accounting of where current defenses fall short. This closing post makes the prescriptive argument: blast radius must be assessed before deployment, across all six dimensions simultaneously, using a framework designed for systems whose damage scope expands at runtime. It closes with an honest inventory of what remains genuinely unsolved — because the field is early, and knowing the boundary of current knowledge is part of what practitioners need most.

In every publicly analyzed security incident involving an agentic AI system, the same pattern appears: the blast radius was larger than anyone expected, and it was larger because it expanded during the incident rather than being fixed at the point of initial compromise. A credential theft that would have been contained to one service account in a traditional deployment propagated through three agent delegation hops before detection. A poisoned MCP tool description that would have affected one session instead persisted in shared memory across a fleet. A container escape that would have reached the host filesystem instead reached the Docker daemon and spawned new workloads. In each case, the damage scope was not a property of the vulnerability. It was a property of what the agent did after the vulnerability was exploited — reasoning, delegating, persisting, and acting autonomously at machine speed.

This is the operational problem that blast radius as a post-incident metric cannot solve. By the time you are measuring blast radius after a compromise, the autonomous expansion has already occurred. The containment question must be answered before deployment — not as an afterthought, and not dimension by dimension, but as a simultaneous assessment across all six attack surfaces identified in Post 2.

Why Blast Radius Requires a New Definition

The CVE era produced a workable definition of blast radius: the scope of damage a successful compromise can produce, bounded by the access granted to the compromised identity or process at provisioning time. This definition is useful because it is computable. Given a network diagram, an IAM configuration, and a list of compromised credentials, a security team can enumerate what a successful attack can reach. The blast radius is a fact about the system’s configuration, not about the system’s behavior during operation.

Agentic AI renders this definition insufficient in three ways. First, blast radius is not bounded at provisioning because agents make runtime decisions about what to access next. An agent’s blast radius at the moment of compromise is not its blast radius sixty seconds later, after it has reasoned about its task, retrieved additional context, and called three additional tools. Second, blast radius propagates through trust chains in ways that are not captured in any IAM configuration — an agent that delegates to sub-agents extends its blast radius into every resource those sub-agents can reach, through delegation hops that are not recorded in standard access logs. Third, blast radius persists across sessions through memory stores that standard incident response processes do not examine or sanitize.

A Working Definition for Agentic Systems

Blast radius in agentic AI is the maximum scope of damage that a successful attack against a given agent can produce, accounting for: (1) the resources the agent can directly access at the moment of compromise, (2) the resources it can reach through delegation to sub-agents and tool invocations it initiates autonomously, (3) the downstream systems and agents that consume its outputs and may act on corrupted or malicious content, and (4) the future sessions that may be affected by memory or context the compromised agent writes.

This definition makes blast radius dynamic: it is not a static property of the agent’s configuration but a function of the agent’s behavior, which is non-deterministic and can be adversarially influenced. The security implication is that blast radius must be bounded structurally — through identity, capability restriction, and containment architecture — rather than inferred from configuration alone.

The Four-Factor Assessment Framework

Before any agentic AI system is deployed in a production environment, practitioners need a structured method for assessing the blast radius potential of each agent and each agent configuration. The four-factor framework provides this assessment — not as a compliance checklist but as a risk-prioritization tool that identifies which agents represent the highest-priority containment requirements regardless of the specific attack scenario.

The framework was synthesized from Repello AI’s AI Attack Surface Management research, the CSA Agentic Trust Framework maturity model, and the HiddenLayer 2026 AI Threat Landscape operational findings. It assesses four properties, each of which independently constitutes an elevated risk factor when high. An agent where any one factor is high is a priority for containment investment. An agent where multiple factors are high is the highest-priority risk in any deployment.

01
Data Access Scope

What data can the agent read — which databases, documents, user sessions, connected services, memory stores, and retrieval indexes? An agent with access to a single product FAQ has a contained data blast radius. An agent with access to a CRM, email archive, financial reporting system, and internal communications platform does not. Least-privilege data access is the most direct blast radius reduction lever available before deployment. OWASP LLM06:2025 (Excessive Agency) classifies over-permissioned data access as a top-ten risk precisely because data scope directly determines what a compromised agent can exfiltrate, corrupt, or weaponize.

02
Tool Execution Scope

What actions can the agent take — read-only versus write access, internal APIs versus external service connections, scoped API keys versus broad credentials, code execution versus data retrieval? An agent that can only read from a database cannot exfiltrate data through write operations. An agent that can send email can be induced to exfiltrate data through those emails. An agent that can execute code in a shared container can, under adverse conditions, affect that container’s co-tenants. The tool execution scope is the action surface: it determines what an attacker who controls the agent can do in the world beyond the AI system itself.

03
Downstream Consumption Breadth

How many other agents, systems, or users consume this agent’s outputs and may act on them? An agent whose outputs are reviewed by a human before any action is taken has a contained downstream blast radius. An agent whose outputs are directly consumed by five downstream agents, three automated workflows, and a shared knowledge base does not. This factor captures the propagation dimension of blast radius — the compounding multiplier introduced when a compromised agent’s malicious or corrupted output becomes trusted input for other reasoning systems. The Trust Paradox research (arXiv:2510.18563) demonstrated this empirically: a single poisoned agent degraded 87% of downstream decision-making within four hours across connected agent populations.

04
Persistence Mechanisms

Does the agent write to memory stores, shared state, configuration files, or other persistent artifacts that will influence future sessions or other agents? An agent that processes a single task and retains nothing has blast radius bounded to that session. An agent that writes to a shared vector database, updates a persistent user profile, modifies a shared workflow configuration, or stores episodic memory that future sessions retrieve has blast radius that extends forward in time indefinitely. This is the dimension that transforms prompt injection from a transient attack into a stateful campaign — and it is the dimension most consistently overlooked in initial deployment risk assessments.

The practical output of the four-factor assessment is a prioritized containment agenda. Agents scoring high on all four factors are the highest-priority risks and should receive the full containment treatment described in the sections below before production deployment. Agents scoring high on one or two factors require targeted controls. Agents scoring low on all four can be deployed with standard least-privilege practices and behavioral monitoring.
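The prioritization above can be sketched in code. This is an illustrative scoring harness, not part of any cited framework: the two-level factor scale, the class and function names, and the exact tier cutoffs (three or four high factors triggering full containment) are assumptions made for the sketch.

```python
from dataclasses import dataclass

# Illustrative four-factor scoring. Factor names follow the framework above;
# the low/high scale and tier thresholds are assumptions for this sketch.
LEVELS = {"low": 0, "high": 1}

@dataclass
class AgentProfile:
    name: str
    data_access_scope: str        # "low" or "high"
    tool_execution_scope: str
    downstream_breadth: str
    persistence_mechanisms: str

    def high_factors(self) -> int:
        # Count how many of the four factors are rated high.
        return sum(LEVELS[v] for v in (
            self.data_access_scope, self.tool_execution_scope,
            self.downstream_breadth, self.persistence_mechanisms))

def containment_tier(agent: AgentProfile) -> str:
    """Map the high-factor count to the containment agenda in the text."""
    n = agent.high_factors()
    if n >= 3:
        return "full containment treatment before production"
    if n >= 1:
        return "targeted controls on the high factors"
    return "standard least privilege + behavioral monitoring"
```

An agent rated high on all four factors lands in the full-containment tier; an agent rated low on all four falls through to standard least-privilege practice.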

Blast Radius Before Deployment: The Design Disciplines

Post 2 covered what current defenses can and cannot achieve for each of the six attack surface dimensions. This section focuses on the structural design decisions that determine blast radius before any specific attack occurs — the choices made at architecture time that either make containment possible or make it impossible.

Identity as the primary containment boundary

In agentic systems, identity is not just an authentication mechanism — it is the primary structural constraint on blast radius. What an agent’s identity has access to is what a compromised agent can reach. This reframes identity architecture from a compliance requirement into a security design discipline with direct blast radius consequences.

The principle is straightforward: agents should be provisioned with purpose-bound identities that grant only the access required for their specific assigned function, with no ambient authority that persists beyond task completion. An agent designed to schedule calendar meetings should have write access only to the calendar API — not to the corporate email server, not to the CRM, and not to any other system, regardless of organizational convenience. The moment any of those additional accesses are granted “for future use,” the blast radius of that agent’s compromise expands to include all of them.

The implementation requires moving away from static, broad service accounts toward just-in-time, task-scoped credentials. Credentials generated dynamically for a specific task and expiring immediately upon completion have near-zero value if compromised — the window of exploitation is measured in seconds. Policy engines such as OPA and AWS Cedar with AuthZEN integration enable per-action authorization evaluation at runtime, allowing access decisions to incorporate not just identity but task context, data sensitivity, and behavioral baseline. This is not static role-based access control dressed up with new terminology. It is a fundamentally different authorization model where the grant is the task, not the role.
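The shape of a task-scoped, just-in-time grant can be sketched as follows. This is not the API of OPA, Cedar, or any real policy engine — all names here are hypothetical — but it shows the property the text describes: the credential is minted for one task, authorizes only enumerated (resource, action) pairs, and expires within seconds.

```python
import secrets
import time
from dataclasses import dataclass, field

# Hypothetical sketch of just-in-time, task-scoped credentials. The grant *is*
# the task: a token authorizes one enumerated set of (resource, action) pairs
# and expires automatically. Not a real policy-engine API.

@dataclass
class TaskGrant:
    task_id: str
    allowed: set                 # e.g. {("calendar", "write")}
    expires_at: float            # monotonic deadline
    token: str = field(default_factory=lambda: secrets.token_urlsafe(16))

def mint_grant(task_id: str, allowed, ttl_s: float = 30.0) -> TaskGrant:
    """Mint a short-lived credential scoped to a single task."""
    return TaskGrant(task_id, set(allowed), time.monotonic() + ttl_s)

def authorize(grant: TaskGrant, resource: str, action: str) -> bool:
    """Per-action check: the scope must match AND the grant must be live."""
    if time.monotonic() > grant.expires_at:
        return False             # an expired credential is worthless if stolen
    return (resource, action) in grant.allowed

# The calendar-scheduling agent from the text: calendar write only,
# no ambient authority over email, CRM, or anything else.
g = mint_grant("schedule-meeting", [("calendar", "write")], ttl_s=30)
assert authorize(g, "calendar", "write")
assert not authorize(g, "email", "send")
```

The point of the sketch is the shape of the check: authorization is evaluated per action against a grant that names the task, not against a role provisioned at deployment.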

≈ $1

The approximate cost per sandbox escape attempt in SandboxEscapeBench (arXiv:2603.02277), at current frontier model pricing. At this cost structure, an attacker can attempt thousands of escape scenarios in parallel with compounding probability of success. Containment architecture that relies on the model’s unwillingness to attempt escape rather than structural impossibility of success is not containment — it is probability management at an adversary-favorable cost ratio.

Circuit breakers: the human-in-the-loop trigger

Not all actions an agent might take carry equal consequence. A well-designed agentic system recognizes this and implements circuit breakers — explicit thresholds at which autonomous execution stops and human review is required before proceeding. The circuit breaker is not a general skepticism about agent capability; it is a specific architectural decision about which action types carry blast radius large enough to warrant the latency cost of human review.

The Stellar Cyber framing is the clearest articulation of the principle: autonomous remediation needs circuit breakers — if the blast radius exceeds a threshold, humans decide, not agents. The practical design question is where to place those thresholds. The answer is not universal; it depends on the deployment context, the regulatory environment, and the risk tolerance of the organization. But several action categories are near-universal candidates for circuit breaker placement regardless of context.

Action Category | Blast Radius Rationale | Threshold Recommendation
Financial transactions above threshold | Irreversible, quantifiable damage scope | Human approval required
Bulk data deletion or modification | Potentially irreversible, broad data scope | Human approval required
External communication to novel recipients | Exfiltration path, reputational risk | Human approval required
Access control or permission changes | Expands future blast radius for all agents | Human approval required
Memory writes from untrusted sources | Cross-session persistence, sleeper attack risk | Human approval required
Tool invocations to newly discovered servers | Supply chain injection risk | Allowlist verification required
Sub-agent spawning beyond defined topology | Trust chain expansion, delegation depth | Scope confirmation required
Code execution in shared environments | Containment failure risk, co-tenant exposure | Sandboxed execution enforced
The circuit breaker is not a failure of trust in the agent. It is an acknowledgment that some action categories carry blast radius large enough that the cost of the latency introduced by human review is lower than the cost of autonomous execution under adverse conditions. The AWS Kiro incident — where an agentic coding tool with over-permissioned access caused a production outage — is the concrete illustration of what happens when this judgment is absent. The problem was not the agent’s reasoning capability. The problem was that no circuit breaker existed at the point where the action’s blast radius made autonomous execution inadvisable.

Runtime least privilege: not a configuration, a control plane

The traditional implementation of least privilege is a configuration decision: at deployment time, an engineer assigns the minimum permissions a service account requires and does not grant more. This approach fails in agentic systems because agents’ access requirements vary by task, by context, and by the intermediate outputs of multi-step reasoning. Broad permissions are granted “for convenience” during development and never narrowed because narrowing them would break functionality that someone, somewhere, depends on. The result is what Strata Identity’s research calls “permanent privilege” — static access that ignores task context, data sensitivity, and behavioral intent entirely.

Runtime least privilege replaces this with a control plane approach: authorization is evaluated and enforced per action, per request, incorporating the full context of what the agent is doing and why. Access is downscoped to the minimum required for the specific action being taken, minted as a short-lived token, and expires automatically when the action completes. The agent cannot accumulate permissions across tasks. Each task begins from a fresh grant, narrowly scoped, time-bounded, and auditable.

Static Least Privilege

Permissions set at deployment

Permissions are assigned once at deployment time based on anticipated needs. Broad enough to cover all expected tasks — and therefore broader than any single task requires. Stable across all executions regardless of task context, data sensitivity, or current risk state. Cannot distinguish between a safe action and a dangerous one made with the same permission set.

Result: an agent that can complete the job, and quietly expand the job into adjacent systems it was never intended to touch.

Static · Permanent · Blast-Radius-Additive
Runtime Least Privilege

Permissions evaluated per action

Permissions are calculated at runtime, incorporating task identity, data classification, behavioral baseline, and risk context. Access is minted as a short-lived credential for the specific action being taken and expires immediately on completion. Policy-as-code in a centralized control plane — not embedded in agent prompts or application code — is the enforcement mechanism.

Result: an agent that can complete the job, and cannot expand beyond it regardless of adversarial instruction or reasoning error.

Dynamic · Ephemeral · Blast-Radius-Contained
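The runtime model described above can be sketched as a single per-action decision function. This is a toy stand-in for a centralized policy engine such as OPA or Cedar — the input fields, classification labels, and thresholds are all assumptions — but it shows what distinguishes the runtime model: the decision is a function of task identity, data classification, and current risk context, not of a role assigned at deployment.

```python
# Toy per-action authorization decision in the runtime least-privilege style.
# In practice this logic lives as policy-as-code in a centralized control
# plane (e.g. OPA or Cedar), not inline in agent code; all labels and
# thresholds here are illustrative assumptions.

def decide(task: str, action: str, data_class: str, risk_score: float) -> bool:
    if risk_score > 0.8:
        return False                     # behavioral anomaly: deny outright
    if data_class == "restricted":
        # Restricted data: only a specific task may read, nothing may write.
        return task == "compliance-export" and action == "read"
    return action in {"read", "write"}   # ordinary data: normal task scope

assert decide("compliance-export", "read", "restricted", 0.1)
assert not decide("summarize-tickets", "read", "restricted", 0.1)
assert not decide("summarize-tickets", "read", "internal", 0.95)
```

Because the decision is recomputed on every request, the agent cannot accumulate permissions across tasks: each action starts from a fresh evaluation.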

Containment architecture: sandbox discipline

For agents with code execution capability, the choice of sandbox architecture is a direct blast radius decision. As SandboxEscapeBench established, Docker containers sharing a host kernel represent a blast radius boundary that frontier models can breach for approximately $1 per attempt, on a log-linear cost curve that improves with each generation of model capability. Organizations that sandbox code-executing agents in standard Docker containers should treat that as a temporary risk posture, not a permanent containment solution.

MicroVMs — Firecracker, Kata Containers — provide the structural answer: each workload receives a dedicated kernel, eliminating the shared-kernel vulnerability class that represents the primary escape path. The roughly 150 ms of additional boot time this introduces is the right trade-off against the alternative. For organizations where MicroVM overhead is not acceptable, the minimum viable containment stack requires dropping all unnecessary Linux capabilities at provisioning, applying restrictive seccomp profiles, never mounting Docker sockets in agent containers, enforcing network isolation with explicit egress allowlists, and treating read-only root filesystems as standard rather than optional.
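For the fallback case, the minimum viable containment stack translates into concrete `docker run` hardening flags. The sketch below assembles them as an argument list; the flags themselves are standard Docker options, but the seccomp profile filename is a placeholder and the egress policy is deployment-specific.

```python
# Sketch of the minimum viable containment stack as `docker run` hardening
# flags. The flags are real Docker options; "agent-seccomp.json" is a
# placeholder for a deployment-specific restrictive profile.

def hardened_docker_argv(image: str, cmd: str) -> list[str]:
    return [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                        # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege escalation
        "--security-opt", "seccomp=agent-seccomp.json",  # restrictive syscall filter
        "--read-only",                           # read-only root filesystem
        "--network", "none",                     # no egress; allowlist explicitly
        # Never mount the Docker socket (-v /var/run/docker.sock:...) in an
        # agent container: it hands the agent control of the daemon itself.
        image, cmd,
    ]

argv = hardened_docker_argv("agent-runtime:latest", "run-task")
assert "--cap-drop=ALL" in argv and "--read-only" in argv
```

Even with all of these flags applied, the shared host kernel remains in the blast radius — which is why the text treats this stack as a temporary posture, not a substitute for kernel isolation.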

The critical principle, stated plainly: telling a model not to escape is not enforcement. Instruction-level containment (“you must not access host resources”) relies on the model’s willingness to comply with an instruction that an adversary can override with a sufficiently compelling adversarial prompt. System-level enforcement through capability restriction, kernel isolation, and policy-as-code does not rely on model compliance. It makes the restricted action structurally impossible within the model’s execution environment, regardless of what the model reasons about wanting to do.

What Remains Genuinely Unsolved

Intellectual honesty about the limits of current knowledge is not a weakness in security communication — it is the property that makes security research credible to a practitioner audience that has seen enough vendor overclaiming to recognize it immediately. This series has documented what the research establishes. This section documents where the research currently runs out.

The instruction-data confusion problem

The fundamental reason prompt injection cannot be fully remediated is that LLMs process retrieved memory, tool results, system prompts, and user instructions as the same class of tokens — there is no structural separation between trusted instructions and untrusted data. This is not a design flaw that will be corrected in the next model version. It is an architectural property of transformer-based language models as currently constructed. Every defense against prompt injection is statistical, not structural: it reduces the probability that adversarial instructions will be followed, but cannot reduce it to zero across all possible adversarial inputs. NCC Group’s AI Red Team reports a 100% adaptive bypass rate against all known guardrails. The MCPSec extension reduces MCP attack success from 52.8% to 12.4% — a significant improvement that still leaves meaningful residual risk.

Multi-agent cascade characterization

The empirical research has established that multi-agent trust chains amplify blast radius through cascade propagation. What it has not established is a reliable model for predicting how far and how fast propagation occurs in realistic production deployments — as opposed to the controlled research environments where the measurements have been taken. Organizations cannot currently compute a principled upper bound on blast radius in multi-agent deployments because the interaction effects between agent topology, orchestration framework, model backend, and adversarial input characteristics are insufficiently characterized. The 87% downstream decision degradation figure and the logistic growth dynamics of prompt infection are real measurements — but their generalizability to specific production architectures is not yet established.

Authorization in agentic contexts

Authentication for agentic systems is converging on workable approaches — AIMS (draft-klrc-aiagent-auth-00), WIMSE, Transaction Tokens, and SPIFFE together provide a composable framework that solves the workload identity and short-lived credential problems. Authorization — the per-action evaluation of whether a specific agent action aligns with the originating user’s intent — remains fundamentally open. Policy engines can evaluate context, data classification, and permission scope. They cannot evaluate semantic intent: whether the agent’s proposed action is what the user meant when they initiated the task. This is a standards gap and a research gap simultaneously, and it represents the deepest unsolved problem in agentic AI security identity architecture.

Memory isolation at scale

Tenant isolation in vector databases prevents cross-user memory poisoning in multi-tenant deployments. Temporal trust decay reduces the persistence of injected content. Memory provenance tracking provides forensic capability after the fact. None of these controls address the core problem: because LLMs process retrieved memory as the same kind of thing as trusted instructions, there is no complete structural mitigation for the scenario where retrieved memory contains adversarial instructions. Write-path validation requiring human approval for memory writes from untrusted sources is the most defensible approach currently available — and it introduces the latency and friction costs that made automatic memory accumulation attractive in the first place.
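Write-path validation can be sketched as a gated memory store: writes from trusted sources commit directly, while writes derived from untrusted content are held until a human approves them. The trust labels, class name, and store API below are illustrative assumptions, not any particular vector database’s interface.

```python
# Sketch of write-path validation for agent memory, per the approach above:
# memory writes from untrusted sources require human approval before they can
# influence future sessions. Labels and the store API are illustrative.

TRUSTED_SOURCES = {"system", "verified_user"}

class GatedMemoryStore:
    def __init__(self):
        self.entries = []    # committed memory, visible to future sessions
        self.pending = []    # held for human review, invisible to retrieval

    def write(self, content: str, source: str) -> str:
        if source in TRUSTED_SOURCES:
            self.entries.append(content)
            return "committed"
        # Tool output, web content, retrieved documents: hold for review.
        self.pending.append(content)
        return "held_for_review"

    def approve(self, content: str) -> None:
        """Human reviewer promotes a pending write into committed memory."""
        self.pending.remove(content)
        self.entries.append(content)

store = GatedMemoryStore()
assert store.write("user prefers weekly digests", "verified_user") == "committed"
assert store.write("claim scraped from a web page", "web_content") == "held_for_review"
```

The friction is visible in the sketch: nothing from an untrusted source reaches retrieval until `approve` is called, which is exactly the latency cost the text acknowledges.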

The Honest Practitioner Summary

Approximately 70% of the agentic threat landscape maps to recognizable security patterns where existing controls provide meaningful partial protection. The four-factor assessment, circuit breaker design, runtime least privilege, and MicroVM containment are all implementable today and all meaningfully reduce blast radius. The remaining 30% — the instruction-data confusion problem, multi-agent cascade characterization, semantic authorization, and memory isolation under adversarial conditions — represents genuinely novel risk that current security architecture was not designed to address and that current research has not yet solved. Deploying agentic AI responsibly means implementing what works, being honest about what does not, and building incident response capability that assumes residual risk across all six dimensions will eventually be exploited.

Building a Containment-First Posture

The series has argued, across three posts, that agentic AI security requires a different mental model than the CVE era provided. The threat surfaces are different. The relationships between them are different. Blast radius behaves differently. The practitioner who approaches agentic AI security by appending new vulnerability categories to an existing risk register is not wrong — but they are using a tool designed for a different problem.

The containment-first posture that this series argues for has four operational properties. It treats blast radius as a design constraint evaluated before deployment, not a forensic measurement taken after incidents. It assesses all six attack surface dimensions simultaneously, because the compounding dynamic means addressing them sequentially leaves the multiplicative risk unaddressed. It distinguishes between controls that structurally prevent actions and controls that probabilistically discourage them, prioritizing the former and being honest about the latter’s limits. And it maintains an honest inventory of what remains genuinely unsolved — because that inventory is what drives the research agenda and the investment decisions that will eventually close the gap.

The organizations that will fare best are not those that have eliminated all agentic AI risk — no such organization exists yet. They are those that know precisely where their residual risk lives, have bounded it as tightly as current architecture allows, and have built the observability to detect when it is exploited faster than the autonomous blast radius expansion can outpace their response.

— Synthesis from HiddenLayer 2026 AI Threat Landscape Report; Repello AI Attack Surface Management; CSA Agentic Trust Framework; SandboxEscapeBench, arXiv:2603.02277

The field will not stand still. NIST’s AI agent standards initiative, the IETF OAuth drafts for agentic systems, OWASP’s expanding framework coverage, and the continuing empirical research from arXiv will all close portions of the gap documented here. The log-linear compute scaling of sandbox escape success will continue to improve with model capability — and so will the models’ utility across the rest of the agentic AI use case landscape. Security practice will need to keep pace with both directions of that dynamic.

Closing Argument

Blast radius is not a number you find after an incident. It is a boundary you draw before one — through identity architecture, circuit breaker design, runtime privilege control, and containment infrastructure that does not rely on model compliance. If that boundary is not drawn deliberately, it will be drawn by whatever the agent reasons its way into doing under adverse conditions. At machine speed. Before anyone can intervene.

Coming Next: Series 5 — The Containment Problem

The log-linear economics of AI-driven sandbox escape. What SandboxEscapeBench tells us about where frontier models already are — and what that means for containment architecture in a world where model capability keeps improving.

Fault Lines: The Hidden Structural Risks of Agentic AI  ·  Three-Part Series
Post 1 · Previous: Why Agentic AI Doesn’t Add Risk, It Multiplies It
Post 2 · Previous: Mapping the Agentic Attack Surface: Six Dimensions, No Perimeter
Post 3 · Now Reading: Blast Radius Is Not a Post-Incident Metric
