Two leading AI vendors published security frameworks for agentic systems in the same month. That is a signal worth decoding — not just for what they each recommend, but for where both frameworks stop. Reading them together maps a perimeter. Everything inside that perimeter is vendor responsibility. Everything outside it is yours.
On March 11, 2026, OpenAI published “Designing AI Agents to Resist Prompt Injection” — a detailed account of how the company thinks about defending ChatGPT Atlas and other agentic products against adversarial manipulation. Within the same period, Anthropic published “Our Framework for Developing Safe and Trustworthy Agents” — a principles-based framework for responsible agent development covering autonomy, transparency, privacy, and security. Both publications are technically substantive. Both represent genuine organizational thinking about hard problems. And read together, both make the same structural choice: they define the vendor’s layer and delegate the rest.
Enterprise architects deploying agentic systems on the Model Context Protocol are the intended recipients of that delegation. Understanding what each framework actually says — and where each one stops — is the first governance task of 2026.
What OpenAI’s Framework Actually Says
OpenAI’s core reframe is important and underappreciated. The company argues that prompt injection should not be understood as an input classification problem (the question is not “is this string malicious?”) but as a social engineering problem. The most effective real-world attacks, OpenAI observes, increasingly resemble social engineering directed at a human employee: contextually plausible, hard to distinguish from legitimate instructions, designed to exploit the same trust pathways that make agents useful.
This reframe has a direct consequence for defense strategy. If the problem is not identifying a malicious string but resisting misleading or manipulative content in context, then filtering inputs cannot be the primary defense. OpenAI is explicit: AI firewalling — placing an intermediary classification system between the agent and external content — has a fundamental limitation. Identifying a sophisticated adversarial prompt is effectively the same problem as detecting a lie or misinformation, often without access to the context needed to make that determination reliably. Fully developed attacks are not typically caught by such systems.
The implication is a shift in defensive philosophy that enterprise architects need to absorb carefully. Rather than treating injection prevention as an input-classification problem, OpenAI argues the focus should be on designing systems so that the consequences of a successful attack remain constrained. An agent that can only take limited, reversible actions, and must pause for user confirmation before anything consequential, limits the damage an attacker can achieve even if manipulation partially succeeds. The objective of system design, in OpenAI’s framing, is not to produce an agent that can never be deceived. It is to produce one that operates within boundaries tight enough to prevent serious harm when deception occurs.
Prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully solved.
— OpenAI, “Designing AI Agents to Resist Prompt Injection,” March 11, 2026

OpenAI pairs this philosophy with source-sink analysis as a practical design tool. An attack requires both a source — a way to influence the system — and a sink — a capability that becomes dangerous in the wrong context. For agentic systems, this typically means combining untrusted external content with an action such as transmitting information to a third party, following a link, or interacting with a tool. The enterprise application of source-sink analysis is direct: map every agent capability that constitutes a sink, and design confirmation gates or blocking rules for every workflow that routes untrusted content toward one.
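Source-sink gating can be made concrete in harness code. The sketch below assumes a minimal harness where content carries a provenance flag and tool calls are mediated; the tool names, the `Content` type, and the `ConfirmationRequired` exception are all illustrative, not part of either vendor's framework.

```python
from dataclasses import dataclass

# Sinks: capabilities that become dangerous in the wrong context.
# This inventory is illustrative; enumerate your own deployment's tools.
SINKS = {"send_email", "post_webhook", "follow_link", "write_file"}

@dataclass
class Content:
    text: str
    trusted: bool  # False for anything fetched from the open web, inboxes, etc.

class ConfirmationRequired(Exception):
    """Raised when untrusted content routes toward a sink; a human must approve."""

def invoke_tool(tool: str, payload: Content) -> str:
    # The gate keys on the source-sink pair, not on classifying the payload.
    if tool in SINKS and not payload.trusted:
        raise ConfirmationRequired(f"untrusted content routed to sink '{tool}'")
    return f"{tool} executed"  # stand-in for the real tool dispatch
```

The point of the structure is that the gate fires on the source-sink combination, not on the content of the string: no classifier is asked to decide whether the payload is malicious.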
34.7%
The share of organizations with dedicated prompt injection defenses deployed, per a VentureBeat survey of 100 technical decision-makers (December 2025). The remaining 65.3% either have not deployed these tools or could not confirm they have — precisely as enterprises are accelerating agentic AI rollout into production systems.
What Anthropic’s Framework Actually Says
Anthropic’s framework is organized around five principles: balancing agent autonomy with human oversight, maintaining transparency about agent actions and reasoning, protecting user privacy, implementing security controls against adversarial manipulation, and governing agent access through the Model Context Protocol. The framework reads as a set of design commitments Anthropic makes for its own products alongside guidance for operators building on those products.
On security, Anthropic describes a multilayer approach: classifiers that detect and guard against prompt injection misuse, a Threat Intelligence team conducting ongoing monitoring, and MCP directory standards requiring that tools added to Anthropic’s reviewed connector directory meet defined security, safety, and compatibility criteria. These are real controls operating at the model and product layer.
The operative mechanism for enterprise governance, however, appears consistently in the operator delegation language. Enterprise administrators can set which connectors users in their organizations can connect to. Operators can configure one-time or permanent access grants. The MCP directory review applies to connectors Anthropic has vetted — not to the full range of MCP servers an enterprise may deploy in its own infrastructure. The framework is honest about this structure. It does not promise controls that Anthropic cannot deliver. But it does transfer the implementation question to the enterprise layer without specifying what that implementation should look like.
OpenAI: Behavioral Architecture
Defense philosophy: Constrain blast radius. Design for survivable failure rather than guaranteed prevention.
Core admission: Prompt injection cannot be eliminated. AI firewalling is insufficient against sophisticated attacks.
Enterprise implication: Minimize agent permissions and require confirmation gates before consequential actions — independently of model-layer defenses.
Key concept: Source-sink analysis

Anthropic: Governance Principles
Defense philosophy: Classifiers, MCP directory standards, and operator configuration as the layered control model.
Core admission: Operator configuration is where governance is implemented. Enterprise administrators define the access boundary.
Enterprise implication: The controls Anthropic outlines require enterprise operators to build, configure, and maintain the implementation layer.
Key concept: Operator delegation

The Governance Layer Map
Both frameworks are doing something important — but each operates at a specific layer of the agentic AI stack. Understanding where each framework’s coverage ends is more useful to enterprise architects than summarizing what it covers. The stack has three layers, and the governance challenge is distributed unevenly across them.
Model & Product Layer
Adversarial training, prompt injection classifiers, MCP directory review, Atlas confirmation gates, trust policies, and Anthropic Threat Intelligence monitoring. These controls are real, maintained by the vendor, and not configurable by the enterprise. They establish a baseline but cannot be tuned to the specific blast radius profile of a given enterprise deployment.
Ownership: Vendor responsibility
Coverage: Injection Resistance + Protocol Governance
Model-level resistance to adversarial manipulation. MCP directory vetting for approved connectors. Product-layer safeguards such as Watch Mode and logged-out mode in Atlas. Published security frameworks communicating the vendor’s design philosophy and the controls the vendor maintains.
What it does not cover: the specific access scope, confirmation gate configuration, audit trail requirements, and trust boundary definitions for a given enterprise’s agentic deployment on MCP infrastructure the vendor does not control.
Harness Layer
Runtime access controls, per-tool permission scoping, confirmation gate design, audit logging, trust boundary configuration, and behavioral monitoring above the model layer. This is what both vendor frameworks implicitly delegate when they describe “operator configuration” or recommend that enterprises constrain blast radius. Neither framework specifies how to build this layer.
Ownership: The gap
Coverage: Blast Radius Architecture
OpenAI’s blast-radius containment philosophy is correct — but it is a design requirement, not a vendor feature. The enterprise harness must enforce it: scoped tool access, source-sink-aware confirmation gates, behavioral deviation monitoring, and documented operator configuration as a governance artifact under version control.
This is the layer that converts vendor security principles into enforceable enterprise controls. Without it, both vendor frameworks reduce to recommendations the enterprise has acknowledged without implementing.
Substrate Layer
Role-based access controls, identity management, existing data governance policies, observability and logging infrastructure, compliance frameworks, and the policy enforcement mechanisms the enterprise already maintains for non-agentic systems. The harness layer connects to the substrate — it does not replace it.
Ownership: Enterprise infrastructure

Governance Must Connect, Not Run in Parallel
A common failure pattern is harness governance that operates as a separate system alongside existing enterprise controls rather than as an extension of them. Existing RBAC can anchor agent access boundaries — agents represented as service accounts with scoped roles — but static role assignment breaks down where agents are short-lived, spawn sub-agents, or require context-sensitive access. Production deployments will need ABAC or ReBAC layered on top, short-lived credential infrastructure, and explicit delegation chain policies that standard role models cannot express. Agent audit trails must feed existing SIEM and compliance infrastructure. The agentic access boundary must be consistent with — not independent of — the data classification policies already in force.
Both vendor frameworks assume this substrate exists and is operational. Enterprises where it is incomplete face a compounded governance gap.
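The delegation-chain problem called out above can be expressed as scope attenuation: a sub-agent's credential is at most the intersection of what its parent holds and what it requests, so trust narrows down the spawn chain and never widens. A minimal sketch, with hypothetical scope names:

```python
def spawn_credential(parent_scopes: frozenset, requested: frozenset) -> frozenset:
    """A sub-agent receives at most the intersection of its parent's scopes
    and what it asks for: delegation can attenuate trust, never widen it."""
    return parent_scopes & requested

# Agent A holds three scopes; it spawns Agent B, which asks for more
# than A holds. The grant silently drops everything A cannot delegate.
agent_a = frozenset({"crm:read", "mail:send", "files:read"})
agent_b = spawn_credential(agent_a, frozenset({"crm:read", "crm:write"}))
```

Static role assignment cannot express this, because the sub-agent's effective permission set is computed per spawn, not looked up from a role table.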
The Monoculture Amplifies the Stakes
OpenAI’s blast-radius containment philosophy lands differently when MCP is the protocol. In a standalone agent deployment, constrained blast radius means limiting what a single compromised agent can do. In a monoculture where every agent across every enterprise deployment shares the same protocol architecture and the same structural trust vulnerabilities, blast-radius containment has to be designed at a different scale.
The empirical research on this point is unambiguous. The Breaking the Protocol paper (arXiv:2601.17549), testing 847 attack scenarios across five MCP server implementations, establishes that MCP amplifies attack success rates by 23 to 41 percent compared to non-MCP integrations — rising to 41 percent under cross-server configurations. The protocol does not create new attack categories. It makes existing attacks more effective, more reliable, and harder to detect, simultaneously, across every deployment built on it.
23%
Minimum amplification in attack success rate when MCP is present versus non-MCP integrations, rising to 41% under cross-server configurations. This is a property of the protocol itself — not a per-deployment variable. Blast-radius containment must be designed to absorb this amplification across every MCP-connected agent simultaneously.
— Breaking the Protocol, arXiv:2601.17549
The implication for enterprise harness design is concrete. Standard quality monitoring — did the agent complete its task correctly? — is insufficient as an attack signal. The Beyond Max Tokens paper (arXiv:2601.10955) documents an economic denial-of-service attack class that inflates costs up to 658 times baseline while keeping task outcomes correct. The only signal is resource consumption anomaly. If resource consumption is abstracted from the application layer — as it typically is in cloud deployments — that signal may not reach the people who can act on it. The harness must instrument for this. Vendor frameworks do not.
This is where the MCP monoculture makes both vendor frameworks insufficient on their own. OpenAI’s blast-radius containment is sound architectural advice. Anthropic’s operator delegation is a sensible governance structure. Neither framework, however, accounts for the fact that a protocol-level amplification property applies to every deployment simultaneously — which means the harness layer cannot be designed as though each deployment exists in isolation. The enterprise governing five MCP-connected agents is not managing five independent blast radii. It is managing five instances of the same amplified attack surface.
What Enterprise Architects Should Do With This
The two vendor frameworks, read together, produce a set of concrete harness requirements. This is not a checklist. It is a framing for where to concentrate governance energy in light of what the vendors have committed to doing and what they have delegated.
Treat blast-radius containment as an architecture requirement, not a configuration setting. OpenAI’s framing implies that the agent’s capability boundary is the security boundary. Every MCP tool an agent can invoke is a potential sink. The harness must enforce access scoping at deployment — not rely on post-hoc monitoring to identify when scope has been exceeded.
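Enforced-at-deployment scoping can be as simple as constructing the agent harness with a fixed tool allowlist that fails closed. A sketch, with hypothetical tool names (`ToolScopeError` is illustrative):

```python
class ToolScopeError(Exception):
    pass

class ScopedAgent:
    """An agent whose tool surface is fixed at construction time.
    Anything outside the allowlist fails closed, so there is no
    post-hoc 'scope exceeded' alert to miss."""

    def __init__(self, allowed_tools: set):
        self._allowed = frozenset(allowed_tools)

    def call(self, tool: str, *args):
        if tool not in self._allowed:
            raise ToolScopeError(f"tool '{tool}' outside deployment scope")
        return (tool, args)  # stand-in for dispatch to the real MCP client

# The capability boundary IS the security boundary: this agent can
# never invoke a sink it was not deployed with.
agent = ScopedAgent({"search_docs", "summarize"})
```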
Make operator delegation explicit and version-controlled. Anthropic’s framework places governance in operator configuration. That configuration — which connectors are permitted, what access scopes are granted, what trust boundaries are defined — must be a documented governance artifact under version control, not a one-time setup decision made at deployment and forgotten. Operator configuration is a governance decision. It should be treated with the same rigor as any other enterprise policy.
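One way to make operator configuration a reviewable artifact is to keep it as declarative data with a content fingerprint committed alongside it, so CI can detect drift between the approved and deployed versions. The field and connector names below are hypothetical:

```python
import hashlib
import json

# Hypothetical operator configuration, kept under version control.
operator_config = {
    "permitted_connectors": ["crm-mcp", "docs-mcp"],
    "access_scopes": {"crm-mcp": ["read"], "docs-mcp": ["read", "write"]},
    "trust_boundaries": {"external_web": "untrusted", "internal_wiki": "trusted"},
}

def config_fingerprint(cfg: dict) -> str:
    """Stable hash of the configuration. Commit it alongside the config so
    a CI check can fail when the deployed config drifts from the approved one."""
    canonical = json.dumps(cfg, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()
```

A deployment gate then compares the fingerprint of the running configuration against the committed one, turning "forgotten setup decision" into a reviewable diff.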
Design confirmation gates around source-sink pairs. OpenAI’s source-sink model provides a concrete taxonomy for where confirmation gates belong: any workflow that routes untrusted external content toward a consequential action capability. Map the sinks first. Then map the workflows that can route content to each sink. Then design the gate or blocking rule. This is tractable engineering if done at harness design time. It is extremely difficult to retrofit.
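The map-sinks-first exercise can be captured as data at harness design time. In the sketch below the workflow inventory and sink names are hypothetical; the output is the set of confirmation gates the deployment needs: exactly the workflows that route untrusted content toward a sink.

```python
# Step 1: map the sinks (consequential action capabilities).
SINKS = {"send_email", "post_webhook", "follow_link"}

# Step 2: map the workflows and what content provenance each one reads.
WORKFLOWS = {
    "triage_inbox":  {"reads": "untrusted", "invokes": {"send_email"}},
    "weekly_report": {"reads": "trusted",   "invokes": {"send_email"}},
    "browse_docs":   {"reads": "untrusted", "invokes": {"follow_link"}},
    "index_wiki":    {"reads": "trusted",   "invokes": set()},
}

def gate_plan(workflows: dict, sinks: set) -> dict:
    """Step 3: for each workflow reading untrusted content, list the sinks
    it can reach. Each entry is a confirmation gate or blocking rule to build."""
    return {
        name: sorted(w["invokes"] & sinks)
        for name, w in workflows.items()
        if w["reads"] == "untrusted" and w["invokes"] & sinks
    }
```

Note that `weekly_report` needs no gate even though it reaches a sink, because it only reads trusted content; the gate requirement is a property of the pair, not of either endpoint alone.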
Instrument for resource anomaly, not just behavioral deviation. Because MCP-layer economic attacks maintain correct task outcomes, behavioral monitoring alone will miss them. The harness must log and alert on token consumption anomaly, tool call chain length, and latency outliers as security signals — not just operational metrics. This requires feeding harness telemetry into the enterprise’s existing SIEM infrastructure rather than treating it as a separate operational dashboard.
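A minimal version of the consumption signal is a z-score over per-task token counts. The baseline numbers below are invented for illustration; in production this check would run over harness telemetry before the cloud billing layer abstracts it away:

```python
import statistics

def token_anomaly(history: list, current: int, z_threshold: float = 3.0) -> bool:
    """Flag a task whose token consumption deviates sharply from the per-task
    baseline. Economic DoS attacks can leave task outcomes correct, so
    consumption itself has to be treated as a security signal."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0  # guard against a flat history
    return (current - mean) / stdev > z_threshold

# Illustrative baseline: tokens consumed per completed task of this type.
baseline = [1200, 1350, 1100, 1280, 1220]
```

An alert from this check should land in the SIEM as a security event, not on an operational cost dashboard, because the people who can act on an attack signal are rarely the people watching spend.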
Connect harness policy to substrate controls — do not run them in parallel. Existing RBAC can anchor agent access boundaries, but standard role assignment was not designed for the identity patterns agents introduce: short-lived credentials, multi-hop delegation chains where Agent A spawns Agent B under attenuated trust, and context-sensitive access where the same agent session requires different permissions at different points in a workflow. ABAC or ReBAC layered on top of existing role infrastructure is the production-grade answer — policy engines like OPA, Kyverno, and OpenFGA exist and are mature, but wiring them to agent workflows is new integration work most enterprises have not yet done. Agent audit trails must be SIEM-consumable. The access boundary the harness enforces must be consistent with the data classification policies the enterprise already maintains. Parallel governance systems create gap surface. The harness extends substrate governance into the agentic layer — it does not replace it.
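An attribute-based check layered over base role scoping might look like the following sketch. In production the decision would live in a policy engine such as OPA or OpenFGA rather than application code, and the attribute names here are hypothetical:

```python
def abac_allow(subject: dict, action: str, resource: dict, context: dict) -> bool:
    """Attribute-based decision: the same agent identity gets different
    permissions depending on resource classification and request context,
    which a static role table cannot express."""
    if resource.get("classification") == "restricted":
        # Restricted data is readable only during an approved workflow step,
        # and never in a session that has ingested untrusted web content.
        return (
            action == "read"
            and context.get("workflow_step") in subject.get("approved_steps", set())
            and not context.get("tainted_session", False)
        )
    # Everything else falls back to the role-derived base action set.
    return action in subject.get("base_actions", set())

# Hypothetical agent identity derived from its service-account role.
agent = {"base_actions": {"read"}, "approved_steps": {"reconcile"}}
```

The `tainted_session` attribute is where this connects back to source-sink analysis: a session that has read untrusted content loses access to restricted data, which is the data classification policy extended into the agentic layer.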
The governance gap is not the result of vendor negligence. Both frameworks are technically sound and genuinely useful. The gap is structural: vendors build at the model and product layer; enterprises deploy at the system layer. What OpenAI and Anthropic published this month establishes what vendors will do and what they expect operators to handle. Enterprise architects reading these frameworks need to hear what isn’t said as clearly as what is — because what isn’t said is their responsibility to build.
