Why Every Security Failure Is a Harness Failure — Luminity Digital
The Great Compression · Dispatch 04 · May 2026
Agentic AI Infrastructure

The harness is both the problem and the solution. The provider controlling the vertically integrated runtime has the highest-fidelity visibility and the strongest position to enforce coherent runtime governance. The Great Compression is the structural precondition for agentic AI to reach scale.

May 2026 · Tom M. Gomez · Luminity Digital · 9 Min Read

Three dispatches have built the architectural reading of the Great Compression. Dispatch 01 named the substrate consolidation as a structural event, not a market cycle. Dispatch 02 traced the absorption signals in vendor filings and the OpenAI memo. Dispatch 03 identified the three vectors by which the harness layer is being absorbed — substrate-side absorption, model-provider acquisition, provider-native displacement — and named the published architectural posture from the parties with the most production authority as the receipt the forecast was waiting for. This dispatch closes the architectural argument: why every agentic security failure is a harness failure, and why securing the harness requires a structure only the vertically integrated runtime can produce.

The model is the cognitive engine. The harness is what turns cognition into action. Every documented agentic security failure of the past two years — tool-call abuse, indirect prompt injection, goal hijacking, cross-agent execution path exploitation, transitive authorization escalation — happens not inside the model’s reasoning but in the layer that operationalizes that reasoning against tools, data, and other agents. The harness is where the agent’s autonomy becomes consequence. Every multiplication point is also a failure point.

The dominant public framing still treats the model as the locus of risk. That framing produces a security architecture organized around model behavior: red-teaming the model, scoring outputs against harm taxonomies. Neither measure addresses the failure modes that actually matter at runtime. The failure modes do not live in the model’s outputs. They live in what happens after the model decides.

The Harness Is Where the Failures Live

One scoping note first. “Harness” in this dispatch means the runtime operationalization layer — tool-call orchestration, retrieval pipelines, memory subsystems, authorization placement, agent-to-agent coordination, goal representation. It does not mean identity systems, container isolation, kernel security, or human insider threat. Those failure surfaces exist and matter. They are not the subject here. The claim is narrower than the title suggests: every agentic security failure surfaces in the harness as a runtime governance failure.

With that scope, the catalogue is consistent. Tool-call abuse is a harness orchestration failure — the harness routes calls without verifying alignment between the call and the authorized goal. Indirect prompt injection, mapped in Greshake et al. (arXiv:2302.12173), lives at the retrieval-to-context seam the harness controls. Cross-agent execution paths, formalized in Triedman et al. (arXiv:2503.12188), exploit harness coordination protocols. The MCP and A2A protocol layers, where these interactions are now being standardized, are where the failures are also being structurally resolved. The common pattern is that none of these is a model failure. The model’s reasoning can be entirely well-formed and the failure still occurs.
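
The missing check in the tool-call abuse case can be sketched in a few lines. This is a minimal illustration, not a description of any real harness: `ToolCall`, `AuthorizedGoal`, `is_aligned`, and `route` are hypothetical names, and the prefix-based scope rule is an assumption chosen for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tool: str
    target: str  # resource the call touches, e.g. a path or a recipient


@dataclass(frozen=True)
class AuthorizedGoal:
    description: str
    allowed: frozenset  # hypothetical (tool, target-prefix) pairs for this goal


def is_aligned(call: ToolCall, goal: AuthorizedGoal) -> bool:
    """True only if the call falls inside the scope granted for the goal."""
    return any(
        call.tool == tool and call.target.startswith(prefix)
        for tool, prefix in goal.allowed
    )


def route(call: ToolCall, goal: AuthorizedGoal) -> str:
    # The gate sits in the harness: after the model has decided,
    # before anything executes.
    if not is_aligned(call, goal):
        return "blocked"
    return "dispatched"


goal = AuthorizedGoal(
    description="summarize the Q3 report",
    allowed=frozenset({("read_file", "/reports/q3/")}),
)
in_scope = route(ToolCall("read_file", "/reports/q3/summary.txt"), goal)
out_of_scope = route(ToolCall("send_email", "attacker@example.com"), goal)
```

Here `in_scope` is dispatched and `out_of_scope` is blocked, even though both calls could come from perfectly well-formed model reasoning. A production check would need more than a string prefix (path normalization, for one), which is part of why the decision belongs to the harness rather than the model.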

The Architectural Reframe

The model produces decisions. The harness produces actions. Agentic security failures live where intent becomes operation — in tool-call orchestration, retrieval, memory, authorization placement, agent-to-agent coordination, goal representation. These are harness properties, not model properties.

Scaffolding, Harness, Protocol — The Maturation Arc

What is being called “the harness” today is not a stable category. It is substrate in motion. Raw scaffolding hardens into a recognized harness as common patterns are extracted and standardized. The harness then hardens into protocol, as the patterns that survive get encoded into specifications the substrate enforces by construction. HTTP went through this arc: raw protocol, then standardized verbs, then TLS, then CA trust chains the substrate enforces without application awareness. SQL went through it: raw query construction, then parameterized binding, then injection-resistant interfaces by construction. OAuth went through it.
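
The SQL step of that arc is easy to see concretely. The sketch below uses Python's standard `sqlite3` module; the table, the rows, and the hostile input are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "viewer")],
)

hostile = "nobody' OR '1'='1"

# Raw query construction: the input is spliced into the SQL text, so
# its quote characters change the structure of the query.
raw_rows = conn.execute(
    "SELECT name FROM users WHERE name = '" + hostile + "'"
).fetchall()

# Parameterized binding: the driver passes the input as a value, so it
# can only ever be compared against the name column.
bound_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()
```

The raw query returns every row in the table; the bound query returns none. The application code did not get smarter between the two calls. The interface stopped letting data become structure, which is the sense in which the guarantee moved into the substrate.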

The agentic harness is somewhere between raw scaffolding and recognized harness today. MCP is the first credible protocol-layer instance — a specification for how tools are exposed to agents that the substrate can enforce. A2A’s signed agent cards are a second early instance — a protocol-layer trust mechanism that does not depend on application-layer judgment. Each shift moves the security guarantee from the application’s responsibility to the substrate’s enforcement, which means the substrate absorbs findings as new guarantees. The half-life of current red-team work against agentic scaffolding is shorter than the procurement cycle that produced it.

Autonomy and Determinism Are the Same Dial

The architectural truth that standards bodies have not yet absorbed: security and autonomy in agentic systems are not orthogonal axes. They are the same dial seen from two sides. Every protocol-level enforcement that closes a failure mode also narrows the agent’s action space. Every architectural security guarantee is, by the same mechanism, a capability constraint. This is not a tradeoff that can be optimized away by better engineering. The harness is what multiplies the model’s autonomy into consequence. Constraining the multiplier — which is what protocol-layer security does — constrains the autonomy by exactly the same mechanism that secures it.

VeriGuard (arXiv:2510.05156) is the cleanest empirical demonstration of the dial. Its architectural enforcement layer achieves a zero percent attack success rate (ASR) against a strong adversarial benchmark while preserving a high task success rate (TSR), but explicitly by trading autonomy for verified policy compliance. The agent operates in a narrower action space because policy enforcement is architecturally binding rather than behaviorally suggested. Winston SMT (arXiv:2603.20449) demonstrates the sibling pattern in a different verification regime. The security guarantee and the capability constraint are the same operation, viewed from two sides.
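
A toy model makes the "same operation" claim tangible. This is not VeriGuard's method or its benchmark; the action space, the attack and usefulness labels, and the policy are all hypothetical, chosen only to show one filter producing both effects.

```python
# Hypothetical action space: (tool, destination) pairs the agent could emit.
ACTION_SPACE = {
    ("read", "internal"),
    ("read", "external"),
    ("write", "internal"),
    ("write", "external"),
    ("send", "internal"),
    ("send", "external"),
}

# Hypothetical labels, for illustration only.
ATTACKS = {("send", "external"), ("write", "external")}
USEFUL = {("read", "internal"), ("write", "internal"), ("read", "external")}


def policy(action):
    """Binding policy: nothing may touch an external destination."""
    _tool, destination = action
    return destination == "internal"


# One filter, applied once, before anything executes.
permitted = {a for a in ACTION_SPACE if policy(a)}

attacks_surviving = ATTACKS & permitted  # the security view
useful_lost = USEFUL - permitted         # the capability view
```

No attack survives the filter, and one useful action, reading from an external source, is lost to it. Both results come from the single set comprehension that builds `permitted`; there is no separate knob for either.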

Additive Security Framing
  • Treats controls as overlays applied on top of capability
  • Assumes security and autonomy are independent axes
  • Produces control catalogues assembled by enumeration
  • Optimizes each control against its own failure mode
  • Cannot account for the harness as autonomy multiplier
  • The framing standards bodies inherited from pre-agentic systems

Structural Security Framing
  • Treats the harness as the locus of both autonomy and failure
  • Recognizes security and autonomy as one dial, two views
  • Encodes guarantees at the protocol layer, substrate-enforced
  • Accepts capability constraint as the security mechanism
  • Reads VeriGuard’s 0% ASR with high TSR as the demonstration
  • The framing the architecture imposes on anyone reading it honestly

The Tradeoff Is a Runtime Calculation

If autonomy and determinism are the same dial, setting it correctly is a continuous calculation, not a one-time architectural decision. The right setting depends on the live composition graph of the agent’s tool manifest, the structure of its authorization model, the precision of its goal representation, the current state of its memory, and the trust posture of the agents it is coordinating with. None of those inputs is static. The tradeoff is therefore a runtime calculation, performed at the cadence of deployment against the specific configuration that is live in the moment.

Who can do the calculation matters. A human red team probing surface taxonomy operates at a cadence structurally incompatible with the calculation: by the time a finding is documented, the harness has changed. Standards bodies write static controls against a state that no longer exists by publication. Independent vendors with architectural access can take snapshots but cannot perform the calculation continuously. Each member of the trifecta (red teams, standards bodies, independent vendors) contributes structurally distinct work. None of the three can complete the harness governance calculation alone. CLTR’s Scheming in the Wild (arXiv:2604.09104) makes the empirical case for the runtime cadence: behavioral divergence accumulates at runtime against a configuration that is itself evolving at runtime.
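
The cadence mismatch can be made mechanical by tying each finding to the configuration it was observed against. The sketch below is an assumption-laden illustration: the config fields, the finding record, and the helper names `fingerprint` and `finding_is_current` are hypothetical.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Stable hash of a harness configuration snapshot."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def finding_is_current(finding: dict, live_config: dict) -> bool:
    # A finding only speaks to the configuration it was observed against.
    return finding["config_fingerprint"] == fingerprint(live_config)


config_at_test_time = {"tools": ["search", "read_file"], "memory": "session"}
finding = {
    "issue": "read_file accepts paths outside the workspace",
    "config_fingerprint": fingerprint(config_at_test_time),
}

# The deployment adds a tool before the report is written up.
live_config = {
    "tools": ["search", "read_file", "send_email"],
    "memory": "session",
}

still_valid = finding_is_current(finding, live_config)
unchanged = finding_is_current(finding, config_at_test_time)
```

`still_valid` is false and `unchanged` is true. A stale fingerprint does not mean the issue is gone; it means the finding has to be re-evaluated against a harness the assessor never saw, which is the work a snapshot cadence cannot keep up with.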

The party best positioned to perform the calculation is the one with the highest-fidelity visibility into the live runtime and the cadence to operate at its rate of change. That party is the provider running the vertically integrated stack — reasoning over its own composition graph, tool manifest, authorization posture, and goal representation at the cadence at which those configurations actually change. The fourth actor completes the quartet. The harness symphony requires all four voices: red teams to find what breaks, standards bodies to codify what’s found, independent vendors to audit snapshots, the integrated runtime provider to govern continuously. This is not a claim that the model provider is the most trusted assessor. It is a claim about which voice the ensemble structurally requires to be complete — what prior Luminity analysis named Position 4: the runtime architectural assessment role that emerges by elimination, not by argument.

The Quartet, Named Precisely

The autonomy-determinism tradeoff is a runtime calculation against the live harness configuration. Three actors — red teams, standards bodies, independent vendors — form the governance trifecta, each contributing structurally distinct work. None can reach the runtime cadence alone. The provider controlling the vertically integrated runtime is the fourth voice — the conductor of the choreographed harness symphony, with the visibility and cadence the calculation structurally requires. The harness symphony requires all four.

The Great Compression Is the Substrate the Quartet Needs

The strategic question is why the runtime calculation cannot be performed against a fragmented stack. The answer is structural. Fragmented orchestration breaks trace continuity — the harness cannot reconstruct what an agent did across vendors when each vendor logs against its own schema. Fragmented memory breaks policy persistence — governance decisions made at one tool boundary do not persist across the next. Fragmented tooling breaks objective fidelity — the agent’s goal representation is reinterpreted at every coordination seam. The runtime calculation requires all three to hold simultaneously. Fragmentation breaks all three. Integrated runtime stacks win not because the substrate vendors prefer them but because autonomous systems economically favor them once the calculation is what the architecture requires.
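
Trace continuity is the easiest of the three to show in miniature. The log records, field names, and `reconstruct` helper below are hypothetical; real vendor schemas differ, and that difference is the point.

```python
# Hypothetical log records from two vendors in a fragmented stack.
orchestrator_log = [
    {"trace_id": "t-1", "step": 1, "action": "plan"},
    {"trace_id": "t-1", "step": 2, "action": "call:crm.lookup"},
]
tool_vendor_log = [
    {"request": "9f3c", "op": "crm.lookup", "rows": 12},  # no trace_id field
]

# Hypothetical records from an integrated stack with one schema.
integrated_log = [
    {"trace_id": "t-1", "step": 1, "action": "plan"},
    {"trace_id": "t-1", "step": 2, "action": "call:crm.lookup"},
    {"trace_id": "t-1", "step": 3, "action": "result:crm.lookup"},
]


def reconstruct(trace_id, *logs):
    """Collect every record that can be attributed to the trace."""
    records = [
        record
        for log in logs
        for record in log
        if record.get("trace_id") == trace_id
    ]
    return sorted(records, key=lambda record: record["step"])


fragmented = reconstruct("t-1", orchestrator_log, tool_vendor_log)
integrated = reconstruct("t-1", integrated_log)
```

The fragmented reconstruction recovers two steps and silently drops what the tool actually did, because nothing in the tool vendor's record ties it to the trace. The integrated reconstruction recovers all three.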

Beneath the fragmentation problem is a deeper one. Stateless reasoning cannot govern persistent autonomous systems. Security at the harness layer is inseparable from trace persistence, runtime memory, and governance continuity — because what the harness must enforce is not a single decision but a trajectory of decisions across a deployment that outlives any individual model invocation. Persistence is the connective tissue. Without it, governance is intermittent, traces are discontinuous, and policy resets at every seam. The vertically integrated runtime is the structure that gives persistence enough surface to operate.
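
The difference between governing a decision and governing a trajectory shows up in a two-call example. This is a minimal sketch under stated assumptions: the tool names, the taint rule, and the `TrajectoryGovernor` class are hypothetical.

```python
def stateless_check(call):
    """Judges one call in isolation, with no memory of earlier calls."""
    return call["tool"] in {"read_secret", "http_post"}  # both tools allowed


class TrajectoryGovernor:
    """Carries state across calls so policy can refer to what came before."""

    def __init__(self):
        self.tainted = False  # set once the agent has read sensitive data

    def check(self, call):
        if call["tool"] == "read_secret":
            self.tainted = True
            return True
        if call["tool"] == "http_post" and self.tainted:
            return False  # outbound traffic after a secret read
        return True


trajectory = [{"tool": "read_secret"}, {"tool": "http_post"}]

stateless_verdicts = [stateless_check(call) for call in trajectory]
governor = TrajectoryGovernor()
stateful_verdicts = [governor.check(call) for call in trajectory]
```

The stateless check approves both calls, since each is individually permitted. The governor approves the read and refuses the post, because the only thing wrong with the second call is what preceded it. If the governor's state were reset between the two calls, as it would be at a seam between vendors, its verdicts would match the stateless ones.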

This conclusion did not arrive from a position. It arrived from a trek. Twelve series of Luminity analysis, 100+ peer-reviewed papers from the agentic AI security corpus, multi-quarter synthesis of provider architectures and harness ontology — that body of empirical evaluation is what produces the vertical-stack reading. The reader can disagree with the conclusion. The conclusion was reached the long way.

The framework-skeptical posture published in 2025 and 2026 by the parties running the largest agentic deployments — Anthropic’s direct guidance against framework abstraction in production agent systems, OpenAI’s parallel position that its Agents SDK is engineered with deliberately few abstractions and pluggable observability — is the same architectural diagnosis surfacing at the design layer. Two providers with no commercial incentive to converge arrived at the same diagnosis from opposite competitive positions. The convergence is not coordination. It is independent recognition of what the architecture requires.

The Compression Is the Precondition

The questions that have dominated the public conversation about model providers — governance, independence, vendor lock-in, the proper role of standards bodies — are downstream of a more fundamental question. Can agents be secured at all? The path to yes runs through the architectural layer. The architectural layer requires the Quartet’s fourth voice — the runtime conductor with the visibility and cadence only the integrated runtime can provide. The Great Compression is the structural precondition for agentic AI to reach scale.

The McKinsey State of AI data for late 2025 sits exactly where this argument predicts. Eighty-eight percent of enterprises adopting AI, seven percent reporting full scaling, thirty-nine percent realizing EBIT impact at the firm level. McKinsey’s April 2026 follow-on reports fewer than ten percent of agentic programs reaching meaningful scale. The gap between adoption and scaling is the security floor that has not yet been built. Once it is, governance and lock-in concerns will not disappear — but they will be answered as second-order questions about a structure the architecture has already chosen. The conversation in the enterprise will not be whether to use the vertical stack. It will be which conductor’s score to sign.

The Architectural Landing

The ensemble has been seated at the harness. The Quartet will now choreograph the architectural patterns for secure agentic AI deployments — each provider conducting a distinct score, each enterprise improvising within it.

The Dispatch Conclusion

Every agentic security failure surfaces in the harness as a runtime governance failure. The trifecta of red teams, standards bodies, and independent vendors contributes structurally distinct work — and cannot complete the runtime calculation alone. The Quartet’s fourth voice — the integrated runtime as choreographic conductor — provides the visibility and cadence the harness symphony structurally requires. The Great Compression is the precondition. Every other question is downstream.

The Great Compression · Dispatch Series
Dispatch 01 · Published · The Great Compression Has a Product Now
Dispatch 02 · Published · The Great Compression Has a Playbook Now
Dispatch 03 · Published · Three Vectors and a Verdict
Dispatch 04 · Now Reading · Why Every Security Failure Is a Harness Failure