The monitoring infrastructure enterprises have built for agentic AI is, by and large, boundary monitoring. Detect when an unauthorized tool is called. Detect when a request crosses a permission boundary. Detect when content entering the context window matches a known-harmful pattern. Detect when data egresses through an unapproved channel. These are all correct controls for the threats they were designed to catch. The ISACA framing states the problem plainly: when an agent acts, it uses real credentials and approved interfaces. When it accesses sensitive data, the request is treated as valid. When it executes commands, it does so within its granted permissions. There is no clear distinction between normal and malicious behavior at the agentic control layer.
That last sentence is the core of the detection problem. Traditional security monitoring detects deviation from a normal baseline. It detects the unauthorized call, the out-of-scope access, the known-bad signature. A goal-hijacked agent makes authorized calls, accesses in-scope data, and produces no signature. Each individual action is permissioned. The attack is in the trajectory — in the accumulated direction of a sequence of individually permitted actions toward an objective the agent was not authorized to pursue.
Detecting that is not a boundary problem. It is a trajectory problem. And the instrumentation for trajectory problems is not what the field has spent the past three years building.
Why Stateless Monitoring Cannot Address It
The safety gap in current monitoring architecture has a precise technical name: statelessness. Safety guardrails remain largely stateless, treating multi-turn interactions as a series of disconnected events. Albrethsen, Datta, Kumar, and Rajasekar (arXiv:2602.16935) frame the failure mode directly: adversarial tactics like Crescendo and ActorAttack slowly bleed malicious intent across turn boundaries to bypass stateless filters. The attack is not contained in any individual turn. It accumulates across the conversation as the agent's working understanding of its objective is gradually redirected. A filter applied to each turn independently sees nothing suspicious. The attack lives in the relationship between turns, not in any single turn.
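The arithmetic of that failure is easy to show in miniature. The sketch below uses hypothetical per-turn scores and thresholds (the values and constant names are illustrative, not drawn from any cited system): no single turn crosses the stateless filter's threshold, but the accumulated drift does.

```python
# Minimal illustration (hypothetical scores): a Crescendo-style attack in
# which no single turn crosses a per-turn filter threshold, but the
# conversation as a whole drifts far from the original objective.

PER_TURN_THRESHOLD = 0.7    # stateless filter fires above this
TRAJECTORY_THRESHOLD = 1.5  # stateful monitor fires above accumulated drift

# Hypothetical per-turn "intent shift" scores for a six-turn attack.
# Each turn nudges the agent's working objective slightly off course.
turn_scores = [0.2, 0.3, 0.3, 0.4, 0.3, 0.4]

stateless_alerts = [s for s in turn_scores if s > PER_TURN_THRESHOLD]
print(f"stateless filter alerts: {len(stateless_alerts)}")  # 0 -- nothing fires

accumulated = 0.0
for turn, score in enumerate(turn_scores, start=1):
    accumulated += score
    if accumulated > TRAJECTORY_THRESHOLD:
        print(f"trajectory monitor fires at turn {turn} (drift={accumulated:.1f})")
        break  # fires at turn 6, where accumulated drift reaches 1.9
```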
The paper's detector, DeepContext, reports results that compare favorably to 0.67 for both Llama-Prompt-Guard-2 and Granite-Guardian on the same evaluation. It uses a recurrent architecture to propagate a hidden state across the conversation, capturing the incremental accumulation of risk that stateless models overlook, at sub-20ms inference on a T4 GPU. This is the direction: modeling the sequential evolution of intent rather than scoring isolated events.
DeepContext demonstrates what stateful intent tracking looks like in practice: a recurrent neural network that ingests a sequence of turn-level embeddings and propagates a hidden state across the conversation, maintaining a running representation of the trajectory of intent rather than evaluating each turn in isolation. It monitors what the paper calls the “intent distance” — the accumulated divergence between the agent’s current working objective and the objective it was initialized with. When that distance crosses a threshold, the alert fires. Not because any single turn was suspicious, but because the trajectory is wrong.
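To make that architecture concrete, here is a minimal PyTorch sketch of a DeepContext-style monitor. The class name, layer sizes, the GRU choice, and the sigmoid threshold are all illustrative assumptions of mine, not the paper's implementation; what it preserves is the structural idea of a hidden state carried across turns and scored for accumulated divergence.

```python
import torch
import torch.nn as nn

class TrajectoryIntentMonitor(nn.Module):
    """Sketch of a stateful trajectory monitor: a GRU propagates a hidden
    state across turns, and a linear head scores the accumulated divergence
    from the initial objective. Dimensions and head are assumptions."""

    def __init__(self, embed_dim: int = 384, hidden_dim: int = 256):
        super().__init__()
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.distance_head = nn.Linear(hidden_dim, 1)

    def forward(self, turn_embeddings: torch.Tensor) -> torch.Tensor:
        # turn_embeddings: (batch, num_turns, embed_dim)
        hidden_states, _ = self.rnn(turn_embeddings)
        # One intent-distance score per turn: monotone drift shows up as a
        # rising sequence, not a single-spike anomaly.
        return self.distance_head(hidden_states).squeeze(-1)

monitor = TrajectoryIntentMonitor()
conversation = torch.randn(1, 12, 384)        # 12 turns of embedded dialogue
drift = torch.sigmoid(monitor(conversation))  # per-turn accumulated risk in [0, 1]
alert = bool((drift > 0.9).any())             # fire when the trajectory crosses threshold
```

The design choice worth noting is that the alert condition reads the whole score sequence, not the latest event: the monitor's output is a trajectory, and the trajectory is what gets thresholded.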
This is the architectural principle that the goal-oriented detection problem requires: not event scoring, but trajectory modeling. The two problems require different instrumentation, different data structures, and different inference architectures. Building better event-scoring systems does not produce trajectory modeling capability.
The Pattern-Matching / Intent-Verification Distinction
The distinction between what current monitoring does and what goal-layer detection requires can be stated precisely as a contrast between two detection paradigms.

Pattern matching (what current monitoring does):

- Scores individual events against known-bad signatures
- Stateless: each event is evaluated independently
- Fires on deviation from permitted behavior
- Effective against discrete, bounded attack types
- Cannot detect accumulated trajectory divergence
- Cannot detect attacks where each step is individually permitted

Intent verification (what goal-layer detection requires; see the sketch after this list):

- Tracks the trajectory of the agent's working objective over time
- Stateful: maintains a running representation across turns
- Fires on divergence from the authorized objective, not from permitted behavior
- Targets goal hijacking, objective drift, and intent accumulation
- Requires a baseline of what the authorized goal actually is
- Does not yet exist at production scale in enterprise deployments
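The contrast can be stated as two interfaces. The sketch below is hypothetical (the names `PatternMatcher` and `IntentVerifier` are mine, not from any cited system), but it makes the structural difference explicit: one paradigm is a pure function of a single event, the other carries state and requires access to the goal.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class AgentEvent:
    """One observed agent action: a tool call, a retrieval, a message."""
    tool: str
    arguments: dict
    turn: int

class PatternMatcher(Protocol):
    """Stateless paradigm: each event is scored independently against
    known-bad signatures. No memory between calls, no notion of a goal."""
    def score(self, event: AgentEvent) -> float: ...

class IntentVerifier(Protocol):
    """Stateful paradigm: consumes the whole event stream, maintains a
    running trajectory representation, and measures divergence from the
    authorized objective rather than from permitted behavior."""
    def observe(self, event: AgentEvent) -> None: ...
    def divergence_from_goal(self) -> float: ...
```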
The MI9 framework (arXiv:2508.03858) introduces what it calls a goal-conditioned drift indicator — a mechanism that distinguishes intentional adaptation from suspicious behavioral change by comparing against a goal-aware baseline. The key term is goal-aware. Standard anomaly detection compares current behavior against historical behavior. Goal-conditioned drift detection compares current behavior against what the agent should be doing given its authorized objective. These are different baselines. An agent that starts doing something new is anomalous under the first baseline. An agent that starts pursuing a different objective is anomalous under the second. Only the second catches goal hijacking.
MI9's drift indicator was evaluated over 1,000 synthetic multi-domain scenarios, with high detection rates and low false positive rates. Its prerequisite is a formal representation of the authorized goal that is available at runtime for comparison.
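The two-baselines distinction is easy to show in miniature. The sketch below uses cosine distance over toy behavior embeddings; MI9's actual indicator is not specified here, so treat this as an illustration of the baseline difference, not of the framework's method.

```python
import numpy as np

def drift(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cosine similarity: 0 means aligned, larger means diverging."""
    return 1 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

goal    = np.array([1.0, 0.0, 0.0])   # authorized objective (toy embedding)
history = np.array([0.9, 0.4, 0.0])   # the agent's typical past behavior

# Legitimate adaptation: novel behavior serving the same objective.
adapted = np.array([0.9, -0.4, 0.2])
print(drift(adapted, history))   # ~0.34: historical baseline flags it
print(drift(adapted, goal))      # ~0.10: goal-conditioned baseline stays quiet

# Goal hijack: behavior that resembles history but serves a redirected objective.
hijacked = np.array([0.5, 0.85, 0.1])
print(drift(hijacked, history))  # ~0.19: historical baseline sees nothing unusual
print(drift(hijacked, goal))     # ~0.50: only the goal-aware baseline fires
```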
Both DeepContext and MI9 point to the same prerequisite: the authorized objective must be formally represented and available at runtime. You cannot detect divergence from a goal you have not specified. This is the step that enterprise deployments almost universally skip. Goals are stated informally in system prompts. They are not formally encoded in a representation that a runtime monitor can use as a comparison baseline. Building that representation — at the level of precision required to support goal-conditioned detection — is a non-trivial engineering and architectural problem that the field has not yet solved at production scale.
What Formally Representing the Authorized Objective Requires
The problem is more precisely scoped than it appears. It is not sufficient to say that the agent was initialized with a particular system prompt and that any divergence from the behavior implied by that prompt constitutes a violation. System prompts are natural language. Natural language does not provide the formal specification needed to support runtime comparison. The agent’s interpretation of its system prompt is itself probabilistic — the same prompt will produce different behaviors in different environments, at different points in a long-horizon task, in response to different environmental observations.
What goal-conditioned detection actually requires is a formal representation of the authorized objective that is: first, precise enough to support comparison against observed behavior; second, environment-aware enough to distinguish legitimate adaptation from goal drift; and third, efficient enough to evaluate at the speed of agent execution without becoming the bottleneck in a production deployment. None of these requirements are currently met by any production system at enterprise scale.
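As a way of fixing ideas, here is one hypothetical shape such a representation could take, with one field per requirement. Every name here (`AuthorizedObjective`, `monitor_step`, the threshold) is illustrative; this is a sketch of the design space, not a solved interface.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class AuthorizedObjective:
    """Hypothetical runtime goal representation, one field per requirement."""
    # 1. Precise: a machine-checkable predicate over a proposed action,
    #    rather than a natural-language system prompt.
    permits: Callable[[dict], bool]
    # 2. Environment-aware: judges an action against current environment
    #    state, so legitimate re-planning is not mistaken for goal drift.
    consistent_with: Callable[[dict, dict], bool]
    # 3. Efficient: a trajectory-level drift score cheap enough to run
    #    inline with execution (an embedding comparison, say, rather than
    #    a second model call per step).
    drift_score: Callable[[list], float]

DRIFT_THRESHOLD = 0.5  # illustrative

def monitor_step(objective: AuthorizedObjective, action: dict,
                 env_state: dict, trajectory: list) -> str:
    """One runtime check per agent action: boundary control first,
    then the trajectory-level check this section argues is missing."""
    trajectory.append(action)
    if not objective.permits(action):
        return "block"      # boundary monitoring: still necessary
    if (objective.drift_score(trajectory) > DRIFT_THRESHOLD
            and not objective.consistent_with(action, env_state)):
        return "escalate"   # goal-conditioned check: the missing layer
    return "allow"
```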
Detecting goal-oriented attacks against agentic AI systems requires stateful trajectory modeling against a formal representation of the authorized objective, evaluated continuously across the full execution lifecycle. The trajectory modeling exists in research form (DeepContext, MI9) but has not been integrated into general production agent architectures. The formal objective representation does not exist at the level of precision the problem requires. Both gaps are active research frontiers — and closing them is the prerequisite for structural detection capability at the goal layer.
This is where the series argument closes. The twelve installments of this series have made a consistent structural claim: that the failures enterprises are encountering are protocol-level architectural problems that governance and operational controls can mitigate but cannot structurally resolve. That claim holds for detection exactly as it holds for every other layer. Governance and operational monitoring can partially address goal-layer threats. They cannot structurally resolve them without the formal objective representation and stateful trajectory modeling that production-grade intent verification requires. Those do not yet exist at the scale and integration depth that enterprise agentic AI deployments require.
That is not a counsel of despair. It is a correct scoping of the problem. The field that correctly understands what it is missing is in a better position to build it than the field that believes boundary monitoring has already solved the goal-layer problem. The threat surface of a goal-oriented agent will not yield to the security architecture designed for a scripted one. Building the architecture that addresses it requires starting from an honest accounting of what the problem actually is.
The goal-oriented shift created a threat surface that is not enumerable at design time, not addressable by the defenses built for scripted agents, not structurally resolved by the best available platform architecture, and not detectable by current boundary-monitoring instrumentation. Research is pointing toward the right instruments — stateful trajectory modeling, goal-conditioned drift detection, formal objective representation. The gap between research demonstration and production-grade deployment is the frontier. Naming it precisely is where the argument ends and the building begins.
