What Detection Actually Requires — Luminity Digital
Series 12 · The Goal-Oriented Shift
Agentic AI Security

What Detection Actually Requires

The current generation of security monitoring was built to detect anomalous events at boundaries. Goal-oriented agent attacks don’t happen at boundaries. They happen in the agent’s reasoning about its objective — and the instruments for detecting them don’t yet exist at production scale.

May 2026 · Tom M. Gomez · Luminity Digital · 11 Min Read
Three posts have built the case. Post 1 established that the goal-oriented threat surface is generated at runtime — not enumerable at design time, not amenable to pattern-matching controls. Post 2 showed that goal hijacking is a categorically distinct attack class from prompt injection, requiring defense at the objective boundary rather than the content boundary. Post 3 established that the best available platform architecture delivers structural controls at the protocol layer but cannot reach the goal layer below it. This post asks what reaching the goal layer actually requires — and names, precisely, what the field does not yet have.

The monitoring infrastructure enterprises have built for agentic AI is, by and large, boundary monitoring. Detect when an unauthorized tool is called. Detect when a request crosses a permission boundary. Detect when content entering the context window matches a known-harmful pattern. Detect when data egresses through an unapproved channel. These are all correct controls for the threats they were designed to catch. The ISACA framing states the problem plainly: when an agent acts, it uses real credentials and approved interfaces. When it accesses sensitive data, the request is treated as valid. When it executes commands, it does so within its granted permissions. There is no clear distinction between normal and malicious behavior at the agentic control layer.

That last sentence is the core of the detection problem. Traditional security monitoring detects deviation from a normal baseline. It detects the unauthorized call, the out-of-scope access, the known-bad signature. A goal-hijacked agent makes authorized calls, accesses in-scope data, and produces no signature. Each individual action is permissioned. The attack is in the trajectory — in the accumulated direction of a sequence of individually permitted actions toward an objective the agent was not authorized to pursue.

Luminity Observation

Detecting that is not a boundary problem. It is a trajectory problem. And the instrumentation for trajectory problems is not what the field has spent the past three years building.
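The distinction can be made concrete with a toy sketch. Everything below is illustrative — the action names, the permission set, and the exfiltration heuristic are invented for this example, not drawn from any named product:

```python
# Toy contrast between a boundary (per-action) check and a trajectory
# check. All action names, domains, and thresholds are illustrative.
PERMITTED = {"read_record", "summarize", "send_email"}

def boundary_check(action: str) -> bool:
    """Stateless: is this single action permitted?"""
    return action in PERMITTED

def trajectory_check(actions, max_external_sends: int = 5) -> bool:
    """Stateful: does the accumulated sequence look like exfiltration?
    Crude heuristic — many individually permitted sends to one
    external domain, interleaved with record reads."""
    sends = sum(1 for act, target in actions
                if act == "send_email"
                and target.endswith("@external.example"))
    return sends <= max_external_sends

# A hijacked agent leaking records one permitted email at a time.
actions = []
for i in range(30):
    actions.append(("read_record", f"cust-{i:03d}"))
    actions.append(("send_email", "drop@external.example"))

assert all(boundary_check(act) for act, _ in actions)  # every step passes
assert not trajectory_check(actions)                   # the trajectory fails
```

The point is not the heuristic — it is that the signal lives only in the sequence, so no per-event check, however well tuned, can carry it.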

Why Stateless Monitoring Cannot Address It

The safety gap in current monitoring architecture has a precise technical name: statelessness. Safety guardrails remain largely stateless, treating multi-turn interactions as a series of disconnected events. Albrethsen, Datta, Kumar, and Rajasekar (arXiv:2602.16935) frame the failure mode directly: adversarial tactics like Crescendo and ActorAttack slowly bleed malicious intent across turn boundaries to bypass stateless filters. The attack is not contained in any individual turn — it accumulates across the conversation as the agent’s working understanding of its objective is gradually redirected. A filter applied to each turn independently sees nothing suspicious. The attack is in the relationship between turns, not in any single turn.
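A minimal numeric sketch of that failure mode, assuming hypothetical per-turn risk scores for a Crescendo-style conversation in which no single turn is alarming on its own:

```python
# Hypothetical per-turn risk scores for a slowly escalating attack.
# No single turn reaches the stateless alert threshold.
turn_risk = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35]
THRESHOLD = 0.50

def stateless_filter(scores, threshold=THRESHOLD):
    """Each turn evaluated independently — the status quo."""
    return [s >= threshold for s in scores]

def stateful_filter(scores, decay=0.8, threshold=THRESHOLD):
    """Carry risk forward with exponential decay, so slowly
    accumulating intent eventually trips the alarm."""
    state, alerts = 0.0, []
    for s in scores:
        state = decay * state + s
        alerts.append(state >= threshold)
    return alerts

assert not any(stateless_filter(turn_risk))  # stateless: silent throughout
assert any(stateful_filter(turn_risk))       # stateful: alerts mid-conversation
```

The decaying accumulator here is a stand-in for any stateful mechanism — the principle is only that risk must be allowed to compound across turn boundaries.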

0.84
F1 Score — DeepContext stateful intent tracking (arXiv:2602.16935)

Compared to 0.67 for both Llama-Prompt-Guard-2 and Granite-Guardian. DeepContext uses a recurrent architecture to propagate a hidden state across the conversation, capturing the incremental accumulation of risk that stateless models overlook. Sub-20ms inference on a T4 GPU. This is the direction — modeling the sequential evolution of intent rather than scoring isolated events.

DeepContext demonstrates what stateful intent tracking looks like in practice: a recurrent neural network that ingests a sequence of turn-level embeddings and propagates a hidden state across the conversation, maintaining a running representation of the trajectory of intent rather than evaluating each turn in isolation. It monitors what the paper calls the “intent distance” — the accumulated divergence between the agent’s current working objective and the objective it was initialized with. When that distance crosses a threshold, the alert fires. Not because any single turn was suspicious, but because the trajectory is wrong.
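The shape of that mechanism can be sketched in a few lines. This is not DeepContext itself — it is a toy analogue that uses a running average of turn embeddings in place of a learned recurrent cell, with cosine divergence standing in for the paper's intent distance; all vectors and thresholds are invented:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

class IntentTracker:
    """Toy stateful tracker: the hidden state is an exponential moving
    average of turn embeddings; 'intent distance' is the state's
    cosine divergence from the authorized-goal embedding."""
    def __init__(self, goal_embedding, alpha=0.3, threshold=0.5):
        self.goal = goal_embedding
        self.state = list(goal_embedding)  # initialized at the goal
        self.alpha = alpha
        self.threshold = threshold

    def observe(self, turn_embedding):
        """Fold one turn into the hidden state; return True on alert."""
        self.state = [(1 - self.alpha) * s + self.alpha * t
                      for s, t in zip(self.state, turn_embedding)]
        intent_distance = 1.0 - cosine(self.state, self.goal)
        return intent_distance > self.threshold

# Turn embeddings that drift gradually away from the goal direction.
goal = [1.0, 0.0]
tracker = IntentTracker(goal)
drift = [[0.8, 0.2], [0.5, 0.5], [0.2, 0.8],
         [0.0, 1.0], [0.0, 1.0], [-0.2, 1.0]]
alerts = [tracker.observe(t) for t in drift]

assert not alerts[0]   # early turns: close to the goal, no alert
assert alerts[-1]      # late turns: accumulated divergence fires
```

No single turn in `drift` is flagged on its own merits; the alert fires because the hidden state has moved too far from the initialization — the trajectory, not the event.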

This is the architectural principle that the goal-oriented detection problem requires: not event scoring, but trajectory modeling. The two problems require different instrumentation, different data structures, and different inference architectures. Building better event-scoring systems does not produce trajectory modeling capability.

The Pattern-Matching / Intent-Verification Distinction

The distinction between what current monitoring does and what goal-layer detection requires can be stated precisely as a contrast between two detection paradigms.

Pattern-Matching Detection
  • Scores individual events against known-bad signatures
  • Stateless — each event evaluated independently
  • Fires on deviation from permitted behavior
  • Effective against discrete, bounded attack types
  • Cannot detect accumulated trajectory divergence
  • Cannot detect attacks where each step is individually permitted
Intent-Verification Detection
  • Tracks the trajectory of the agent’s working objective over time
  • Stateful — maintains running representation across turns
  • Fires on divergence from authorized objective, not from permitted behavior
  • Targets goal hijacking, objective drift, and intent accumulation
  • Requires baseline of what the authorized goal actually is
  • Does not yet exist at production scale in enterprise deployments

The MI9 framework (arXiv:2508.03858) introduces what it calls a goal-conditioned drift indicator — a mechanism that distinguishes intentional adaptation from suspicious behavioral change by comparing against a goal-aware baseline. The key term is goal-aware. Standard anomaly detection compares current behavior against historical behavior. Goal-conditioned drift detection compares current behavior against what the agent should be doing given its authorized objective. These are different baselines. An agent that starts doing something new is anomalous under the first baseline. An agent that starts pursuing a different objective is anomalous under the second. Only the second catches goal hijacking.

Goal-Conditioned Drift Detection — MI9 (arXiv:2508.03858)

A drift indicator that distinguishes intentional adaptation from suspicious behavioral change by comparing agent behavior against a goal-aware baseline — not against historical behavior, but against what the agent should be doing given its authorized objective. Evaluated over 1,000 synthetic multi-domain scenarios with high detection rates and low false positive rates. The prerequisite is a formal representation of the authorized goal that is available at runtime for comparison.
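The two baselines can be contrasted directly. This is a toy sketch of the distinction, not MI9's mechanism — the action names, the history set, and the goal-to-action mapping are invented for illustration:

```python
# Two baselines for the same agent. Assumed (invented) scenario: an
# agent authorized for demand forecasting.
HISTORY = {"query_inventory", "update_forecast"}          # what it has done
GOAL_ACTIONS = {"query_inventory", "update_forecast",
                "query_supplier_api"}                     # what the goal implies

def historical_anomaly(action: str) -> bool:
    """Standard baseline: flag anything the agent has not done before."""
    return action not in HISTORY

def goal_conditioned_drift(action: str) -> bool:
    """Goal-aware baseline: flag only actions outside the authorized
    objective, regardless of novelty."""
    return action not in GOAL_ACTIONS

# Legitimate adaptation: a new action that still serves the goal.
assert historical_anomaly("query_supplier_api")           # false positive
assert not goal_conditioned_drift("query_supplier_api")   # correctly silent

# Goal hijack attempt: outside the objective entirely.
assert goal_conditioned_drift("export_customer_list")     # caught
```

The asymmetry is the whole argument: the historical baseline penalizes adaptation and the goal-aware baseline penalizes only divergence from the authorized objective.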

Both DeepContext and MI9 point to the same prerequisite: the authorized objective must be formally represented and available at runtime. You cannot detect divergence from a goal you have not specified. This is the step that enterprise deployments almost universally skip. Goals are stated informally in system prompts. They are not formally encoded in a representation that a runtime monitor can use as a comparison baseline. Building that representation — at the level of precision required to support goal-conditioned detection — is a non-trivial engineering and architectural problem that the field has not yet solved at production scale.

What Formally Representing the Authorized Objective Requires

The problem is more precisely scoped than it appears. It is not sufficient to say that the agent was initialized with a particular system prompt and that any divergence from the behavior implied by that prompt constitutes a violation. System prompts are natural language. Natural language does not provide the formal specification needed to support runtime comparison. The agent’s interpretation of its system prompt is itself probabilistic — the same prompt will produce different behaviors in different environments, at different points in a long-horizon task, in response to different environmental observations.

What goal-conditioned detection actually requires is a formal representation of the authorized objective that is: first, precise enough to support comparison against observed behavior; second, environment-aware enough to distinguish legitimate adaptation from goal drift; and third, efficient enough to evaluate at the speed of agent execution without becoming the bottleneck in a production deployment. None of these requirements are currently met by any production system at enterprise scale.
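What such a representation might minimally look like can be sketched as a data structure. This is a hypothetical design illustrating the three requirements, not a solved specification — every field name and predicate here is an assumption:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuthorizedObjective:
    """Hypothetical runtime-checkable goal representation.
    Precision: an explicit action-class set supports comparison.
    Environment-awareness: context predicates distinguish legitimate
    adaptation from drift. Efficiency: every check is a set lookup
    or a cheap predicate call."""
    goal_id: str
    allowed_action_classes: frozenset
    context_predicates: tuple = ()

    def permits(self, action_class: str, context: dict) -> bool:
        if action_class not in self.allowed_action_classes:
            return False
        return all(pred(context) for pred in self.context_predicates)

objective = AuthorizedObjective(
    goal_id="demand-forecast-q3",
    allowed_action_classes=frozenset({"read_sales", "write_forecast"}),
    context_predicates=(lambda ctx: ctx.get("env") == "prod-readonly",),
)

assert objective.permits("read_sales", {"env": "prod-readonly"})
assert not objective.permits("export_data", {"env": "prod-readonly"})
assert not objective.permits("read_sales", {"env": "dev"})
```

The hard, unsolved part is not the data structure — it is compiling a natural-language system prompt into `allowed_action_classes` and `context_predicates` with enough fidelity that divergence from them actually means divergence from intent.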

The Open Problem, Named Precisely

Detecting goal-oriented attacks against agentic AI systems requires stateful trajectory modeling against a formal representation of the authorized objective, evaluated continuously across the full execution lifecycle. The trajectory modeling exists in research form (DeepContext, MI9) but has not been integrated into general production agent architectures. The formal objective representation does not exist at the level of precision the problem requires. Both gaps are active research frontiers — and closing them is the prerequisite for structural detection capability at the goal layer.

This is where the series argument closes. The twelve series built here have made a consistent structural claim: that the failures enterprises are encountering are protocol-level architectural problems that governance and operational controls can mitigate but cannot structurally resolve. That claim holds for detection exactly as it holds for every other layer. Governance and operational monitoring can partially address goal-layer threats. They cannot structurally resolve them without the formal objective representation and stateful trajectory modeling that production-grade intent verification requires. Those do not yet exist at the scale and integration depth that enterprise agentic AI deployments require.

That is not a counsel of despair. It is a correct scoping of the problem. The field that correctly understands what it is missing is in a better position to build it than the field that believes boundary monitoring has already solved the goal-layer problem. The threat surface of a goal-oriented agent will not yield to the security architecture designed for a scripted one. Building the architecture that addresses it requires starting from an honest accounting of what the problem actually is.

The Series Conclusion

The goal-oriented shift created a threat surface that is not enumerable at design time, not addressable by the defenses built for scripted agents, not structurally resolved by the best available platform architecture, and not detectable by current boundary-monitoring instrumentation. Research is pointing toward the right instruments — stateful trajectory modeling, goal-conditioned drift detection, formal objective representation. The gap between research demonstration and production-grade deployment is the frontier. Naming it precisely is where the argument ends and the building begins.

If this series raised questions worth exploring for your deployment

The structural security argument applies to specific architectural decisions. If it is useful to think through what it means for your environment, the conversation starts here.

