Agentic AI & Unstructured Data: RBAC Security Breakdown — Luminity Digital
Enterprise AI Security · Access Control Series

Unstructured Data and the RBAC Breakdown in Agentic AI Systems

Traditional Role-Based Access Control assumes discrete, classifiable, statically-owned objects. Agentic AI systems dynamically retrieve, synthesize, transform, and generate unstructured data across trust boundaries — collapsing the foundational assumptions that RBAC was engineered to enforce.

February 2026
6 Failure Modes
4 Maturity Levels
12 Min Read
OWASP LLM Top 10 2025 NIST AI RMF AI 100-1 NIST SP 800-53 Rev 5 NIST AI 100-2e2023

RBAC operates on a model of discrete, identifiable, persistently-owned objects — where a subject’s role grants or denies access at a known boundary. Agentic AI systems violate every one of these assumptions simultaneously, creating a fundamental architectural incompatibility between established security frameworks and AI agent runtime behavior.

Agentic AI systems retrieve unstructured content dynamically via vector search (which has no concept of access boundaries), synthesize information across multiple source documents, generate new unstructured artifacts that derive from potentially-restricted inputs, and pass context between sub-agents with ambiguous permission inheritance. The result is that no access control decision can be made at a single, discrete boundary — yet that is precisely what traditional RBAC requires.

5

NIST SP 800-53 Rev 5 access control family requirements directly broken by standard agentic AI unstructured data pipelines — AC-3, AC-6, AC-16, AU-10, and SI-10.

NIST AI RMF (GOVERN 1.1, MAP 1.5) acknowledges that AI systems introduce new risk surfaces that existing control frameworks were not designed to address. The intersection of unstructured data retrieval and agentic execution is one of the most concrete manifestations of that gap.

Structural Divergence: Structured vs. Unstructured Data

The properties that make unstructured data valuable to agents are precisely the properties that make RBAC enforcement unreliable.

RBAC Works

Structured Data

Discrete Object Boundaries Database rows and files have clear ownership metadata. AC-3 (Access Enforcement) can operate deterministically.
Persistent Classification Labels Sensitivity labels persist with the object. RBAC policies can be checked at read time against a stable classification.
Atomic Access Decisions A SELECT on a restricted table either succeeds or fails. No partial exposure. No semantic leakage.
Auditable Access Log Every access event is a discrete, attributable transaction. AC-6 and AU-2 can enforce and verify.
No Transformation at Read Data is returned as stored. The consumer receives exactly what the RBAC check permitted — no more.
RBAC Fails

Unstructured Data + Agents

Semantic Boundaries Are Invisible to RBAC A PDF may contain mixed-sensitivity content. A vector chunk may span a classification boundary. There is no discrete object the policy engine can evaluate.
Classification Is Lost After Chunking RAG pipelines split documents into vector chunks that lose provenance metadata. The chunk’s inherited classification is typically not preserved in the embedding store.
Synthesis Creates New, Unclassified Artifacts Agent outputs are newly generated content derived from multiple sources. The resulting text has no RBAC lineage — it cannot be automatically classified or controlled.
Aggregation Violates Least Privilege Individual document reads may each be within policy, but an agent synthesizing a summary across all accessible documents violates the spirit of least privilege (NIST SP 800-53 AC-6).
Agent-in-the-Middle Breaks Audit Trails The agent is the authorized accessor, not the end user. Attribution of what data the user effectively received requires agent-level observability that most deployments lack.

OWASP LLM Top 10 (2025) — Directly Implicated

LLM01 : 2025
Prompt Injection
Malicious instructions embedded in unstructured documents retrieved by agents can override access controls and redirect agent behavior — a direct RBAC bypass via data plane manipulation.
LLM02 : 2025
Sensitive Information Disclosure
Agents synthesize and return content derived from restricted unstructured sources. Even when individual document access is permitted, the synthesized output may expose restricted information to unauthorized users.
LLM06 : 2025
Excessive Agency
Agents granted broad retrieval permissions operate far beyond what a user would be permitted to access directly. The agent’s permissions become a privilege escalation vector if not scoped to the requesting user’s effective rights.
LLM08 : 2025
Vector and Embedding Weaknesses
Vector stores lack native access control enforcement. Similarity search operates across all embeddings regardless of source classification — a fundamental architectural gap.

Six Critical RBAC Failure Modes

Each represents a scenario where a technically-compliant RBAC check fails to prevent unauthorized information exposure.

Failure Mode 01
Critical

RAG Retrieval Boundary Violation

Vector similarity search retrieves semantically-related chunks without evaluating source document access rights at query time. An embedding generated from a classified document and one from a public document may be semantically indistinguishable — both surface in retrieval results. If the vector store was populated without per-chunk ACL metadata, there is no enforcement point.
OWASP LLM08 NIST AC-3 NIST AI 100-2
Technical Root Cause Embedding models are trained to capture semantic similarity, not security classification. The vector space has no concept of a trust boundary. RBAC filtering must be applied as a post-retrieval metadata filter — but this only works if source ACL metadata was preserved during the chunking and ingestion pipeline.
Failure Mode 02
Critical

Indirect Prompt Injection via Retrieved Content

Malicious instructions embedded in unstructured content retrieved by the agent are interpreted as legitimate instructions. An attacker with write access to any document in the retrieval corpus can inject instructions that cause the agent to bypass access controls, exfiltrate data, or perform unauthorized actions on behalf of the user.
OWASP LLM01 NIST SI-10 NIST AI RMF MAP 2.2
Technical Root Cause LLMs do not natively distinguish between trusted system instructions and untrusted retrieved content. The model processes all tokens in the context window with equivalent weight unless architectural separations are enforced. NIST AI 100-2e2023 §3.1 classifies this as an adversarial example attack on the model’s input processing.
Failure Mode 03
High

The Information Aggregation Problem

Each individual document retrieval may be within the user’s permitted access scope. However, an agent instructed to synthesize a comprehensive summary across all accessible documents produces an output that effectively aggregates information the user could never have assembled under normal operational constraints.
OWASP LLM02 NIST AC-6 NIST SP 800-188
Technical Root Cause NIST SP 800-188 formally defines the aggregation problem: combining individually non-sensitive data elements can produce sensitive inferences. RBAC has no native mechanism to evaluate the cumulative sensitivity of an agent’s context window — only the sensitivity of each discrete retrieval event.
Failure Mode 04
High

Agent Delegation & Permission Inheritance Ambiguity

In multi-agent architectures (orchestrator → sub-agent delegation), there is no standardized mechanism for propagating user-level RBAC context through agent chains. Sub-agents typically operate under service account permissions rather than the end user’s effective rights, creating privilege escalation paths.
OWASP LLM06 NIST AC-2 RFC 6749 NIST SP 800-207
Technical Root Cause OAuth 2.0 token delegation (RFC 8693) and Zero Trust patterns (NIST SP 800-207) provide the theoretical framework for propagating user context — but agent frameworks have not standardized on these patterns. The A2A Protocol (Google, 2025) begins to address agent identity, but RBAC context propagation remains implementation-specific.
Failure Mode 05
High

Temporal Access Drift in Vector Stores

Access control on source documents is time-variant — users gain and lose permissions, documents are reclassified, employees are offboarded. Vector store embeddings are point-in-time snapshots. If a document’s access controls change after its embedding is ingested, the embedding remains retrievable, bypassing updated source document controls entirely.
NIST AC-2(j) NIST IA-4 OWASP LLM08
Technical Root Cause NIST AC-2(j) requires review of accounts for compliance with account management requirements. Extending this to vector store embeddings requires treating each embedded chunk as an access-controlled resource subject to lifecycle management — a concept absent from current vector database architectures.
Failure Mode 06
High

Generated Output Classification Void

Agent-generated responses derived from restricted source documents carry no inherent classification. A synthesis of ten restricted documents produces an output with no security metadata, no access control policy, and no lineage to the contributing sources. This artifact may be stored, shared, or forwarded without any RBAC enforcement downstream.
NIST AC-16 NIST AU-10 OWASP LLM02
Technical Root Cause Data lineage tracking in AI-generated content is an unsolved problem at scale. NIST AU-10 (Non-repudiation) requires that outputs be traceable to their originating actions, but no current vector database or LLM orchestration framework provides automated classification inheritance for generated outputs derived from multi-source retrievals.

RAG Pipeline: RBAC Enforcement Gaps by Stage

Access control failures are not concentrated at a single point — they occur at every stage of the unstructured data retrieval pipeline.

→ Data Flow & Access Control Gap Analysis
Stage 1
Document Ingestion
ACL metadata typically not captured during chunking. Source IAM policy is decoupled from embedding pipeline.
Stage 2
Chunking & Embedding
Chunks may span classification boundaries. Embedding captures semantics, not security posture. Parent document ACL is not inherited.
Stage 3
Vector Store Indexing
Embeddings stored without user-scoped ACL metadata unless explicitly engineered. No native RBAC enforcement in most vector DBs.
Stage 4
Similarity Retrieval
ANN search returns semantically similar chunks regardless of source classification. User context not evaluated at search time without post-retrieval filtering.
Stage 5
Context Assembly
Agent accumulates chunks from multiple sources. Aggregation problem emerges. No RBAC check on cumulative context sensitivity.
Stage 6
Generation & Output
Generated artifact has no classification lineage. Output may be stored, shared, or transmitted without any access control enforcement downstream.

NIST SP 800-53 Rev 5 — How Agents Break Each Control

Direct mapping of established access control requirements to agent-specific failure modes with unstructured data.

Control
Standard Intent
Agent Failure Mode
AC-3: Access Enforcement NIST SP 800-53 AC-3
Enforce approved authorizations at access control decision points.
No discrete decision point in vector similarity search. Retrieval bypasses enforcement.
AC-6: Least Privilege NIST SP 800-53 AC-6
Grant only minimum access required for legitimate purpose.
Agents typically operate with broad service account permissions. Aggregation violates least privilege even within permitted scopes.
AC-16: Security Attributes NIST SP 800-53 AC-16
Associate security attributes with information and systems.
Chunking destroys security attribute inheritance. Generated outputs have no attribute lineage.
AU-10: Non-Repudiation NIST SP 800-53 AU-10
Provide irrefutable evidence of access and information origin.
Agent operates as intermediary. User-level attribution requires agent-level audit logging not standard in current frameworks.
SI-10: Input Validation NIST SP 800-53 SI-10
Check validity of information inputs to systems.
Agents pass retrieved unstructured content directly to LLM context. No native validation distinguishes legitimate content from injected instructions.

Recommended Compensating Controls

Because standard RBAC cannot be directly applied to agent unstructured data pipelines, compensating controls must be engineered at each architectural layer.

🔏

Per-Chunk ACL Metadata Enforcement

Preserve source document ACL metadata at chunk ingestion time. Apply ACL-based post-retrieval filtering before chunks are assembled into agent context. Treat ACL drift as a first-class pipeline event requiring re-ingestion.

NIST AC-3 OWASP LLM08
🪪

User-Context Token Propagation

Implement OAuth 2.0 Token Exchange (RFC 8693) to propagate end-user identity and effective permissions through agent delegation chains. Agents should never access resources under service account identity when acting on behalf of a user.

RFC 8693 NIST SP 800-207 NIST AC-2
🧱

Content Isolation & Injection Defense

Architecturally separate system instructions from retrieved content in the agent context window. Apply input validation to retrieved documents before LLM processing. Use structured output schemas to constrain agent behavior.

OWASP LLM01 NIST SI-10
📊

Context Window Aggregation Limits

Implement maximum retrieval volume policies that prevent agents from assembling context windows that would effectively grant access to the entirety of a user’s permitted corpus. Enforce purpose-bound retrieval scoping.

NIST AC-6 SP 800-188
🏷️

Output Classification Inheritance

Automatically classify generated outputs at the highest sensitivity level of any contributing source document. Apply access controls to generated artifacts before storage or transmission. Log generation provenance for AU-10 compliance.

NIST AC-16 NIST AU-10 OWASP LLM02
🔭

Agent-Layer Observability & Audit

Instrument agent runtime to log every retrieval event with source document identity, user context, and effective ACL state. Implement real-time anomaly detection on retrieval patterns that suggest aggregation attacks or injection sequences.

NIST AU-2 AI RMF GOVERN 4.2

RBAC Maturity Model for Agentic Unstructured Data Systems

A four-level progression from baseline compliance to semantically-aware dynamic access control.

Level 1 — Baseline
Service Account Scoping Only
  • Agent operates under a service account with scoped permissions to specific repositories
  • No user-context propagation through the agent
  • Retrieval is unrestricted within the service account scope
  • Minimum viable for internal tooling with low-sensitivity corpora
Level 2 — Structured
Post-Retrieval ACL Filtering
  • ACL metadata preserved at chunk ingestion
  • Post-retrieval filtering applied before context assembly
  • User identity passed to retrieval layer for filtering
  • Audit logs capture retrieval events with user attribution
Level 3 — Advanced
Token Delegation + Output Classification
  • OAuth 2.0 token exchange for user context propagation through agent chains
  • Generated outputs classified at highest source sensitivity
  • Injection detection on retrieved content
  • Aggregation volume limits enforced per session
Level 4 — Adaptive
Semantic Access Control + Continuous Evaluation
  • Real-time semantic classification of retrieved chunks
  • Dynamic policy evaluation against cumulative context sensitivity
  • Behavioral anomaly detection on retrieval patterns
  • Continuous re-authorization throughout agent session lifecycle
Key Insight

The fundamental challenge is not model intelligence but system reliability. Standard RBAC was never designed for systems that dynamically retrieve, synthesize, and generate across trust boundaries. Compensating controls must be architected at every stage of the agent pipeline — from ingestion through generation — because no single enforcement point can address the full scope of RBAC breakdown in agentic systems.

Related Resources

For agent runtime architecture guidance, see our Agent Harness Architecture series. For governance framework guidance, see AI Agent Lifecycle Management.

Technical References & Standards

Share this: