RBAC operates on a model of discrete, identifiable, persistently-owned objects — where a subject’s role grants or denies access at a known boundary. Agentic AI systems violate every one of these assumptions simultaneously, creating a fundamental architectural incompatibility between established security frameworks and AI agent runtime behavior.
Agentic AI systems retrieve unstructured content dynamically via vector search (which has no concept of access boundaries), synthesize information across multiple source documents, generate new unstructured artifacts that derive from potentially-restricted inputs, and pass context between sub-agents with ambiguous permission inheritance. The result is that no access control decision can be made at a single, discrete boundary — yet that is precisely what traditional RBAC requires.
5
NIST SP 800-53 Rev 5 access control family requirements directly broken by standard agentic AI unstructured data pipelines — AC-3, AC-6, AC-16, AU-10, and SI-10.
NIST AI RMF (GOVERN 1.1, MAP 1.5) acknowledges that AI systems introduce new risk surfaces that existing control frameworks were not designed to address. The intersection of unstructured data retrieval and agentic execution is one of the most concrete manifestations of that gap.
Structural Divergence: Structured vs. Unstructured Data
The properties that make unstructured data valuable to agents are precisely the properties that make RBAC enforcement unreliable.
Discrete Object Boundaries
Database rows and files have clear ownership metadata. AC-3 (Access Enforcement) can operate deterministically.
Persistent Classification Labels
Sensitivity labels persist with the object. RBAC policies can be checked at read time against a stable classification.
Atomic Access Decisions
A SELECT on a restricted table either succeeds or fails. No partial exposure. No semantic leakage.
Auditable Access Log
Every access event is a discrete, attributable transaction. AC-6 and AU-2 can enforce and verify.
No Transformation at Read
Data is returned as stored. The consumer receives exactly what the RBAC check permitted — no more.
Semantic Boundaries Are Invisible to RBAC
A PDF may contain mixed-sensitivity content. A vector chunk may span a classification boundary. There is no discrete object the policy engine can evaluate.
Classification Is Lost After Chunking
RAG pipelines split documents into vector chunks that lose provenance metadata. The chunk’s inherited classification is typically not preserved in the embedding store.
Synthesis Creates New, Unclassified Artifacts
Agent outputs are newly generated content derived from multiple sources. The resulting text has no RBAC lineage — it cannot be automatically classified or controlled.
Aggregation Violates Least Privilege
Individual document reads may each be within policy, but an agent synthesizing a summary across all accessible documents violates the spirit of least privilege (NIST SP 800-53 AC-6).
Agent-in-the-Middle Breaks Audit Trails
The agent is the authorized accessor, not the end user. Attribution of what data the user effectively received requires agent-level observability that most deployments lack.
OWASP LLM Top 10 (2025) — Directly Implicated
LLM01 : 2025
Prompt Injection
Malicious instructions embedded in unstructured documents retrieved by agents can override access controls and redirect agent behavior — a direct RBAC bypass via data plane manipulation.
LLM02 : 2025
Sensitive Information Disclosure
Agents synthesize and return content derived from restricted unstructured sources. Even when individual document access is permitted, the synthesized output may expose restricted information to unauthorized users.
LLM06 : 2025
Excessive Agency
Agents granted broad retrieval permissions operate far beyond what a user would be permitted to access directly. The agent’s permissions become a privilege escalation vector if not scoped to the requesting user’s effective rights.
LLM08 : 2025
Vector and Embedding Weaknesses
Vector stores lack native access control enforcement. Similarity search operates across all embeddings regardless of source classification — a fundamental architectural gap.
Six Critical RBAC Failure Modes
Each represents a scenario where a technically-compliant RBAC check fails to prevent unauthorized information exposure.
Failure Mode 01
Critical
RAG Retrieval Boundary Violation
Vector similarity search retrieves semantically-related chunks without evaluating source document access rights at query time. An embedding generated from a classified document and one from a public document may be semantically indistinguishable — both surface in retrieval results. If the vector store was populated without per-chunk ACL metadata, there is no enforcement point.
OWASP LLM08
NIST AC-3
NIST AI 100-2
Technical Root Cause
Embedding models are trained to capture semantic similarity, not security classification. The vector space has no concept of a trust boundary. RBAC filtering must be applied as a post-retrieval metadata filter — but this only works if source ACL metadata was preserved during the chunking and ingestion pipeline.
Failure Mode 02
Critical
Indirect Prompt Injection via Retrieved Content
Malicious instructions embedded in unstructured content retrieved by the agent are interpreted as legitimate instructions. An attacker with write access to any document in the retrieval corpus can inject instructions that cause the agent to bypass access controls, exfiltrate data, or perform unauthorized actions on behalf of the user.
OWASP LLM01
NIST SI-10
NIST AI RMF MAP 2.2
Technical Root Cause
LLMs do not natively distinguish between trusted system instructions and untrusted retrieved content. The model processes all tokens in the context window with equivalent weight unless architectural separations are enforced. NIST AI 100-2e2023 §3.1 classifies this as an adversarial example attack on the model’s input processing.
Failure Mode 03
High
The Information Aggregation Problem
Each individual document retrieval may be within the user’s permitted access scope. However, an agent instructed to synthesize a comprehensive summary across all accessible documents produces an output that effectively aggregates information the user could never have assembled under normal operational constraints.
OWASP LLM02
NIST AC-6
NIST SP 800-188
Technical Root Cause
NIST SP 800-188 formally defines the aggregation problem: combining individually non-sensitive data elements can produce sensitive inferences. RBAC has no native mechanism to evaluate the cumulative sensitivity of an agent’s context window — only the sensitivity of each discrete retrieval event.
Failure Mode 04
High
Agent Delegation & Permission Inheritance Ambiguity
In multi-agent architectures (orchestrator → sub-agent delegation), there is no standardized mechanism for propagating user-level RBAC context through agent chains. Sub-agents typically operate under service account permissions rather than the end user’s effective rights, creating privilege escalation paths.
OWASP LLM06
NIST AC-2
RFC 6749
NIST SP 800-207
Technical Root Cause
OAuth 2.0 token delegation (RFC 8693) and Zero Trust patterns (NIST SP 800-207) provide the theoretical framework for propagating user context — but agent frameworks have not standardized on these patterns. The A2A Protocol (Google, 2025) begins to address agent identity, but RBAC context propagation remains implementation-specific.
Failure Mode 05
High
Temporal Access Drift in Vector Stores
Access control on source documents is time-variant — users gain and lose permissions, documents are reclassified, employees are offboarded. Vector store embeddings are point-in-time snapshots. If a document’s access controls change after its embedding is ingested, the embedding remains retrievable, bypassing updated source document controls entirely.
NIST AC-2(j)
NIST IA-4
OWASP LLM08
Technical Root Cause
NIST AC-2(j) requires review of accounts for compliance with account management requirements. Extending this to vector store embeddings requires treating each embedded chunk as an access-controlled resource subject to lifecycle management — a concept absent from current vector database architectures.
Failure Mode 06
High
Generated Output Classification Void
Agent-generated responses derived from restricted source documents carry no inherent classification. A synthesis of ten restricted documents produces an output with no security metadata, no access control policy, and no lineage to the contributing sources. This artifact may be stored, shared, or forwarded without any RBAC enforcement downstream.
NIST AC-16
NIST AU-10
OWASP LLM02
Technical Root Cause
Data lineage tracking in AI-generated content is an unsolved problem at scale. NIST AU-10 (Non-repudiation) requires that outputs be traceable to their originating actions, but no current vector database or LLM orchestration framework provides automated classification inheritance for generated outputs derived from multi-source retrievals.
RAG Pipeline: RBAC Enforcement Gaps by Stage
Access control failures are not concentrated at a single point — they occur at every stage of the unstructured data retrieval pipeline.
→ Data Flow & Access Control Gap Analysis
Stage 1
Document Ingestion
ACL metadata typically not captured during chunking. Source IAM policy is decoupled from embedding pipeline.
→
Stage 2
Chunking & Embedding
Chunks may span classification boundaries. Embedding captures semantics, not security posture. Parent document ACL is not inherited.
→
Stage 3
Vector Store Indexing
Embeddings stored without user-scoped ACL metadata unless explicitly engineered. No native RBAC enforcement in most vector DBs.
Stage 4
Similarity Retrieval
ANN search returns semantically similar chunks regardless of source classification. User context not evaluated at search time without post-retrieval filtering.
→
Stage 5
Context Assembly
Agent accumulates chunks from multiple sources. Aggregation problem emerges. No RBAC check on cumulative context sensitivity.
→
Stage 6
Generation & Output
Generated artifact has no classification lineage. Output may be stored, shared, or transmitted without any access control enforcement downstream.
NIST SP 800-53 Rev 5 — How Agents Break Each Control
Direct mapping of established access control requirements to agent-specific failure modes with unstructured data.
AC-3: Access Enforcement
NIST SP 800-53 AC-3
Enforce approved authorizations at access control decision points.
No discrete decision point in vector similarity search. Retrieval bypasses enforcement.
AC-6: Least Privilege
NIST SP 800-53 AC-6
Grant only minimum access required for legitimate purpose.
Agents typically operate with broad service account permissions. Aggregation violates least privilege even within permitted scopes.
AC-16: Security Attributes
NIST SP 800-53 AC-16
Associate security attributes with information and systems.
Chunking destroys security attribute inheritance. Generated outputs have no attribute lineage.
AU-10: Non-Repudiation
NIST SP 800-53 AU-10
Provide irrefutable evidence of access and information origin.
Agent operates as intermediary. User-level attribution requires agent-level audit logging not standard in current frameworks.
SI-10: Input Validation
NIST SP 800-53 SI-10
Check validity of information inputs to systems.
Agents pass retrieved unstructured content directly to LLM context. No native validation distinguishes legitimate content from injected instructions.
Recommended Compensating Controls
Because standard RBAC cannot be directly applied to agent unstructured data pipelines, compensating controls must be engineered at each architectural layer.
🔏
Per-Chunk ACL Metadata Enforcement
Preserve source document ACL metadata at chunk ingestion time. Apply ACL-based post-retrieval filtering before chunks are assembled into agent context. Treat ACL drift as a first-class pipeline event requiring re-ingestion.
NIST AC-3
OWASP LLM08
🪪
User-Context Token Propagation
Implement OAuth 2.0 Token Exchange (RFC 8693) to propagate end-user identity and effective permissions through agent delegation chains. Agents should never access resources under service account identity when acting on behalf of a user.
RFC 8693
NIST SP 800-207
NIST AC-2
🧱
Content Isolation & Injection Defense
Architecturally separate system instructions from retrieved content in the agent context window. Apply input validation to retrieved documents before LLM processing. Use structured output schemas to constrain agent behavior.
OWASP LLM01
NIST SI-10
📊
Context Window Aggregation Limits
Implement maximum retrieval volume policies that prevent agents from assembling context windows that would effectively grant access to the entirety of a user’s permitted corpus. Enforce purpose-bound retrieval scoping.
NIST AC-6
SP 800-188
🏷️
Output Classification Inheritance
Automatically classify generated outputs at the highest sensitivity level of any contributing source document. Apply access controls to generated artifacts before storage or transmission. Log generation provenance for AU-10 compliance.
NIST AC-16
NIST AU-10
OWASP LLM02
🔭
Agent-Layer Observability & Audit
Instrument agent runtime to log every retrieval event with source document identity, user context, and effective ACL state. Implement real-time anomaly detection on retrieval patterns that suggest aggregation attacks or injection sequences.
NIST AU-2
AI RMF GOVERN 4.2
RBAC Maturity Model for Agentic Unstructured Data Systems
A four-level progression from baseline compliance to semantically-aware dynamic access control.
Level 1 — Baseline
Service Account Scoping Only
- Agent operates under a service account with scoped permissions to specific repositories
- No user-context propagation through the agent
- Retrieval is unrestricted within the service account scope
- Minimum viable for internal tooling with low-sensitivity corpora
Level 2 — Structured
Post-Retrieval ACL Filtering
- ACL metadata preserved at chunk ingestion
- Post-retrieval filtering applied before context assembly
- User identity passed to retrieval layer for filtering
- Audit logs capture retrieval events with user attribution
Level 3 — Advanced
Token Delegation + Output Classification
- OAuth 2.0 token exchange for user context propagation through agent chains
- Generated outputs classified at highest source sensitivity
- Injection detection on retrieved content
- Aggregation volume limits enforced per session
Level 4 — Adaptive
Semantic Access Control + Continuous Evaluation
- Real-time semantic classification of retrieved chunks
- Dynamic policy evaluation against cumulative context sensitivity
- Behavioral anomaly detection on retrieval patterns
- Continuous re-authorization throughout agent session lifecycle
Key Insight
The fundamental challenge is not model intelligence but system reliability. Standard RBAC was never designed for systems that dynamically retrieve, synthesize, and generate across trust boundaries. Compensating controls must be architected at every stage of the agent pipeline — from ingestion through generation — because no single enforcement point can address the full scope of RBAC breakdown in agentic systems.
Related Resources
For agent runtime architecture guidance, see our Agent Harness Architecture series. For governance framework guidance, see AI Agent Lifecycle Management.