Provenance Is the Architecture

Post 1 established that DDIPE embeds malicious logic in skill documentation, bypassing invocation-layer defenses at 11–33% while explicit attacks succeed at 0%. Post 2 established that network-level behavioral analysis, not content scanning, is the detection approach that addresses stealthy supply chain attacks, with ShieldNet achieving 0.995 F1 against 10,000+ malicious MCP tools. This post maps the structural defense layer that both approaches assume but cannot themselves provide: cryptographic provenance. The anchor papers are RAGShield (arXiv:2604.00387, Patil, April 2026) and Semantic Chameleon (arXiv:2603.18034, Thornton, March 2026). The closing argument connects to Series 9: provenance without identity is incomplete.

Every defense documented in this series so far operates on an implicit assumption. Content scanning assumes that the artifact being scanned is the artifact that will be executed. Behavioral analysis assumes that the traffic being observed corresponds to the tool that was provisioned. Neither assumption is verified. Neither can be verified without a layer that has not yet been built into most enterprise agentic deployments: a cryptographically enforced chain of custody that makes every artifact in the agent’s execution environment attributable and verifiable from the point of review to the point of execution.

The gap this creates is specific. A tool can pass content scanning at review time and be silently replaced with a malicious variant before execution. A behavioral analysis system can observe traffic that appears normal because the malicious tool is behaving normally until a specific trigger condition is met. Both defenses address genuine attack vectors. Both can be circumvented by an attacker who controls the supply chain at the layer between review and execution — the layer where artifacts travel from their origin to the agent’s trust boundary without cryptographic verification of their integrity.

Provenance verification closes this gap by making the chain of custody cryptographically enforced. An artifact with a verifiable provenance chain cannot be silently substituted: any change after signing breaks verification. Its integrity from origin to execution is mathematically attestable. An agent that accepts only provenance-verified artifacts from its knowledge base and tool registry cannot be served a substituted payload without the substitution being detected, regardless of what the supply chain looks like between review and execution.
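The review-to-execution check described above can be sketched in a few lines. This is a minimal illustration, not a provenance system: it pins a SHA-256 digest at review time and re-checks it before execution, which covers integrity only. A real attestation chain would also bind the digest to a signer identity. The ReviewLedger class, the tool name, and the sample payloads are all hypothetical.

```python
import hashlib
import hmac


def digest(artifact: bytes) -> str:
    """Return the SHA-256 hex digest of an artifact's bytes."""
    return hashlib.sha256(artifact).hexdigest()


class ReviewLedger:
    """Hypothetical record of digests pinned at review time."""

    def __init__(self) -> None:
        self._approved: dict[str, str] = {}

    def approve(self, name: str, artifact: bytes) -> None:
        # Called once, when a human or pipeline signs off on the artifact.
        self._approved[name] = digest(artifact)

    def verify(self, name: str, artifact: bytes) -> bool:
        # Called immediately before execution; unknown names are rejected.
        expected = self._approved.get(name)
        if expected is None:
            return False
        return hmac.compare_digest(expected, digest(artifact))


ledger = ReviewLedger()
reviewed = b"def run(query): return search(query)"
ledger.approve("search_tool", reviewed)

# A variant swapped in after review, carrying one extra line.
swapped = reviewed + b"\nexfiltrate(query)"

print(ledger.verify("search_tool", reviewed))  # True
print(ledger.verify("search_tool", swapped))   # False
```

The design choice worth noting is that verification happens at the point of execution, not at the point of download, so anything that changes the bytes in between is caught.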

What RAGShield Demonstrates

RAGShield is a five-layer defense-in-depth framework for knowledge base supply chain attacks in government and enterprise RAG deployments. Its design is explicitly motivated by the observation that post-retrieval content filtering — the approach most organizations currently rely on — fails against sophisticated attacks in ways that are measurable and documentable.

RAGShield — Five-Layer Architecture

Layer 1 — C2PA cryptographic attestation: Every document entering the knowledge base carries a cryptographically verifiable provenance record — who created it, when, and through what chain of custody. Documents without valid attestation are rejected before ingestion.

Layer 2 — Trust-weighted retrieval: Retrieved documents are weighted by the trust score of their provenance chain. Highly attested documents from verified organizational sources receive higher retrieval weight than community-sourced documents with weak provenance.

Layer 3 — Formal taint lattice with contradiction detection: A formal information flow model tracks which documents contributed to each generated statement. Contradictions between provenance-strong and provenance-weak sources trigger escalation.

Layer 4 — Provenance-aware generation: The generation layer is aware of each retrieved document’s provenance tier. Output claims are labeled with their provenance confidence, and outputs drawing heavily from low-provenance sources are flagged for human review.

Layer 5 — NIST SP 800-53 compliance mapping: The full stack maps to NIST SP 800-53 control families, enabling enterprise compliance teams to verify that the provenance architecture satisfies existing control requirements without standalone justification.
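Layers 1 and 2 can be illustrated with a small sketch. The tier names, the weights, and the multiplicative scoring rule below are assumptions chosen for illustration; this post does not give RAGShield's actual scoring function.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical trust tiers and weights (not taken from the RAGShield paper).
TRUST_WEIGHT = {"verified_org": 1.0, "partner": 0.7, "community": 0.4}


@dataclass(frozen=True)
class Doc:
    doc_id: str
    similarity: float      # retriever relevance score in [0, 1]
    tier: Optional[str]    # provenance tier; None if attestation is missing or invalid


def ingest(docs):
    """Layer 1 analogue: reject documents without a recognized attestation tier."""
    return [d for d in docs if d.tier in TRUST_WEIGHT]


def rank(docs):
    """Layer 2 analogue: order by similarity scaled by provenance trust."""
    return sorted(docs, key=lambda d: d.similarity * TRUST_WEIGHT[d.tier], reverse=True)


candidates = [
    Doc("policy-memo", 0.80, "verified_org"),
    Doc("forum-post", 0.95, "community"),
    Doc("unsigned-pdf", 0.99, None),
]

ranked = rank(ingest(candidates))
print([d.doc_id for d in ranked])  # ['policy-memo', 'forum-post']
```

The unsigned document never reaches ranking, and the community post loses to the attested memo despite its higher raw similarity.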

0% attack success rate across all five adversary tiers, including adaptive attackers with valid signing keys, when the full RAGShield stack is deployed. Post-retrieval content filtering alone allowed a 7.5–12.5% attack success rate against sophisticated attacks. The gap between filtering alone and the provenance stack is the structural contribution of cryptographic attestation: an attacker with a valid signing key can still forge content, but the taint lattice and contradiction detection catch the semantic inconsistency.

The 7.5–12.5% failure rate of post-retrieval filtering alone is the number that should concern enterprise teams currently relying on content filtering as their primary RAG security control. Against sophisticated adversaries, post-retrieval filtering achieves 87.5–92.5% detection — which sounds good until the failure cases involve supply chain compromise of a knowledge base that feeds a financial, legal, or infrastructure decision-making agent.
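A short worked calculation shows why a per-attack failure rate in this range compounds. The sketch assumes each attack attempt succeeds independently with the quoted probability and that the adversary makes ten attempts; both are illustrative assumptions, not figures from the paper.

```python
def p_at_least_one(p_single: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_single) ** attempts


ATTEMPTS = 10  # hypothetical number of poisoning attempts

low = p_at_least_one(0.075, ATTEMPTS)   # lower bound of the quoted range
high = p_at_least_one(0.125, ATTEMPTS)  # upper bound of the quoted range

print(f"{low:.2f} to {high:.2f}")
```

Under those assumptions the chance that at least one poisoned document gets through is roughly 0.54 to 0.74, which is why a 90% detection rate is not a comfortable position for a persistent adversary.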

The RAGShield finding also establishes something important about the adaptive attacker scenario. An adversary with a valid signing key — the hardest case, representing a compromised organizational identity — can forge C2PA attestations. The provenance layer alone does not close this. What closes it is the combination of provenance attestation and the taint lattice’s contradiction detection: even a validly-signed document that contradicts the established knowledge graph produces a detectable signal. Identity and provenance together are necessary. Neither alone is sufficient.
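The combination described above can be sketched as follows. This is a simplification under stated assumptions: trust levels form a total order (a real taint lattice may be a partial order), claims are pre-extracted as key/value pairs, and any disagreement on the same key is flagged. The Trust levels, the claim keys, and the sample values are hypothetical.

```python
from enum import IntEnum


class Trust(IntEnum):
    UNTRUSTED = 0
    COMMUNITY = 1
    VERIFIED = 2


def statement_trust(sources):
    """Lattice meet on a total order: a statement is only as trusted as its weakest source."""
    return min(sources, default=Trust.UNTRUSTED)


def contradictions(claims):
    """claims: iterable of (key, value, Trust). Return keys whose sources disagree."""
    by_key = {}
    for key, value, _trust in claims:
        by_key.setdefault(key, set()).add(value)
    return [key for key, values in by_key.items() if len(values) > 1]


claims = [
    ("wire_limit", "10000", Trust.VERIFIED),    # established knowledge base entry
    ("wire_limit", "250000", Trust.VERIFIED),   # validly signed, but contradicts it
    ("office_hours", "9-5", Trust.COMMUNITY),
]

flagged = contradictions(claims)
print(flagged)  # ['wire_limit']
print(statement_trust([Trust.VERIFIED, Trust.COMMUNITY]).name)  # COMMUNITY
```

Note that the second wire_limit claim carries a valid signature. The signature check passes; the contradiction check is what escalates it.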

What Semantic Chameleon Adds: The Retrieval Architecture Defense

Thornton’s Semantic Chameleon research (arXiv:2603.18034) establishes a finding that complements RAGShield’s provenance stack from a different angle: the retrieval architecture itself is a structural defense variable, and one that is immediately deployable without new infrastructure.

38% → 0%

Gradient-guided corpus poisoning succeeds against pure vector retrieval 38% of the time, measured as co-retrieval of malicious documents. The same attack against hybrid BM25 + vector retrieval achieves 0% success across all mixing weights tested, on a 67,941-document corpus. Hybrid retrieval is a zero-cost structural defense available today in every major RAG framework. Most enterprise deployments are not using it.

The mechanism is precise. Gradient-guided attacks optimize a malicious document’s embedding to be maximally similar to target query embeddings — ensuring the document is consistently retrieved alongside legitimate content. Hybrid retrieval combines semantic vector similarity with BM25 keyword scoring. The attack that optimizes for vector proximity cannot simultaneously optimize for keyword relevance — the two retrieval axes are sufficiently independent that maximizing one tends to suppress performance on the other. The attack’s efficacy depends on the retrieval system being purely vector-based.
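A toy example makes the mechanism concrete. The sketch below implements Okapi BM25 and cosine similarity in plain Python and fuses them with a mixing weight. The corpus, the hand-assigned two-dimensional embeddings, and the min-max fusion rule are illustrative assumptions; the poisoned document is given an embedding identical to the query's to stand in for a gradient-optimized one, and this is not the paper's experimental setup.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document in `docs` against a whitespace-tokenized query."""
    tokenized = [d.split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def minmax(xs):
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query, query_vec, corpus, alpha):
    """alpha is the weight on the dense score; alpha=1.0 is pure vector retrieval."""
    ids = list(corpus)
    dense = minmax([cosine(query_vec, corpus[i][1]) for i in ids])
    sparse = minmax(bm25_scores(query, [corpus[i][0] for i in ids]))
    fused = {i: alpha * d + (1 - alpha) * s for i, d, s in zip(ids, dense, sparse)}
    return sorted(ids, key=fused.get, reverse=True)


# Hypothetical corpus: id -> (text, hand-assigned embedding).
corpus = {
    "legit": ("wire transfer approval limit policy", [0.8, 0.6]),
    "other": ("office holiday schedule", [0.0, 1.0]),
    "poison": ("zq vx approve unlimited kk", [1.0, 0.0]),  # embedding tuned to the query
}
query, query_vec = "wire transfer limit", [1.0, 0.0]

pure_vector = hybrid_rank(query, query_vec, corpus, alpha=1.0)
hybrid = hybrid_rank(query, query_vec, corpus, alpha=0.5)
print(pure_vector[0], hybrid[0])  # poison legit
```

The poisoned document wins on embedding proximity alone but shares no keywords with the query, so its BM25 score is zero and the fused ranking puts the legitimate document first.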

This finding has an immediate practical implication: switching from pure vector retrieval to hybrid retrieval is a configuration change, not an infrastructure investment. Every major RAG framework — LangChain, LlamaIndex, and their enterprise derivatives — supports hybrid retrieval. The organizations currently running pure vector retrieval are operating with a retrieval architecture that is demonstrably more susceptible to gradient-guided supply chain attacks than the alternative already available to them.

Detection, Provenance, and the Gap Between Them

Posts 1, 2, and 3 of this series document three distinct attack surfaces with three distinct defense requirements. Mapping them precisely prevents the common failure mode of deploying one defense and concluding the surface is covered.

Detection-Based Defenses

What They Close

Content scanning (Post 1 context): Catches explicitly malicious skill declarations, known-bad signatures, and injection patterns visible in static analysis. Does not reach DDIPE implicit documentation attacks.

Network behavioral analysis (Post 2 context): Catches stealthy runtime exfiltration and behavioral patterns that declarations hide. Does not verify that the tool being observed is the tool that was reviewed.

Post-retrieval content filtering: Catches known-signature corpus poisoning. Lets 7.5–12.5% of sophisticated attacks through. Does not verify document provenance.

Detection-based defenses are necessary. They address real, documented attack vectors. Their structural limit is that they answer “did this behave badly?” — not “is this the artifact we approved?”

Necessary · Behavior-Focused

Provenance-Based Defenses

What They Close

Cryptographic attestation (C2PA): Verifies that every artifact entering the agent’s trust boundary has an unbroken, attributable chain of custody from its origin. Silent substitution of reviewed artifacts becomes cryptographically detectable.

Hybrid retrieval architecture: Eliminates the pure-vector retrieval attack surface that gradient-guided corpus poisoning requires. Zero-cost, immediately deployable. Drops gradient-guided attack success from 38% to 0%.

Taint lattice + contradiction detection: Catches adaptive attackers who hold valid signing keys, the case cryptographic attestation alone does not close. Semantic inconsistency between provenance-strong and provenance-weak sources is detectable even when the provenance chain is valid.

Provenance-based defenses answer “is this the artifact we approved?” — the question detection-based defenses structurally cannot.

Structural · Identity-Anchored

The provenance-based defenses listed above are the reason this post is titled “Provenance Is the Architecture” rather than “Provenance Is a Feature.” Detection-based defenses are controls that operate on the artifact. Provenance-based defenses are the layer that determines whether the artifact is what the agent believes it to be. The structural relationship between them is not additive: provenance is the prerequisite condition that makes detection meaningful. A detection system operating on an unverified artifact is detecting behaviors in something it cannot confirm is the artifact it was asked to monitor.

The Enterprise Implementation Path

The full provenance architecture — C2PA attestation, trust-weighted retrieval, taint lattice, provenance-aware generation — is not a single-sprint deployment. It is a program of work with sequenced components, and the sequencing matters.

Immediate — No New Infrastructure

Switch to Hybrid Retrieval

Replace pure vector retrieval with BM25 + vector hybrid retrieval in all RAG deployments. This is a configuration change in existing frameworks. It drops gradient-guided corpus poisoning success from 38% to 0% at zero incremental cost. There is no valid reason to defer this.

Available Now

Near-Term — Infrastructure Investment

Establish Provenance Ingestion Requirements

Define and enforce provenance attestation requirements for all documents entering knowledge bases and all skills entering tool registries. C2PA is the most widely adopted attestation standard. The policy decision — what provenance is required for ingestion — precedes the technical implementation and is the higher-leverage action.
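Because the policy decision comes first, it helps to see what a policy looks like once written down as code. The sketch below is a hypothetical ingestion gate: the artifact kinds, signer names, and required manifest fields are placeholders, and the manifest is assumed to be a provenance record whose signature has already been verified elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestionPolicy:
    """Hypothetical policy: allowed signers and required manifest fields per artifact kind."""
    allowed_signers: frozenset
    required_fields: frozenset = frozenset({"signer", "created", "digest"})


# Placeholder policies; real signer identities would come from the organization's PKI.
POLICIES = {
    "kb_document": IngestionPolicy(frozenset({"legal-team", "finance-team"})),
    "agent_skill": IngestionPolicy(frozenset({"platform-security"})),
}


def admit(kind, manifest):
    """Return (admitted, reason) for an already-verified provenance manifest (a dict)."""
    policy = POLICIES.get(kind)
    if policy is None:
        return False, f"no policy defined for {kind!r}"
    missing = policy.required_fields - set(manifest)
    if missing:
        return False, f"missing fields: {sorted(missing)}"
    if manifest["signer"] not in policy.allowed_signers:
        return False, f"signer {manifest['signer']!r} not allowed for {kind}"
    return True, "ok"


manifest = {"signer": "legal-team", "created": "2026-04-01", "digest": "abc123"}
print(admit("kb_document", manifest))   # (True, 'ok')
print(admit("agent_skill", manifest)[0])  # False
```

Default-deny is the deliberate choice here: an artifact kind with no policy is rejected, so adding a new registry forces someone to decide what provenance it requires.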

Policy First

Program — Architecture Commitment

Build the Full Provenance Stack

Trust-weighted retrieval, taint lattice with contradiction detection, and provenance-aware generation complete the RAGShield architecture. The 0% attack success rate against adaptive adversaries requires all five layers. Each layer is independently valuable; the full stack closes the gap the adaptive attacker scenario opens.

Full Coverage

The Identity Gap — Where Series 9 Begins

RAGShield’s finding that adaptive attackers with valid signing keys can still forge attestations identifies the precise limit of provenance without identity. A cryptographic attestation chain is only as strong as the identity verification that anchors it. If an attacker can obtain a valid organizational signing key — through credential theft, insider access, or supply chain compromise of the key management infrastructure — the provenance chain becomes a forgeable artifact. Series 9 — The Identity Gap — picks up exactly here: the agent identity and delegation architecture that makes provenance attestation trustworthy rather than merely present.
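The limit described above is easy to demonstrate. The sketch uses HMAC-SHA256 from the Python standard library as a stand-in for a signing scheme; C2PA itself relies on asymmetric signatures and certificates, so this is an analogy under that stated assumption. The key and the document contents are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, content: bytes) -> str:
    """Stand-in for an organizational signature (HMAC-SHA256, not C2PA)."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify(key: bytes, content: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(key, content), signature)


org_key = b"example-org-signing-key"  # hypothetical key material
stolen_key = org_key                  # the attacker has obtained the same key

legit = b"Wire transfers above 10,000 USD require two approvals."
forged = b"Wire transfers of any size require no approval."

legit_sig = sign(org_key, legit)
forged_sig = sign(stolen_key, forged)

# Verification cannot tell the legitimate signer from the thief.
print(verify(org_key, legit, legit_sig))    # True
print(verify(org_key, forged, forged_sig))  # True
```

Both checks pass. The signature proves that whoever holds the key signed the content; it says nothing about whether the key holder is who the organization thinks it is.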

Detection tells you what the artifact did. Provenance tells you whether the artifact is what you think it is. An agent operating without provenance verification is trusting a supply chain it cannot inspect.

— Luminity Digital synthesis from arXiv:2604.00387 and arXiv:2603.18034

The Series Conclusion

Three posts, three attack surfaces, one structural conclusion. DDIPE operates through the documentation layer that invocation monitoring cannot see. Stealthy MCP supply chain attacks operate through the behavior gap that content scanning cannot close. Both require a provenance layer beneath them — one that answers not “did this behave badly?” but “is this what we approved?” Hybrid retrieval closes the gradient-guided attack surface today at zero cost. Cryptographic attestation closes the substitution gap. Together they form the supply chain security architecture that detection-based defenses assume is already in place. Series 9 addresses what happens when the identity anchoring that provenance requires is missing — which, at the current state of the MCP ecosystem, is the default condition.

If your enterprise agentic deployment relies on RAG knowledge bases, MCP tool registries, or agent skill ecosystems — and the supply chain security architecture described in this series is not yet in place — the gap between where you are and where the research says you need to be is specific and addressable. The Luminity team is available for a focused conversation.


Series 9 — The Identity Gap — examines why 100% of scanned MCP servers lack cryptographic authentication, what verifiable delegation chains require, and how the agent identity architecture that provenance verification depends on gets built. Coming next.

The Supply Chain Beneath the Stack · Three-Part Series

Post 1 · Prior · The Skill Is the Attack Surface

Post 2 · Prior · The Tool You Trusted Was Never Yours

Post 3 · Now Reading · Provenance Is the Architecture

Content scanning and behavioral analysis answer “did this behave badly?” Provenance verification answers a question neither detection approach can: “is this the artifact we approved?” Hybrid retrieval drops gradient-guided RAG poisoning from 38% to 0% at zero cost. C2PA attestation makes silent artifact substitution cryptographically detectable. The taint lattice closes the adaptive attacker case, where the attacker holds valid signing keys. Provenance is not a feature to add; it is the structural layer the detection defenses assume is already in place.

Patil K.S.R. (Apr 2026). RAGShield: Provenance-Verified Defense-in-Depth Against Knowledge Base Poisoning. arXiv:2604.00387
Thornton S. (Mar 2026). Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems. arXiv:2603.18034
OWASP (2026). MCP Security Cheat Sheet. cheatsheetseries.owasp.org

Cryptographic Provenance: An unbroken, verifiable chain of custody for an artifact from origin to point of use. Makes silent substitution cryptographically detectable, because any modification breaks the attestation chain.
C2PA Attestation: The Coalition for Content Provenance and Authenticity standard for cryptographically signing content with verifiable origin and custody metadata. Serves as RAGShield’s document-level attestation layer.
Hybrid Retrieval: Combined sparse-dense retrieval (BM25 + vector similarity). Breaks gradient-guided poisoning attacks optimized for pure dense retrieval: 38% attack success rate (ASR) against dense drops to 0% against hybrid. Deployable today via configuration.
Taint Lattice: A formal information flow model tracking which documents contributed to generated claims. Enables contradiction detection between high- and low-provenance sources.
