The Supply Chain Is Already Compromised

Posts 1 and 2 in this series established structural vulnerabilities — a safety alignment gap at the tool-call layer and a monoculture risk at the MCP protocol layer. This post turns from structure to evidence. The four papers most directly bearing on agent skill supply chain security are Formal Analysis and Supply Chain Security for Agentic AI Skills (arXiv:2603.00195), Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks (arXiv:2602.20156), Malicious Agent Skills in the Wild (arXiv:2602.06547), and Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward (arXiv:2602.12430). Together they document not a theoretical threat but an active, ongoing compromise of the infrastructure through which agents acquire capabilities.

In January 2026, security researchers monitoring the OpenClaw agent marketplace detected an anomalous pattern in recently published skill packages. Skills are the modular capability bundles that agents install to extend their functionality — the equivalent of browser extensions or application plugins, but for AI agents. The skills in question appeared legitimate at surface inspection. Their metadata described reasonable, useful capabilities. Their download counts were climbing.

What the researchers found on closer examination was that over 1,200 of these skills contained embedded adversarial instructions — crafted to manipulate agent behavior, exfiltrate data, or redirect agent actions toward attacker-controlled endpoints. The campaign was later named ClawHavoc. It was not an isolated incident, not a proof-of-concept, and not a research simulation. It was a coordinated real-world attack on the infrastructure through which AI agents acquire capabilities, and it had been running undetected.

ClawHavoc is the event that the 2026 research corpus was, in part, written in response to. Understanding what it revealed about agent skill supply chains — and what the research now says about the scale of the underlying problem — is the subject of this post.

26%

Percentage of community-contributed agent skills found to contain vulnerabilities across 14 distinct vulnerability patterns — established by a large-scale survey of skills from public registries (arXiv:2602.12430). More than one in four skills from community sources carries a meaningful security flaw. The registries themselves have become an attack surface.

What a Skill Supply Chain Attack Actually Looks Like

To understand why supply chain attacks on agent skills are so effective, it helps to understand what agent skills are and how agents consume them. A skill is a structured package — typically a combination of natural language instructions, tool definitions, memory configurations, and behavioral guidelines — that an agent loads to acquire a new capability. When an agent installs a skill for, say, calendar management or code review, it is loading a set of instructions that will shape how it reasons, what tools it invokes, and what it does with the outputs.

The key insight is that skills are instructions. They are not sandboxed executables with defined input/output contracts. They are natural language and structured data that flow directly into the agent’s context window and influence its subsequent behavior. A malicious skill does not need to exploit a buffer overflow or inject shellcode. It needs only to contain instructions the agent will follow — and agents, by design, follow instructions.

Two Archetypes of Malicious Skills

Data Thieves embed credential harvesting and exfiltration logic within otherwise functional skill packages. The skill performs its advertised function — calendar management, document summarization, contact lookup — while simultaneously routing sensitive data encountered during task execution to attacker-controlled endpoints. The agent completes the task. The user sees correct output. The exfiltration is invisible.

Agent Hijackers embed behavioral override instructions that subvert the agent’s decision-making for specific trigger conditions. When the agent encounters a defined context — a particular user query, a specific tool result, a recognized entity name — the hijacker instructions activate and redirect agent behavior toward the attacker’s objective. Between triggers, the skill functions normally.

The Empirical Scale: What the Research Found in the Wild

The Malicious Agent Skills in the Wild paper (arXiv:2602.06547) provides the first ground-truth dataset of confirmed malicious skills, behaviorally verifying 98,380 skills sourced from two community registries. The numbers are stark. The researchers identified 157 confirmed malicious skills carrying 632 distinct vulnerabilities. Of those, 93.6% were subsequently removed by registry maintainers after disclosure — confirming that the threats were real, not artifacts of methodology.

The 157 confirmed malicious skills represent the floor, not the ceiling. Behavioral verification at scale is computationally expensive, and the researchers applied it to a sample of the available corpus. The survey paper (arXiv:2602.12430), which took a broader vulnerability-pattern approach across community-contributed skills, found that 26.1% of skills contain vulnerabilities across four categories: prompt injection (the largest share), data exfiltration at 13.3%, privilege escalation at 11.8%, and supply chain risks. At that rate, the 98,380 skills in the verification dataset would contain more than 25,000 vulnerable packages.

The Registry Trust Problem

Agent skill registries currently operate on a model closer to app stores circa 2009 than to mature software package repositories. Submission is open, review is manual and inconsistent, and behavioral verification at scale does not exist as standard practice. The 93.6% removal rate after disclosure is not evidence that the system works — it is evidence that the system depends entirely on external researchers finding and reporting malicious packages before they cause harm.

What Skill-Inject measured

The Skill-Inject paper (arXiv:2602.20156) approaches the problem from a measurement perspective, introducing a benchmark of 202 injection-task pairs spanning a spectrum from obviously malicious to subtle, context-dependent attacks. The benchmark tests whether frontier models can be induced to execute harmful behaviors — including data exfiltration, destructive file operations, and what the researchers describe as ransomware-like behavior — through skill file manipulation alone.

The results establish an uncomfortable upper bound. Across frontier models, attack success rates reach up to 80%. The paper’s most significant finding is not the success rate itself but its implication for defensive strategy: the researchers conclude that the problem will not be solved through model scaling or simple input filtering. Larger, more capable models are not reliably more resistant to skill injection. The vulnerability is structural, not a function of model quality.

The supply chain attack surface is not a future risk to be designed against. It is a present reality that the ecosystem is already navigating without adequate tools to do so reliably.

— Synthesis from Malicious Agent Skills in the Wild (arXiv:2602.06547) and Skill-Inject (arXiv:2602.20156)

ClawHavoc and the Formal Response

The Formal Analysis and Supply Chain Security for Agentic AI Skills paper (arXiv:2603.00195) is the most technically rigorous response to ClawHavoc in the corpus. Its authors introduce the DY-Skill attacker model — an adaptation of the Dolev-Yao formal attacker model from cryptographic protocol analysis — applied to the agent skill supply chain. The DY-Skill model captures adversarial capabilities across three phases: skill publication (inserting malicious packages into registries), skill transit (intercepting and modifying packages in transit), and skill execution (manipulating agent behavior at runtime).

The paper introduces SkillFortify, a detection framework combining capability-based sandboxing, Agent Dependency Graph analysis, and formal verification of skill behavioral properties. On the SkillFortifyBench benchmark, the framework achieves 96.95% F1 detection with 100% precision — meaning it identifies malicious skills with near-perfect accuracy and generates no false positives. The 100% precision figure is particularly significant for production deployment: a detection system that flags legitimate skills as malicious creates friction that organizations quickly learn to ignore.

A Framework for What Good Governance Looks Like

The survey paper (arXiv:2602.12430) synthesizes the research into a four-tier gate-based Skill Trust and Lifecycle Governance Framework. The framework is worth examining not as a regulatory prescription but as a map of what the current ecosystem is missing.

What Registries Do Today

Open Submission, Manual Review

Skills are submitted by any developer, reviewed manually by registry staff on a best-effort basis, and published without systematic behavioral verification. Trust is implicit — a skill in the registry is assumed to be safe unless proven otherwise.

This model produced 157 confirmed malicious skills in a single corpus sample, a 26.1% community vulnerability rate, and the ClawHavoc campaign operating undetected until external researchers intervened.

Trust by Default

What the Research Proposes

Gate-Based Lifecycle Governance

Tier 1: Static analysis of skill metadata and instruction patterns at submission. Tier 2: Behavioral verification in sandboxed execution environments. Tier 3: Capability attestation — cryptographic binding of claimed capabilities to verified behavior. Tier 4: Runtime monitoring with anomaly detection across deployed skill instances.

Each tier is a gate. Skills that pass all four carry a verified trust attestation. Those that fail are quarantined for review. Trust is earned, not assumed.

Verified by Gate

The gap between these two columns is not primarily a technical gap. The static analysis, sandboxed execution, and runtime monitoring capabilities described in the right column exist. SkillFortify demonstrates 96.95% F1 on detection. The gap is one of incentive and infrastructure — registry operators have not yet been compelled to build or mandate verification pipelines, and the ecosystem has not yet established the standards that would make verified trust attestation meaningful across registries.

The Governing Insight

Agent skill supply chain security is not a model problem — it is a governance problem. The tools to detect malicious skills exist and perform well. What does not yet exist is the institutional infrastructure to make their use mandatory, consistent, and interoperable across the registries through which agents acquire capabilities. ClawHavoc was detectable. It was not detected in time because detection was not required.

Where Agentic AI Breaks · Five-Part Series

Post 1 · Published Why Safety Alignment Fails at the Tool-Call Layer Post 2 · Published The MCP Problem: How Standardization Created a Monoculture Attack Surface

Post 3 · Now Reading The Supply Chain Is Already Compromised

Post 4 · Published The Viral Agent Loop: A New Class of Threat Post 5 · Published Why Adding More Agents Makes Data Exposure Worse

The Supply Chain Is Already Compromised

What a Skill Supply Chain Attack Actually Looks Like

Two Archetypes of Malicious Skills

The Empirical Scale: What the Research Found in the Wild

The Registry Trust Problem

What Skill-Inject measured

ClawHavoc and the Formal Response

A Framework for What Good Governance Looks Like

Open Submission, Manual Review

Gate-Based Lifecycle Governance

Up Next: Post 4 — The Viral Agent Loop

Like this:

Related

The Supply Chain Is Already Compromised

What a Skill Supply Chain Attack Actually Looks Like

Two Archetypes of Malicious Skills

The Empirical Scale: What the Research Found in the Wild

The Registry Trust Problem

What Skill-Inject measured

ClawHavoc and the Formal Response

A Framework for What Good Governance Looks Like

Open Submission, Manual Review

Gate-Based Lifecycle Governance

Up Next: Post 4 — The Viral Agent Loop

Share this:

Like this:

Related