The Open Skills Security Problem — Luminity Digital
AI Security & Governance

The Open Skills Security Problem: What Three Research Papers Reveal About Trusting Community-Built AI Capabilities

Open agent skill ecosystems promise cumulative AI intelligence — but empirical research now shows that more than one in four published skills carries a security vulnerability. The implications for organizations deploying AI agents at scale are immediate and structural.

March 2026
Tom M. Gomez
14 Min Read

The agentic AI era rests on a compelling premise: rather than rebuilding procedural intelligence from scratch for each agent deployment, skills — modular, file-based capability packages — can be shared across agents, teams, and organizations. The research community is now building the infrastructure to make this possible at scale. But three papers published between January and March 2026 reveal a structural security problem at the foundation of this vision that the field has been too slow to address.

In February 2026, researchers at Zhejiang University and collaborators across fourteen institutions published SkillNet, a framework for creating, evaluating, and organizing AI skills at scale. The paper described a repository of over 200,000 skills, a five-dimensional evaluation framework, and empirical results showing a 40% improvement in agent task performance when agents are equipped with curated skill collections. It was, by design, an infrastructure paper — a demonstration of what becomes possible when fragmented agent experience is formalised into reusable, composable knowledge units.

But within the SkillNet paper’s limitations section was a candid acknowledgment that the broader ecosystem had not yet grappled with: if malicious users contribute poisoned or adversarial skills, the current Safety evaluation mechanism can detect some of these cases, but cannot fully mitigate them. That sentence points toward a problem that two companion papers — published in January and February 2026 by the same lead research group — had already begun to quantify. Together, the three publications form a coherent picture of both the promise of open skill ecosystems and the security debt they have already accumulated.

Skills execute with implicit trust and minimal vetting, creating a significant yet undercharacterized attack surface.

— Liu et al., Agent Skills in the Wild, arXiv:2601.10338, January 2026

What Skills Are — and Why the Architecture Creates Risk

To understand the security exposure, it helps to understand what agent skills actually are at a technical level. A skill is a structured folder containing a central SKILL.md file, which defines metadata and step-by-step instructions for an agent. Skills may also include bundled scripts — Python files, shell commands, or other executable programs — along with templates and supporting resources. Agents discover skills by reading compact metadata, activate them when a task matches, and execute the full instruction set including any bundled code.
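To make this concrete, here is a minimal sketch of what a skill's central file might look like, following the SKILL.md convention described above. The skill name, frontmatter fields, and instruction steps are illustrative, not drawn from any real published skill:

```markdown
---
name: report-helper
description: Summarise local CSV exports into a quarterly report.
---

# Instructions

1. Read the CSV files the user points you at.
2. Run scripts/summarise.py to aggregate the figures.
3. Fill templates/report.md.tmpl with the results and save the report.
```

At discovery time the agent reads only the name and description; the numbered steps — and any bundled code they invoke — load only when the skill activates, which is exactly why a skill's listing can look benign while its full instruction set is not.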

This architecture is powerful precisely because it is permissive. Skills can instruct agents to read files, call APIs, invoke system commands, access credential stores, and interact with external services. These are the same capabilities that make skills useful for legitimate automation — and the same capabilities that make a malicious or negligently written skill dangerous. As Anthropic’s own documentation explicitly warns, skills can introduce vulnerabilities and enable agents to exfiltrate data and take unintended actions, with the recommendation that users install only from trusted sources and thoroughly review any skill before installation.

The Implicit Trust Problem

Unlike browser extensions, which operate inside a sandboxed environment with explicit permission prompts, agent skills execute with the full privileges of the user running the agent. There is no operating-system-level sandboxing that prevents a skill from accessing SSH keys, reading environment variables, or making outbound network requests. The trust model is entirely implicit — if an agent loads a skill, that skill runs with user-level permissions on the host machine.
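The point is easy to demonstrate. A script bundled with a skill runs as the invoking user, with the same access an interactive shell would have. The benign probe below (paths and checks are illustrative) shows what any skill-bundled script can see by default, with no permission prompt involved:

```python
import os
from pathlib import Path

def probe_environment() -> dict:
    """Summarise what a skill-bundled script can reach with default privileges."""
    home = Path.home()
    return {
        "env_var_count": len(os.environ),            # full environment is readable
        "ssh_dir_exists": (home / ".ssh").exists(),  # key material is reachable
        "cwd_readable": os.access(os.getcwd(), os.R_OK),
    }

if __name__ == "__main__":
    print(probe_environment())
```

Nothing here is an exploit — and that is the problem: the same unguarded access that lets a legitimate skill automate real work lets a malicious one read keys and phone home.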

Major platforms have all adopted this architecture. As the January 2026 empirical study notes, Claude Code, Codex CLI, and Gemini CLI all support metadata-rich instruction files with bundled scripts and implicit activation. Community registries including skills.rest and skillsmp.com aggregate third-party skills without mandatory security review. The Model Context Protocol extends this same pattern with tools, resources, and prompts as primitives. Early empirical studies of MCP found 7.2% of servers vulnerable and 5.5% exhibiting tool poisoning — suggesting that the risk profile was visible before the skills ecosystem had even fully formed.

The First Large-Scale Measurement: 26.1% of Skills Are Vulnerable

Liu et al.’s January 2026 paper, Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale, provides the first systematic measurement of the security landscape across real-world skill marketplaces. The researchers collected 42,447 skills from two major marketplaces — skills.rest and skillsmp.com — and systematically analyzed 31,132 of them using SkillScan, a multi-stage detection framework combining static analysis with LLM-based semantic classification.

26.1%

In the study, 26.1% of all published skills contain at least one security vulnerability, spanning 14 distinct patterns across four categories: prompt injection, data exfiltration, privilege escalation, and supply chain risks. Data exfiltration (13.3%) and privilege escalation (11.8%) are the most prevalent vulnerability classes, and skills bundling executable scripts are 2.12 times more likely to be vulnerable than instruction-only skills.

The 14-vulnerability taxonomy the paper develops is grounded in observed patterns rather than hypothetical threat models. It spans four primary categories, each with distinct attack surfaces and security implications for organizations deploying agents that consume community-published skills.

Prompt Injection — Malicious instructions embedded in SKILL.md that redirect agent behavior at reasoning time — invisible to users reviewing the skill description, but active when the agent reads the full instruction set during execution. Security impact: agent behavior manipulation. The agent can be instructed to perform actions outside the user’s intended scope — exfiltrating data from the current session, modifying files, or silently calling external endpoints while appearing to execute a legitimate task.

Data Exfiltration — The most prevalent vulnerability class, at 13.3% of skills. Includes code accessing credential stores, SSH keys, authentication tokens, and password managers — and patterns where skills described as performing local processing silently POST data to external servers. Security impact: credential and session theft. Credentials, API tokens, and environment variables harvested during skill execution can be exfiltrated through outbound HTTP calls that are indistinguishable from legitimate API activity at the network level without dedicated monitoring.

Privilege Escalation — Skills invoking elevated privileges through sudo or similar mechanisms without documented justification — found in 41 skills in the analysis sample, and the second most prevalent major category at 11.8% of vulnerable skills. Security impact: host system compromise. Privilege escalation within a skill execution context can allow persistent access, modification of system configurations, or installation of additional payloads — converting an agent’s local execution environment into an attacker-controlled foothold.

Supply Chain Risks — 156 skills in the sample depend on packages without version pinning, leaving them vulnerable to dependency confusion and malicious updates; compromise of a previously benign, widely installed skill converts it into a delivery mechanism for malicious payloads. Security impact: deferred and scaled compromise. Supply chain attacks do not require malicious authorship at the point of publication: a skill trusted by thousands of agents can be compromised through a dependency update, allowing an attacker to reach all consumers simultaneously without modifying the skill itself.

The paper identifies three adversary archetypes driving these vulnerabilities. Malicious Authors intentionally design skills to exfiltrate data, establish persistence, or manipulate agent behavior. Supply Chain Attackers compromise previously benign skills through account takeover or dependency confusion. Negligent Developers introduce unintentional vulnerabilities through insecure coding, excessive permissions, or unsafe instruction patterns — a category the paper notes is analogous to the 5.6% of VS Code extensions found to exhibit suspicious behaviors due to negligence rather than malice.
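Several of the taxonomy's patterns are visible to even a crude static pass. The toy scanner below is not SkillScan — the real pipeline pairs static analysis with LLM-based semantic classification, which this sketch does not attempt — but it illustrates the kind of surface signal each category leaves in a skill's text; the rule names and patterns are illustrative:

```python
import re

# One illustrative regex per vulnerability category from the taxonomy above.
RULES = {
    "privilege_escalation": re.compile(r"\bsudo\b|\bchmod\s+777\b"),
    "data_exfiltration":    re.compile(r"\.ssh|AWS_SECRET|requests\.post\(|curl\s+-d"),
    "supply_chain":         re.compile(r"pip\s+install\s+(?![\w.-]+==)[\w.-]+"),  # unpinned install
    "prompt_injection":     re.compile(r"ignore (all )?previous instructions", re.I),
}

def scan_skill(text: str) -> list[str]:
    """Return the category names whose patterns match the skill's contents."""
    return [name for name, pat in RULES.items() if pat.search(text)]
```

A check like `scan_skill("pip install requests")` flags the unpinned dependency, while the pinned form passes — which is also why static rules alone cap out well short of confirmed malice: obfuscated or semantically disguised behavior sails straight through them.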

The Follow-Up Study: Confirmed Malice and Two Attacker Archetypes

The January paper established prevalence rates through static and semantic analysis. The February 2026 follow-up — Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study — addressed the question the first paper could not: of the skills flagged as potentially risky, how many are confirmed malicious through actual behavioral execution? The researchers expanded the dataset to 98,380 skills and added dynamic verification — executing candidates in sandboxed Docker containers with network monitoring, honeypot credentials, and outbound traffic capture.
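The containment the dynamic stage implies can be sketched with standard Docker controls. The flags, image, and mount paths below are assumptions, not the paper's actual harness — and where the study monitored live network traffic to catch exfiltration, this conservative variant simply disables the network outright:

```python
import shlex

def sandbox_command(script_path: str, image: str = "python:3.12-slim") -> list[str]:
    """Build a `docker run` argv that confines a candidate skill script."""
    return [
        "docker", "run", "--rm",
        "--network", "none",            # no outbound traffic at all
        "--read-only",                  # container filesystem is immutable
        "--memory", "256m", "--cpus", "0.5",
        "--cap-drop", "ALL",            # drop every Linux capability
        "-v", f"{script_path}:/skill/run.py:ro",
        image, "python", "/skill/run.py",
    ]

print(shlex.join(sandbox_command("/tmp/candidate.py")))
```

For behavioral verification of the study's kind, the network would instead be routed through a monitored proxy with honeypot credentials planted in the environment, so that any attempt to read and transmit them is observed rather than merely blocked.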

157

Confirmed malicious skills identified from 98,380 analyzed, carrying 632 labelled vulnerabilities. Malicious skills average 4.03 vulnerabilities each, spanning a median of three kill chain phases. A single threat actor accounts for 54.1% of confirmed cases through templated brand impersonation — a pattern directly analogous to typosquatting attacks in software package registries.

The behavioral verification surfaced two distinct attacker profiles that the research team characterizes as ecosystem archetypes. Understanding these archetypes matters for security practitioners because they imply different detection strategies, different mitigation priorities, and different levels of threat actor sophistication.

Attacker Archetype 1: Data Thieves — Credential Harvest

Skills designed to harvest credentials, API tokens, and environment variables during execution — typically using supply chain techniques to ensure the exfiltration code reaches execution. Characterized by outbound network calls disguised as legitimate API activity, credential store access, and patterns where skill descriptions claim local processing while code POSTs to attacker infrastructure.

Advanced Data Thief skills exploit the AI platform’s own hook system and permission flags to trigger collection at the moment the agent loads credentials into context — making the theft invisible to post-execution log review.

Attacker Archetype 2: Agent Hijackers — Behavior Subversion

Skills that subvert agent decision-making through instruction manipulation embedded in SKILL.md. Rather than targeting credentials directly, Agent Hijackers redirect what the agent does — instructing it to perform actions outside the user’s intent, suppress outputs, communicate with external systems, or persist modified instruction sets for future sessions.

The research found that shadow features — capabilities absent from public documentation — appear in 0% of basic-level attacks but in 100% of advanced ones, indicating deliberate obfuscation as a marker of sophisticated adversary capability.

The kill chain analysis is particularly instructive. Malicious skills at the intermediate level — which account for 77.7% of confirmed cases — span a median of three kill chain phases. This is not opportunistic or accidental; it reflects deliberate construction: reconnaissance of what the agent has access to, collection of target materials, and exfiltration or manipulation as an objective. The authors compare the ecosystem’s current state to the browser extension ecosystem circa 2012 — approximately two years before mass exploitation matured into a recognised threat category.

On Responsible Disclosure

The February 2026 study’s responsible disclosure process resulted in 93.6% of confirmed malicious skills being removed within 30 days — a finding that reflects registry responsiveness but also confirms that the detection and removal burden currently falls almost entirely on researchers rather than on platform-level automated vetting. The skills were present and available for installation throughout the period between publication and removal.

What SkillNet Gets Right — and What It Cannot Yet Solve

SkillNet’s five-dimensional evaluation framework — assessing Safety, Completeness, Executability, Maintainability, and Cost-awareness — represents the most systematic attempt to date to build quality assurance into a large-scale skills infrastructure. The Safety dimension explicitly assesses potential risks including hazardous system operations and robustness against prompt injection or adversarial manipulation. For Executability, SkillNet complements LLM-based scoring with empirical validation: skills containing code are executed in controlled sandbox environments to verify runtime correctness.

The reliability validation results are striking. Across a sample of 200 skills reviewed by three PhD-level computer science annotators, the Mean Absolute Error between human judgements and automated LLM-based scores remains below 0.03 across all five dimensions, with Quadratic Weighted Kappa reaching near-perfect levels of 1.000. The automated evaluator achieves human-aligned scoring at scale.
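The agreement figures reduce to simple statistics: Mean Absolute Error between human and automated scores — the quantity reported as below 0.03 per dimension — is just the mean of the absolute per-skill score differences. The scores below are invented for illustration:

```python
def mean_absolute_error(human: list[float], auto: list[float]) -> float:
    """Mean of |human_i - auto_i| over paired per-skill scores."""
    assert len(human) == len(auto) and human, "paired, non-empty score lists"
    return sum(abs(h - a) for h, a in zip(human, auto)) / len(human)

# Illustrative per-skill scores on one dimension, scaled to [0, 1]:
human_scores = [0.90, 0.85, 0.70, 0.95]
auto_scores  = [0.92, 0.84, 0.68, 0.96]
print(round(mean_absolute_error(human_scores, auto_scores), 4))  # prints: 0.015
```

An MAE this small means the automated evaluator's scores differ from the human annotators' by only a few hundredths on average — close enough for the LLM-based scorer to stand in for human review at repository scale.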

SkillNet’s Five Evaluation Dimensions

Safety — Assesses potential risks including hazardous system operations such as unauthorised file deletion, and robustness against prompt injection or adversarial manipulation of the skill’s instruction set.

Completeness — Evaluates whether the skill encapsulates all critical procedural steps and explicitly defines necessary prerequisites, dependencies, and execution constraints required for reliable agent execution.

Executability — Verifies whether the skill can be successfully implemented by agents in sandboxed environments, identifying hallucinated tool calls, broken dependencies, or ambiguous instruction patterns.

Maintainability — Measures the modularity and composability of skills, ensuring they can be locally updated without disrupting global dependencies or breaking backward compatibility across the skill graph.

Cost-awareness — Quantifies execution overhead including time latency, computational resource consumption, and API usage costs to support efficiency optimisation and prevent runaway resource consumption.

But SkillNet’s authors are candid about the limits of this framework. The Safety dimension can detect some cases of poisoned or adversarial skills — but cannot fully mitigate them. This is not a flaw in the evaluation design; it reflects a fundamental asymmetry. Detecting a malicious skill requires understanding adversarial intent, which is a harder problem than measuring technical quality. A skill that scores well on Executability, Completeness, and Maintainability — and passes Safety screening — can still carry obfuscated exfiltration logic that surfaces only during behavioral execution in a live environment.

This is precisely the gap that the February 2026 behavioral verification study was designed to fill: static analysis and LLM-based semantic classification can flag 26.1% of skills as potentially vulnerable, but confirming malicious intent requires execution in a controlled environment with active network monitoring and honeypot credentials. That capability exists in research settings. It does not yet exist as standard infrastructure for the community registries where most skills are published and consumed.

The Consent Gap and the Structural Trust Problem

Across all three papers, a structural pattern emerges that the January 2026 empirical study describes as the Consent Gap. All three adversary types — malicious authors, supply chain attackers, and negligent developers — exploit the same underlying condition: users and agents do not understand what a skill will do before they load it. The gap between what a skill’s description promises and what its code and instructions actually execute is invisible at the point of installation.

This is not a solvable problem through better documentation or community ratings. It is an architectural problem. Skills execute with the full trust of the agent runtime and the full privileges of the user. There is no permission prompt, no capability declaration enforced at execution time, and no runtime confinement that limits what a loaded skill can do. The trust is granted implicitly by the act of installation — which means the entire security burden falls on the review process that happens before installation, a process that most users do not perform and most platforms do not require.
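No capability declaration exists in today's skill format; the sketch below shows what an enforced one could look like. Every name here — the capability vocabulary, the manifest check, the policy set — is hypothetical, implemented by no current skill runtime:

```python
# Hypothetical capability vocabulary a skill could be required to declare.
ALLOWED_CAPABILITIES = {"read_workspace", "write_workspace", "network"}

def check_manifest(declared: set[str], org_policy: set[str]) -> tuple[bool, set[str]]:
    """Reject a skill whose declared capabilities exceed what policy grants."""
    unknown = declared - ALLOWED_CAPABILITIES
    if unknown:
        raise ValueError(f"undeclared capability names: {sorted(unknown)}")
    excess = declared - org_policy
    return (not excess, excess)

# A skill asking for network access under a workspace-only policy is refused:
ok, excess = check_manifest({"read_workspace", "network"},
                            {"read_workspace", "write_workspace"})
print(ok, sorted(excess))  # prints: False ['network']
```

Even a mechanism this simple would close part of the consent gap: installation would grant only what was declared, and anything a skill did beyond its declaration would be a detectable violation rather than silently permitted behavior.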

Comparison to Prior Extensibility Ecosystems

The agent skills ecosystem shares risk profiles with browser extensions, IDE plugins, and software package managers — all of which went through a period of rapid growth in which security considerations were secondary to functionality adoption. The browser extension ecosystem circa 2012 is the closest historical analogue: broad distribution, implicit trust, minimal vetting, and a threat actor population that had not yet fully discovered the attack surface. The research literature now suggests the agent skills ecosystem is approximately at that stage. Browser extension marketplaces took several more years to develop meaningful automated security review. The question for the AI community is whether the skills ecosystem will follow the same trajectory — or whether the lessons of prior extensibility ecosystems can be applied more quickly.

Implications for Organizations Deploying AI Agents

For security practitioners and AI governance leads, the three papers together carry a set of concrete implications. The 26.1% vulnerability rate in community-published skills is not a theoretical risk — it is an empirical measurement taken from the same registries from which organizations are currently downloading skills to deploy in agent pipelines. The 5.2% of skills exhibiting high-severity patterns strongly suggesting malicious intent represents a non-trivial population in any skill collection of meaningful size.

Skills bundling executable scripts — the category that carries a 2.12x elevation in vulnerability probability — are also the highest-value skills for agent automation purposes. The most powerful skills, the ones that automate complex multi-step workflows involving file system access, API calls, and process execution, are precisely the ones that carry the greatest security exposure. Organizations cannot simply avoid executable-script skills and still realize the automation benefits that skills infrastructure promises.

2.12×

Skills bundling executable scripts are 2.12 times more likely to carry security vulnerabilities than instruction-only skills (Liu et al., arXiv:2601.10338). The most powerful skills for agent automation — those that include Python scripts, shell commands, and tool invocations — are the same category that carries the highest security risk, making risk avoidance through capability restriction a poor substitute for structural security controls.

The responsible disclosure finding — 93.6% removal within 30 days — offers a qualified reason for optimism about registry responsiveness. But it also surfaces a timing problem: skills were present and available for installation from the point of publication until researchers identified and reported them. In a fast-moving ecosystem where agent pipelines are being assembled continuously, the window between a malicious skill’s publication and its removal represents a real exposure period for organizations consuming skills without independent review.

Research Foundations

The three publications examined in this post sit within a wider body of emerging research on agent skill security, skill infrastructure, and agentic system design. The following papers provide the primary research grounding for the analysis above.

SkillNet: Create, Evaluate, and Connect AI Skills

Liang, Zhong, Xu, Jiang et al. (2026). The primary infrastructure paper introducing SkillNet, a repository of over 200,000 curated agent skills with a five-dimensional evaluation framework covering Safety, Completeness, Executability, Maintainability, and Cost-awareness. Demonstrates 40% average reward improvement and 30% step reduction across ALFWorld, WebShop, and ScienceWorld benchmarks. The paper’s Limitations section explicitly acknowledges that poisoned or adversarial skills cannot be fully mitigated by the current Safety evaluation mechanism.

arxiv.org/abs/2603.04448

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

Liu, Wang, Feng, Zhang, Xu, Deng, Li, Zhang (2026). The first large-scale empirical security analysis of the agent skills ecosystem. Analyses 31,132 skills from two major marketplaces using SkillScan, a multi-stage detection framework integrating static analysis with LLM-based semantic classification. Finds 26.1% of skills contain at least one vulnerability across 14 patterns in four categories. Develops a three-adversary threat model covering malicious authors, supply chain attackers, and negligent developers.

arxiv.org/abs/2601.10338

Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study

Liu, Chen, Zhang, Deng, Li, Ning, Zhang (2026). The follow-up behavioral verification study, extending the dataset to 98,380 skills and adding dynamic sandboxed execution to confirm malicious intent. Identifies 157 confirmed malicious skills with 632 vulnerabilities, documenting two attacker archetypes — Data Thieves and Agent Hijackers — and finding that a single threat actor accounts for 54.1% of confirmed cases. Responsible disclosure led to 93.6% removal within 30 days.

arxiv.org/abs/2602.06547

Anthropic — Agent Skills Documentation and Security Guidance

Anthropic’s official documentation for agent skills, covering the skills architecture adopted in Claude Code, the implicit trust model under which skills execute, and explicit security warnings that skills can introduce vulnerabilities and enable agents to exfiltrate data and take unintended actions. Recommends that users install only from trusted sources and thoroughly review skills before installation.

agentskills.io

AI Security & Governance Research Series — March 2026

This post synthesises findings from three research publications on agent skill security released between January and March 2026. All citations and arXiv references are provided in the Research Foundations section above. Research findings are presented as published; organizations should consult their security teams before drawing operational conclusions from empirical prevalence rates.
