February 6 – February 19, 2026
Your regular briefing on AI security threats, vulnerabilities, and defences from Darkhunt AI
TL;DR
ClawHavoc is the first large-scale supply chain attack on an AI agent marketplace: 1,184 malicious skills planted on ClawHub, deploying credential stealers, reverse shells, and macOS malware through professional-looking AI agent plugins
NIST launches a dedicated AI Agent Standards Initiative, formally recognising autonomous AI agents as a distinct security category requiring new standards for identity, authorisation, and interoperability - RFI responses due March 9
OMNI-LEAK and AgentLeak prove that multi-agent orchestrator systems leak data through internal channels at 68.9%, and output-only monitoring misses 41.7% of violations - perimeter-style defences are structurally blind to the actual attack surface
OWASP publishes the first authoritative security guide for MCP server development, acknowledging the tool-agent interface as a primary attack vector
OpenClaw bolts on VirusTotal scanning in response to ClawHavoc - and immediately admits prompt injection payloads will likely bypass it
Top Stories
ClawHavoc: The First Mass Supply Chain Attack on an AI Agent Marketplace
What happened: Koi Security researcher Oren Yomtov audited all 2,857 skills on ClawHub - the primary marketplace for OpenClaw agent plugins - and found 341 malicious entries. Subsequent analysis by Antiy CERT expanded the total to 1,184 malicious skills that had been historically published on the platform, of which 335 belonged to a coordinated campaign dubbed "ClawHavoc." The skills disguised themselves as crypto wallets, YouTube utilities, and Google Workspace integrations. Payloads included the Atomic macOS Stealer (AMOS), reverse shells, and credential exfiltration routines. The operation was professional: documentation was polished, descriptions were convincing, and the malware targeted both macOS and Windows. The first malicious skill appeared on January 27, with a major surge on January 31.
OpenClaw responded by integrating Google-owned VirusTotal for skill scanning. SHA-256 hashing combined with VirusTotal Code Insight now automatically flags suspicious uploads. Dvuln founder Jamieson O'Reilly, who previously demonstrated the platform's weakness by uploading a malicious top-ranked skill himself, was brought on as lead security advisor. But the maintainers acknowledged a critical limitation: "cleverly concealed prompt injection payloads may bypass scanning." Binary malware scanning does not catch attacks that operate in natural language.
Why it matters: ClawHavoc is to AI agent marketplaces what the event-stream incident was to npm - except the attack surface is worse. Traditional package managers distribute code. AI agent marketplaces distribute code and natural-language instructions that modify agent behaviour. A malicious npm package needs a code vulnerability to exploit. A malicious AI skill can manipulate the agent's reasoning directly through its description, without any code vulnerability.
The VirusTotal response is revealing. It addresses the malware problem (binary payloads) but not the agent problem (prompt injection through skill descriptions and tool metadata). This is the same pattern we saw with MCP registries last period: applying traditional security controls to a fundamentally different threat model.
Darkhunt perspective: The 12% malicious rate on ClawHub (341 out of 2,857) is striking, but the number that matters is the time-to-detection. These skills were live for weeks before anyone looked. AI agent marketplaces require the same continuous security scrutiny as package registries, but with an additional layer that understands how natural-language instructions interact with agent reasoning. Static scanning catches the AMOS dropper. It does not catch the skill description that instructs the agent to read ~/.ssh/id_rsa and include it in its next API call. Defending this surface requires systems that reason about what a skill will cause an agent to do, not just about its binary representation.
NIST Declares AI Agents a Distinct Security Category
What happened: NIST's CAISI (Centre for AI Safety and Innovation) launched the AI Agent Standards Initiative, a formal programme to develop standards for the secure development, deployment, and interoperability of autonomous AI agents. The initiative includes an RFI on AI Agent Security with responses due March 9, 2026, and a separate AI Agent Identity and Authorisation Concept Paper open for comment until April 2. The programme will use convenings, listening sessions, and stakeholder input to shape compliance requirements.
Why it matters: This is the US government officially acknowledging that AI agents are not just "software with LLMs" - they are a distinct category requiring purpose-built security standards. The identity and authorisation focus is particularly significant. Traditional IAM was built for humans and static service accounts. AI agents spawn, delegate, assume roles, and make autonomous decisions at machine speed. The identity problem alone - how do you authenticate an entity that reasons, adapts, and may create sub-agents - has no solved precedent.
The March 9 deadline also matters for timing. Any organisation building or deploying AI agents should be tracking this initiative. The standards that emerge will define compliance requirements for the next several years.
Darkhunt perspective: The Zenity analysis published this period found that the three most cited AI governance frameworks - NIST AI RMF, the EU AI Act, and ISO 42001 - contain zero mentions of agentic AI. NIST's launch of a dedicated initiative is an acknowledgement that existing frameworks are insufficient. This creates a window: the standards are being written now, and the organisations contributing to them will shape the regulatory definition of "secure AI agent deployment". The identity and authorisation paper is especially worth watching. Agent identity is not a solved problem, and NIST's approach will have downstream effects on how every agentic system handles authentication, delegation, and accountability.
Multi-Agent Systems Are Leaking Data Through the Walls, Not the Windows
What happened: Two papers published within a day of each other expose the same fundamental flaw in multi-agent system security. OMNI-LEAK (arXiv:2602.13477) demonstrates that a single indirect prompt injection can cause data leakage across agents in orchestrator-pattern systems, even when data access controls are in place. All frontier models tested, except Claude Sonnet 4, were vulnerable to at least one attack variant. AgentLeak (arXiv:2602.11510) provides a 1,000-scenario benchmark across healthcare, finance, legal, and corporate domains, showing that while multi-agent configurations reduce per-channel output leakage (27.2% vs 43.2% for single agents), internal inter-agent communication still leaks at 68.8%. Total system exposure rises to 68.9%. Output-only audits miss 41.7% of privacy violations.
Why it matters: Most enterprise multi-agent deployments use the orchestrator pattern - a central agent coordinating specialised sub-agents. Both papers show that this architecture has a data leakage problem that access controls alone cannot solve. The OMNI-LEAK finding is especially concerning: even a low success rate (1 in 500) becomes dangerous at enterprise scale when the system processes thousands of requests per day. And AgentLeak's 41.7% blind spot in output-only monitoring means that organisations relying on perimeter-style defences - watching what comes out of the system - are missing nearly half the problem.
Darkhunt perspective: These papers validate a core thesis: you cannot secure multi-agent systems by monitoring their outputs. The data leaks through internal channels - agent-to-agent messages, shared memory, orchestrator routing decisions - that existing security tools do not inspect. Building effective defence requires visibility into the reasoning and communication happening between agents, not just at the boundary. The finding that Claude, in Sonnet 4, resists these attacks while other frontier models do not also suggests that model-level alignment and instruction-following quality have direct security implications. Defence cannot be purely architectural; it must account for the varying susceptibility of the models running within these systems.
Attack Vectors and Vulnerabilities
OpenClaw Gateway Scanning in the Wild
Pillar Security reported active scanning of exposed OpenClaw gateways with prompt injection attempts. Censys found 21,639 exposed instances. Separately, Veracode identified more than 1,000 "claw" packages on npm and PyPI, which enable typosquatting attacks against OpenClaw users. The convergence of active gateway scanning, supply chain poisoning (ClawHavoc), and dependency typosquatting represents a full-spectrum attack against a single AI agent platform.
Architectural Insecurity by Design
Multiple analyses this period converged on the same conclusion about agentic AI platforms. Trend Micro's formal assessment using their TrendAI Digital Assistant Framework concluded that the risks of platforms like OpenClaw - persistent memory, broad permissions, unrestricted configurability - are inherent to the agentic AI paradigm, not bugs in specific implementations. Aikido Security published a separate analysis arguing that OpenClaw's security problems are architectural and cannot be fixed with patches. When two independent security firms reach the same conclusion using different analytical frameworks, it is worth paying attention: the problem lies in the design pattern, not the implementation.
Defensive Developments
OWASP MCP Server Security Guide
OWASP's GenAI Security Project published the first authoritative guide for secure MCP server development, released alongside the OWASP Top 10 for Agentic Applications. The guide addresses the tool-agent interface as a primary attack surface and provides actionable development guidance. Given the protocol-level vulnerabilities exposed last period (23-41% attack amplification, no capability verification, unauthenticated sampling), this is overdue. The guide is a necessary reference for any team building or operating MCP servers, though it represents a starting point rather than a complete solution.
OpenClaw v2026.2.17 Security Fixes
OpenClaw released v2026.2.17, which includes security fixes addressing vulnerabilities identified in earlier audits, as well as support for the Anthropic model. The continued pace of security patching reflects the reactive posture of the most widely deployed open-source AI agent platform. Each fix addresses a specific finding; the architectural concerns raised by Trend Micro and Aikido remain.
Research and Papers
Sparse Auto encoders as Jailbreak Defences
Researchers demonstrated that sparse autoencoders (SAEs) - originally developed for mechanistic interpretability - can be repurposed as effective jailbreak mitigators. Their Context-Conditioned Delta Steering (CC-Delta) method identifies jailbreak-relevant sparse features by comparing token-level representations, then steers model behaviour at inference time. Tested across four models and twelve jailbreak attacks, it achieves comparable safety-utility tradeoffs to dense-space defences with no task-specific training required. This is a meaningful development: it means the growing investment in interpretability research has direct, practical security applications. Tools built to understand how models think can also be used to prevent them from thinking dangerously. Paper: arXiv:2602.12418
AgentDyn: Dynamic Prompt Injection Benchmarking
AgentDyn introduces an open-ended benchmark for evaluating prompt injection attacks against agents, moving beyond static test suites. The paper reviews both prompting-based defences (repeating user prompts after tool invocation) and alignment-based approaches, providing a framework for comparing defensive strategies under realistic conditions. Useful for any team evaluating or building prompt injection defences. Paper: arXiv:2602.03117
Industry Moves
NIST Sets the Regulatory Clock
The AI Agent Standards Initiative establishes a concrete timeline: RFI responses are due March 9, and Identity and Authorisation concept paper comments are due April 2. Any company in the AI agent space - builders, deployers, or security vendors - should be preparing responses. The standards that emerge from this process will shape the compliance landscape for years to come.
Governance Frameworks Already Outdated
Zenity's analysis found that the three most-cited AI governance frameworks (NIST AI RMF, EU AI Act, ISO 42001) contain no mentions of agentic AI. This gap between governance and deployment reality creates both risk and opportunity. Organisations cannot comply with standards that do not yet exist, but they can get ahead by adopting security practices now that will likely become requirements.
OpenClaw's Reactive Security Maturation
Within a two-week period, OpenClaw integrated VirusTotal scanning, engaged a security advisor who had previously exploited its platform, and released a security patch. The trajectory from "no security controls" to "reactive security posture" is progress, but the admission that prompt-injection payloads will bypass binary scanning underscores how far the platform remains from addressing its fundamental threat model.
The Darkhunt Take
Two weeks ago, we wrote that the gap between "researchers warn about X" and "attackers do X" had collapsed to near zero. ClawHavoc proves the point. While researchers were publishing taxonomies of tool poisoning and supply chain attacks, 1,184 malicious skills were already live on ClawHub, stealing credentials and deploying malware at scale. The attackers did not wait for the papers.
Three developments in this period deserve particular attention for where AI security is heading.
First, the supply chain attack surface for AI agents is structurally larger than that of traditional software. ClawHavoc exploited the same vectors as any npm or PyPI attack - typosquatting, fake documentation, trust in the marketplace - but with an additional dimension. AI agent skills carry natural language descriptions that directly influence agent reasoning. A malicious skill does not need a code vulnerability to exploit; it can manipulate the agent through its own description. OpenClaw's VirusTotal integration addresses the issue of binary malware. It does not address the prompt injection problem. Until AI agent marketplaces develop security controls that understand how language-based instructions interact with agent reasoning, they will remain fundamentally more vulnerable than traditional package registries.
Second, research on data leakage confirms that perimeter-style monitoring is insufficient for multi-agent systems. OMNI-LEAK and AgentLeak show that the real exposure occurs in inter-agent communication channels that are not visible to output-only monitoring. The 41.7% blind spot is not a minor gap - it means nearly half of all privacy violations go undetected by current approaches. Effective defence requires inspecting the reasoning layer: how agents communicate, what they share internally, and how orchestrator routing decisions can be manipulated. This is not a filter problem. It is a visibility problem.
Third, NIST's decision to create a dedicated AI Agent Standards Initiative marks the moment when agentic AI security transitions from a niche concern to a regulatory category. The simultaneous Zenity finding that existing frameworks contain zero mentions of agentic AI underscores why this matters: the governance gap is total. The standards being written over the next few months will define compliance requirements for years. Organisations that engage now - particularly on the identity and authorisation questions, which remain genuinely unsolved - will shape the rules. Those who wait will inherit them.
The throughline across all of this is a single uncomfortable reality: AI agent security cannot be solved by applying traditional security controls to non-traditional systems. Binary scanning does not catch language-based attacks. Output monitoring does not catch internal channel leakage. Existing governance frameworks do not address autonomous agents. The systems defending AI agents need to operate at the same level of sophistication as the agents themselves - reasoning about intent, adapting to novel attack patterns, and maintaining visibility into the layers where the actual risk lives.
--
Darkhunt AI builds autonomous systems that probe, reason, and harden AI defences. Learn more
