February 5, 2026

January 22 – February 5, 2026

0:00/1:34

Your regular briefing on AI security threats, vulnerabilities, and defences from Darkhunt.

TL;DR

MCP is under siege: Three independent reports expose the Model Context Protocol as the #1 protocol-level risk vector for AI agents. Analysis shows 23-41% attack amplification, and the first real-world MCP supply chain attack has been documented in the wild
OpenClaw RCE hits the mainstream: CVE-2026-25253 delivers one-click remote code execution against a 149K-star AI agent platform, earning a dedicated CrowdStrike advisory and enterprise remediation content pack
CrowdStrike names a new attack class: Tool poisoning, tool shadowing, and rug pull attacks formally define how adversaries exploit the reasoning layer of AI agents - not their code
4,500+ AI assistants exposed on the open internet: Unauthenticated admin dashboards, unsanitized shell execution, and API key exfiltration across 49 countries

Top Stories

MCP: The Protocol Everyone Adopted, Nobody Secured

What happened: The Model Context Protocol had a brutal two weeks. First, researchers published the first formal security analysis of MCP, identifying three architectural vulnerabilities-missing capability verification, unauthenticated bidirectional sampling, and unvalidated multi-server trust - that amplify attack success by 23-41% compared to non-MCP tool integrations. Their MCPBench framework tested 847 scenarios across five MCP implementations. The proposed MCPSec defence reduced attack success from 52.8% to 12.4% with 8.3ms overhead per message.

Then Straiker's STAR Labs uncovered SmartLoader, a real-world supply-chain attack targeting MCP registries. The operation cloned a legitimate Oura Ring MCP server, manufactured credibility through five fake GitHub accounts, and distributed the StealC infostealer through MCP Market. The payload targets browser credentials, cryptocurrency wallets, Discord tokens, and SSH keys, using LuaJIT obfuscation with a 443-state virtual machine and persistence mechanisms mimicking Realtek audio drivers.

A third paper proposed SMCP (Secure MCP), a broader security framework adding identity management, mutual authentication, security context propagation, policy enforcement, and audit logging to the protocol.

Why it matters: MCP is rapidly becoming the standard for AI agent tool integration. These three reports converge on the same conclusion: its design choices create a centralised risk vector that amplifies attacks beyond those faced by non-MCP integrations. The SmartLoader attack is especially alarming because it shows that MCP registries lack even basic security controls compared with traditional package managers such as npm and PyPI. No signing, no provenance verification, no automated malware scanning. The barrier to registry poisoning is trivially low.

The vulnerabilities are protocol-level, not implementation bugs. You cannot patch your way out of architectural flaws.

Darkhunt perspective: This is what happens when a protocol is designed for developer convenience and adopted at speed without adversarial analysis. MCP was built to make tool integration easy-and it succeeded. But every design choice that makes integration frictionless also makes attack propagation frictionless. The MCPSec and SMCP proposals are necessary first steps, but the real challenge is retroactive adoption across an ecosystem that has already shipped without these protections. Organisations deploying MCP servers today should treat every server as an untrusted input source, enforce capability validation at the client level, and audit their MCP registries with the same rigour they apply to dependency management.

OpenClaw: When a Viral AI Agent Becomes an Enterprise Threat

What happened: CVE-2026-25253 (CVSS 8.8) revealed that OpenClaw, an open-source AI agent platform with 149K+ GitHub stars, could be compromised through a single malicious link. The vulnerability exploits cross-site WebSocket hijacking: the Control UI trusts the gatewayUrl query parameter without validation, auto-connects on load, and transmits the stored gateway token. An attacker controls where the WebSocket connects, enabling them to steal tokens, disable safety guardrails, bypass container sandboxes, and execute commands directly on the host. The attack succeeds even when the gateway listens only on localhost, since the victim's browser initiates the outbound connection.

CrowdStrike followed with a dedicated security advisory treating OpenClaw as a shadow IT risk requiring enterprise detection and remediation. They released a Search & Removal Content Pack and Falcon AIDR guardrails to enable real-time, prompt validation against OpenClaw instances.

Why it matters: This is the first major CVE in a viral open-source AI agent platform, and it signals a new class of "Agent-Specific Vulnerabilities" where traditional web security assumptions break down. Localhost is not a trust boundary when a browser agent mediates the connection. Container sandboxes do not provide isolation guarantees when the agent has operator-level API access. CrowdStrike is releasing a formal advisory and remediation content pack—the same treatment they give to traditional malware means AI agents have crossed the threshold into enterprise threat territory.

Darkhunt perspective: OpenClaw is the canary. Every AI agent platform with a web UI, API keys, and tool execution capabilities shares the same fundamental attack surface. The specific vulnerability (WebSocket hijacking via URL parameter) is fixable. The systemic issue is not: AI agent platforms conflate trust boundaries that traditional security architectures keep separate. The browser, the agent runtime, the tool execution layer, and the host filesystem are all one hop apart. Security teams need to inventory their AI agent deployments with the same discipline they apply to endpoint management, and build detection for the unique ways these agents can be weaponised.

Attack Vectors and Vulnerabilities

Agentic Tool Chain Attacks: A New Threat Taxonomy

CrowdStrike formally defined three attack types targeting AI agent reasoning:

Tool Poisoning: Malicious instructions embedded in tool descriptions. The add_numbers tool secretly reads ~/.ssh/id_rsa and exfiltrates its contents via metadata parameters. The tool works correctly, and the side channel operates invisibly.
Tool Shadowing: One tool's description affects how the agent uses another tool. The calculate_metrics tool includes instructions to BCC an attacker's email address on all messages sent via the separate send_email tool. Cross-tool influence without cross-tool code modification.
Rugpull Attacks: Post-integration tool updates introduce backdoors. A fetch_data tool integrates cleanly, passes review, and then receives a server-side update adding exfiltration. The change operates outside deployment pipelines.

The common thread: the attack surface is in the reasoning layer where agents interpret natural language and metadata. These are not software vulnerabilities. They are reasoning vulnerabilities.

CacheAttack: Poisoning the Infrastructure Layer

New research reveals that semantic caching systems, used by major cloud providers to reduce LLM inference costs, are vulnerable to key collision attacks. CacheAttack achieves an 86% success rate in hijacking LLM responses and 84.5% degradation in agent tool selection. The core insight: semantic cache keys function as fuzzy hashes, not cryptographic hashes. The similarity thresholds that make caching useful also make collisions exploitable. This attack surface grows as semantic caching becomes standard for cost optimisation.

4,500+ Exposed AI Assistants Across 49 Countries

Straiker documented over 4,500 exposed Clawdbot/Moltbot instances (4,211 on Shodan, 333 on Zoomeye) with unauthenticated admin dashboards, API key exfiltration from .env files, credential theft from messaging platforms (WhatsApp, Slack, Discord, Telegram, Teams, Signal), and unsanitized shell execution via an "exec tool" feature. These are not sophisticated attack chains. These are front doors left wide open.

Defensive Developments

Provable Jailbreak Defence with 94% Utility Retention

A provable defence framework reduces the success rate of gradient-based attacks from 84.2% to 1.2% on Llama-3 while preserving 94.1% benign utility - substantially outperforming character-level alternatives (74.3% utility). The approach uses Certified Semantic Smoothing via Stratified Randomised Ablation, which partitions inputs into fixed structural prompts and variable payloads, and Noise-Augmented Alignment Tuning, which converts the base model into a semantic denoiser. This is the first framework offering provable robustness guarantees that remain practical for production deployment.

MCPSec and SMCP: Hardening the Protocol

Two complementary proposals address MCP security:

MCPSec (arXiv:2601.17549): Backwards-compatible extension adding capability validation and message authentication. Reduces attack success from 52.8% to 12.4% with 8.3ms overhead.
SMCP (arXiv:2602.01129): Broader security architecture adding identity management, mutual authentication, security context propagation, policy enforcement, and audit logging. Targets enterprise governance requirements not included in the base MCP specification.

Together, these represent the emerging security layer for MCP that should have shipped with the protocol.

Few-Shot Examples: A Double-Edged Sword for Defences

Research on few-shot interactions with prompt defences reveals that the same technique produces opposite effects depending on defence paradigm: Role-Oriented Prompts gain up to 4.5% safety improvement from few-shot examples, while Task-Oriented Prompts lose up to 21.2% effectiveness. Anyone deploying prompt-based safety measures in production needs to understand which defence paradigm they are using before adding a few-shot examples.

Research and Papers

Prompt Injection SoK: The Numbers Are Damning

The Systematisation of Knowledge on prompt injection in coding assistants (Maloyan & Namiot) is required reading. Their three-dimensional taxonomy--delivery mechanisms, modalities, propagation patterns--provides the first structured framework for classifying the 42 attack techniques identified across 78 studies. The conclusion is clear: "architectural-level mitigations rather than ad-hoc filtering approaches" are required. The gap between 85% attack success and under 50% defence effectiveness poses a challenge for the next generation of AI security systems.

Anthropic: Misalignment is Messier Than We Thought

Anthropic's "Hot Mess of AI" research challenges the dominant AI safety narrative. Testing across Claude Sonnet 4, o3-mini, o4-mini, and Qwen3, they find that extended reasoning chains amplify incoherence, larger models improve on easy tasks but deteriorate on hard problems, and natural variation dominates over deliberate reasoning changes. The implication for security: AI failures may look more like industrial accidents than coherent pursuit of wrong goals. This means defence strategies designed to contain a rational optimiser may be poorly suited to the actual failure modes. Ensemble methods and variance reduction may matter more than alignment constraints.

Industry Moves

CrowdStrike Stakes Its Claim in AI Agent Security

CrowdStrike published two major pieces during this period — the Agent Toolchain Attack Taxonomy and the OpenClaw security advisory —positioning itself as the enterprise authority on AI agent threats. The OpenClaw advisory includes Falcon detection signatures, AIDR guardrails, and a Search & Removal Content Pack. This is CrowdStrike treating AI agents with the same operational seriousness as traditional endpoint threats.

Alice (ActiveFence) Partners with NVIDIA on AI Safety

Alice (formerly ActiveFence) launched an end-to-end AI safety lifecycle leveraging NVIDIA Garak for automated red teaming, NeMo Curator for dataset filtering, and their WonderCheck platform for continuous adversarial testing across text, image, audio, and video. The Rabbit Hole threat intelligence library covers 110+ languages. The NVIDIA partnership signals growing enterprise demand for GPU-accelerated safety testing infrastructure.

The Agent Identity Governance Gap

BleepingComputer reported on the growing identity blind spot created by AI agents operating outside traditional IAM controls. Organisations are discovering hundreds of untracked agents once they actually look. The proposed five governance pillars--continuous discovery, ownership enforcement, dynamic least privilege, identity-centric traceability, and lifecycle management--represent a reasonable framework, but the core problem remains: identity management architectures built for humans cannot govern entities that spawn, clone, and act autonomously at machine speed.

The Darkhunt Take

This was the two-week period when theory became practice. CrowdStrike defined tool poisoning, tool shadowing, and rugpull attacks as a formal threat taxonomy. Within days, SmartLoader demonstrated exactly that: a supply chain attack exploiting MCP registries to distribute malware through the tool integration layer. The gap between "researchers warn about X" and "attackers do X" has collapsed to near zero.

Three patterns demand attention:

The reasoning layer is the new attack surface. Every major story in this period focuses on how AI agents think, not how their code runs. Tool poisoning manipulates agent reasoning through descriptions. Prompt injection hijacks agent behaviour through natural language. Cache attacks corrupt agent decisions through the infrastructure. These are not vulnerabilities in the traditional sense--they are emergent properties of systems that interpret language as instructions. You cannot firewall reasoning.

Protocol adoption is outrunning protocol security. MCP is the most consequential example. Designed for interoperability, adopted for convenience, and now facing three simultaneous security crises: architectural flaws that amplify attacks by 23-41%, registries with no provenance verification, and no standard mechanism for capability validation or message authentication. MCPSec and SMCP are promising proposals, but the protocol's installed base is growing faster than its security layer. This is the AI equivalent of deploying HTTP everywhere and then trying to retrofit TLS.

Static defenses are structurally inadequate. OpenAI's admission that prompt injection cannot be fully solved, combined with the SoK finding that 85% of adaptive attacks defeat state-of-the-art defenses, should end the debate about whether we can build a filter good enough. The provable defense framework's reduction from 84.2% to 1.2% attack success is remarkable--and it works by fundamentally changing how the model processes inputs, not by adding another detection layer. The defences that work are the ones that change continuously: RL-trained adversarial systems that discover new attacks, architectures that reason about intent rather than pattern-match signatures, and closed-loop systems where offence directly informs defence.

The uncomfortable conclusion: the AI security industry cannot keep pace with deployment velocity through advisories and framework proposals. The speed, scale, and novelty of these attacks require security systems that are themselves agentic-systems that probe, reason, adapt, and harden in real time. Not because automation is convenient, but because the attack surface evolves faster than humans can keep up with.

The organisations that recognise this first will have a meaningful advantage. The rest will spend 2026 reading advisories about attacks that already happened.

Darkhunt AI builds autonomous systems that probe, reason, and harden AI defenses. Learn more

Back to Digest

Know what your AI agent does before someone else does.

Try Darkhunt ->

Start free · Onboarding included

Know what your AI agent does before someone else does.

Try Darkhunt ->

Start free · Onboarding included

Know what your AI agent does before someone else does.

Try Darkhunt ->

Start free · Onboarding included