April 16, 2026

April 2 -April 16, 2026

0:00/1:34

AI Security Digest: April 2-16, 2026

TL;DR

Anthropic withheld Claude Mythos from public release after it autonomously found vulnerabilities in every major OS and browser - 181 Firefox exploits where the previous best model produced two. Exploit timelines have compressed from years to hours.
9 of 428 LLM API routers are actively malicious: injecting code, stealing credentials, and draining crypto wallets. Zero cryptographic integrity exists between client and model.
OWASP cataloged 8 major AI security incidents in Q1 2026. The pattern: attacks are shifting from model outputs to agent identities, orchestration layers, and supply chains.
NomShub turns Cursor into a persistent backdoor via prompt injection, sandbox escape, and remote tunnel exploitation. The trigger: opening a malicious repo.
Microsoft open-sourced a seven-package Agent Governance Toolkit covering all 10 OWASP Agentic AI Top 10 risks -- the first serious baseline for agent runtime security.

Top Stories

Claude Mythos: The Vulnerability Machine Too Dangerous to Ship

Anthropic announced Claude Mythos Preview and then did something unprecedented: withheld it from public release. The reason is quantitative. Mythos autonomously found exploitable vulnerabilities in every major OS and browser, including a 17-year-old FreeBSD RCE (CVE-2026-4747). Where the previous best model generated two Firefox exploits, Mythos generated 181 -- and demonstrated the ability to chain multiple vulnerabilities into single exploit paths without human guidance.

The industry response was immediate. SANS, CSA, and the OWASP GenAI Security Project produced an emergency strategy briefing within days, assembled by 60+ contributors and reviewed by 250+ CISOs. The briefing maps a 13-item risk register to four frameworks (OWASP LLM Top 10, OWASP Agentic Top 10, MITRE ATLAS, NIST CSF 2.0) and highlights a stark metric: mean time from disclosure to exploitation has fallen to less than one day in 2026, down from 2.3 years in 2019.

Anthropic's controlled-access program, Project Glasswing, routes Mythos capabilities to vetted defenders first. But what one lab built, others will replicate.

Why it matters: This is not incremental improvement -- it is a categorical shift. When an AI finds and weaponizes bugs faster than organizations can patch them, the disclosure-patch-deploy cycle breaks. Security programs designed around vulnerability management timelines measured in weeks are now operating in a world measured in hours.

Darkhunt perspective: Mythos validates the thesis we have been building around: the future of offensive security is autonomous, and defense must be equally autonomous. Human-speed patch cycles cannot keep pace with machine-speed vulnerability discovery. What matters now is whether defenders have systems that detect exploitation attempts before patches exist -- and adapt in real time as the threat surface shifts. (Source | SANS/CSA/OWASP Briefing)

LLM API Routers: The Man-in-the-Middle Nobody Checked

Researchers conducted the first systematic study of malicious LLM API routers -- the proxy services between AI applications and model providers. Of 428 routers tested, 9 were actively injecting malicious code into responses, 2 employed adaptive evasion, 17 accessed planted AWS canary credentials, and 1 was draining cryptocurrency wallets. No provider enforces cryptographic integrity between client and model. Every JSON payload -- credentials, system prompts, user data -- passes through in plaintext.

This is a supply chain attack surface hiding in plain sight. Organizations routing agent traffic through third-party proxies for cost optimization are giving those proxies full visibility into and control over every agent interaction. Remember: even read-only access to sensitive data is a data leak risk -- and these routers have read-write access to everything.

Why it matters: Unlike traditional API gateways where payloads are often encrypted end-to-end, LLM routers must read and modify payloads to function. There is no equivalent of TLS termination that preserves confidentiality from the intermediary. Until cryptographic signing for LLM responses becomes standard, every router is a trust decision.

Darkhunt perspective: This exposes infrastructure risk that most agent security assessments miss entirely. Teams audit their prompts, their tool permissions, their model choices -- but not the routing layer that carries all of it. Probing the integrity of the full agent communication path, including intermediaries, is exactly the kind of supply chain testing that autonomous security systems should perform continuously. (Paper)

NomShub: Open a Repo, Lose Your Machine

Straiker disclosed NomShub, a critical vulnerability chain in the Cursor AI code editor that achieves persistent remote shell access through three chained exploits: indirect prompt injection via malicious repository content, sandbox escape using shell builtins that bypass Cursor's restrictions, and exploitation of Cursor's own signed tunnel binary to establish a persistent reverse shell -- a Living-Off-The-Land attack using the IDE's legitimate infrastructure.

The trigger: opening a repository. No other interaction required.

Why it matters: NomShub demonstrates the full agentic attack chain -- from untrusted input through reasoning manipulation to system compromise. Each individual protection was present but insufficient. The vulnerability is in the composition, not the components. This is not a chatbot problem. Cursor is an agent with tool access, file system permissions, and shell execution -- exactly the class of system where security risk concentrates.

Darkhunt perspective: Expect more of this pattern: multi-layer chains where each stage exploits a different trust boundary. AI coding assistants are high-value targets because they operate with developer-level system access. Testing these tools requires chain-of-exploit thinking -- probing not just individual defenses but the seams between them. (Source)

Attack Vectors & Vulnerabilities

Flowise CVSS 10.0: MCP Integration as the Attack Vector

CVE-2025-59528 -- a perfect-score RCE in Flowise's CustomMCP node -- allows arbitrary code execution through unvalidated JavaScript in MCP server configuration parsing. Over 12,000 instances remain exposed. This is Flowise's third actively exploited vulnerability. The pattern is clear: MCP integration, the feature that makes agent builders powerful, is also what makes them dangerous. Six months between disclosure and widespread exploitation underscores how slowly the AI tooling ecosystem patches. (Source)

Memory Poisoning Without Touching the Prompt

eTAMP demonstrates that agent memory can be poisoned without direct injection. A single contaminated environmental observation -- something the agent sees while browsing, not something fed into its prompt -- creates persistent, cross-session compromise. Susceptibility increases 8x under stress conditions, and advanced models are equally vulnerable. The poisoned memory activates whenever retrieved, persisting across websites and sessions. (Paper)

Salami Slicing: Death by a Thousand Safe Prompts

A new attack class chains individually harmless inputs that each pass alignment checks but cumulatively trigger high-risk behavior -- over 90% success on GPT-4o and Gemini. Per-turn safety checks are structurally blind to this pattern. Proposed defenses reduce success by 44-65%, but gaps remain. For agents with multi-turn context, this threat model is especially relevant: guardrails that evaluate each turn in isolation are not a security boundary. (Paper)

Chat Template Fuzzing: Below the Layer You Are Monitoring

TEMPLATEFUZZ targets chat templates -- the formatting layer between raw text and model input -- achieving 98.2% attack success on open-source LLMs and 90% on commercial models with only 1.1% accuracy loss. This attack surface sits below the prompt layer where most defenses operate, making it both high-impact and largely invisible to current monitoring. (Paper)

Reasoning Hijacking: The Attack Your Input Filter Will Not See

JailAgent manipulates agent reasoning trajectories and memory retrieval without modifying user prompts. By extracting trigger patterns and hijacking internal reasoning chains, it achieves jailbreaking across models and scenarios. The attack targets internal cognition rather than the prompt surface -- invisible to any defense that only monitors inputs. (Paper)

Defensive Developments

Microsoft Ships the Agent Security Baseline: Seven Packages, Open Source

Microsoft released an open-source toolkit addressing all 10 OWASP Agentic AI Top 10 risks. The seven packages: Agent OS (policy engine, sub-0.1ms interception), Agent Mesh (cryptographic identity via decentralized identifiers), Agent Runtime (execution rings for privilege separation), Agent SRE (reliability), Agent Compliance (audit and governance), Agent Marketplace (plugin signing), and Agent Lightning (performance). Compatible with 20+ agent frameworks. Microsoft plans to move the project to a foundation -- a signal that agent runtime security is becoming an ecosystem-level standard, not a vendor differentiator. (Source)

OpenAI Agents SDK: Credentials Out, Sandbox In

OpenAI updated its Agents SDK with a control harness separating the control plane from the compute layer. Credentials are isolated from model-generated code execution environments, structurally preventing lateral movement from injected commands. This harness/sandbox pattern -- where the agent never touches secrets directly -- is becoming the standard enterprise agent architecture. (Source)

MCP Defense-Placement Taxonomy

A systematic analysis of MCP security across six architectural layers reveals that existing defenses cluster around the tool layer while leaving host orchestration, transport, and supply-chain layers exposed. The taxonomy maps precisely where current protection stops and where attackers are moving. For anyone building or securing MCP-connected agents, this is the gap analysis to read. (Paper)

Beyond Static Permissions: Agents Are 15x Overprovisioned

Aethelgard addresses a measured problem: AI agent runtimes grant 15x more capabilities than tasks actually require. Rather than static permission sets, it uses reinforcement learning to discover minimum viable capability sets per task -- a four-layer governance framework that adapts permissions dynamically. The shift: from "what should this agent be allowed to do" to "what does this agent need for this specific task right now." (Paper)

The Darkhunt Take

Two weeks ago, the question was whether AI-accelerated offensive capability would outpace defensive capacity. Claude Mythos answered it.

181 Firefox exploits where the previous best produced two. Every major OS and browser. Vulnerability chaining without human guidance. This is the new baseline for what attackers will have access to. Anthropic chose to withhold Mythos, but the capability curve does not pause for responsible disclosure.

Mythos is the attention-grabbing story. The structural story is quieter and more important. Nine of 428 LLM API routers are actively malicious. Agent memory can be poisoned through passive observation. Flowise's MCP integration gives attackers CVSS 10.0 RCE on 12,000 exposed instances. Opening a repo in Cursor gives an attacker a persistent shell. These are not theoretical attack paths. They are happening now.

The pattern: agentic AI security is not a model-layer problem. The attack surface is the ecosystem -- routing, memory, tool protocols, development environments, supply chains. Microsoft's Agent Governance Toolkit and OpenAI's harness/sandbox architecture are directionally correct. They address structural risks rather than trying to make models "safer." But governance defines what agents should be allowed to do. It does not detect when something has gone wrong.

This period's research reinforces the gap. Goal reframing bypasses safety rules 38-40% of the time. Salami slicing chains safe prompts into dangerous outcomes at 90%+ success. Reasoning hijacking operates below the prompt layer entirely. These attacks do not violate policies -- they exploit the space between policy and behavior. Guardrails are monitoring tools, not security boundaries.

What is missing is the closed loop: systems that continuously test agent behavior, detect deviations, and adapt defenses as novel attack techniques emerge. The window between disclosure and exploitation has collapsed to days. The defense must operate at the same tempo.

Your AI agents have attack surfaces you have not tested. Find out what they are before someone else does.

Back to Digest

Know what your AI agent does before someone else does.

Try Darkhunt ->

Start free · Onboarding included

Know what your AI agent does before someone else does.

Try Darkhunt ->

Start free · Onboarding included

Know what your AI agent does before someone else does.

Try Darkhunt ->

Start free · Onboarding included