AI Security Digest: May 14 - May 28, 2026

TL;DR
Autonomous vulnerability discovery crossed the production line. Anthropic's Project Glasswing (Mythos Preview), deployed to ~50 partners, surfaced more than 10,000 high- and critical-severity vulnerabilities in a single month. Cloudflare alone reported 2,000 bugs (400 critical), and Mozilla found 271 in Firefox 150, more than ten times what the Firefox 148 / Claude Opus 4.6 baseline surfaced. For wolfSSL CVE-2026-5194, the model constructed a real exploit. The bottleneck has moved from discovery to patching.
Misconfiguration is still the dominant agent attack surface. Microsoft Defender telemetry attributes more than 50% of cloud-native workload exploitations to misconfiguration; 15% of the remote MCP servers Microsoft scanned are severely insecure; and Mage AI, kagent (CNCF), and AutoGen Studio all ship without authentication. The "1990s mistakes on 2026 substrate" pattern now has empirical ground truth from a hyperscaler.
Academia is settling on a single thesis: don't trust the model, enforce invariants at the system level. Three independent arXiv papers this fortnight — Christodorescu/Fernandes/Rehberger (2605.18991), Holz/Rieck (2605.14932), and Abdelnabi/Bagdasarian (2605.17634) — converge on the same systems-security framing for agents. Abdelnabi proves an impossibility result: data-instruction separation cannot, in principle, defeat prompt injection.
Memory is the next defensive frontier. MemMorph hijacks tool selection in LLM agents with just three poisoned records, achieving 85.9% attack success and bypassing three representative defenses across 10 agent architectures and 3 memory backends.
AI developers are now a tier-1 target. A live infostealer campaign uses 88 typosquatted domains and 30+ fake Claude Code / JetBrains / NotebookLM install pages, with post-quantum encryption (ML-KEM-768), raw-socket EDR bypass, and a Binance Smart Chain smart-contract C2 channel that no takedown can reach. 175+ wallet extensions and AI coding tools (Cline, Continue.dev) are explicitly targeted.
Top Stories
Project Glasswing: Autonomous Vulnerability Research Has Shipped
Anthropic's first operational update on Project Glasswing is the most important data point this fortnight, and arguably the most important since the start of the year. Mythos Preview — Glasswing's gated frontier vulnerability-research system — has been running with roughly 50 trusted partners. In one month it surfaced 10,000+ high- and critical-severity vulnerabilities. Cloudflare reported 2,000 bugs of which 400 were critical. Mozilla found 271 vulnerabilities in Firefox 150 — more than ten times what Firefox 148 surfaced with Claude Opus 4.6. Across 1,000+ scanned OSS projects, Anthropic estimates ~6,202 critical findings with 90.6% validated and 62.4% confirmed critical. Palo Alto Networks reported a 5x increase in patch volume.
The CVE that headlines the disclosure — CVE-2026-5194 in wolfSSL — is not a triage finding. It is a real exploit chain the model constructed end-to-end.
Why it matters: This is the first credible operational evidence that frontier-grade autonomous vulnerability research has crossed the productionization line. The headline number is 10,000+ high- and critical-severity bugs in a month. The more important number is the maintainer fatigue stat: the median time-to-fix for critical vulns from partners is now two weeks, and that timeline is being set by humans, not models. The bottleneck has shifted from "can we find the bug" to "can we patch it before someone else weaponizes it."
The Darkhunt angle: Closing that loop is the entire thesis. Discovery without remediation is now an attacker subsidy — every public vuln-discovery system that runs faster than the patching pipeline lengthens the window in which adversaries can pull from the same surface. This is exactly why static reporting is insufficient and why the defense side has to operate as a reasoning system that takes a finding, models exposure, and applies a runtime mitigation before the upstream patch lands. The frontier labs are showing us what offense looks like at industrial scale. Defense has to converge to the same scale, or concede the asymmetry permanently.
Two Microsoft Disclosures, One Story: Configuration Is Still Eating Agent Security
Microsoft published two pieces that read as a single argument. The first, "When configuration becomes a vulnerability", reports that more than half of all cloud-native workload exploitations Defender for Cloud sees start with a misconfiguration. Drilling into AI specifically: 15% of the remote MCP servers Microsoft scanned are severely insecure and allow unauthenticated access. Mage AI, kagent (a CNCF project), and AutoGen Studio all ship without authentication. Ray Dashboard, ComfyUI, and Marimo are similarly exposed. Mage AI's default Kubernetes deployment is being actively exploited in the wild.
The second, "Introducing RAMPART and Clarity", is the prescriptive response: two open-source tools that put agent red teaming inside CI pipelines as first-class test types. RAMPART is a PyRIT-based pytest framework for cross-prompt-injection testing with probabilistic policies — "safe in 80% of runs" matches real LLM behavior better than binary pass/fail. Clarity enforces structured pre-implementation problem clarification with version-controlled .clarity-protocol/ artifacts so red-team incidents become permanent CI tests.
Why it matters: Read together, these two posts are Microsoft's diagnosis-and-prescription for agent security. The diagnosis is that the protocol layer (MCP) has no enforced authorization, the framework layer (Mage, AutoGen, kagent) ships insecure by default, and the operator layer is being exploited right now. The prescription is that red teaming has to move from periodic audit into the build pipeline, with the same continuity and the same blast-radius semantics as unit tests.
The Darkhunt angle: This is structurally identical to where we landed last cycle on "configuration is the new code path." A hyperscaler is now publishing the same conclusion with telemetry behind it. The continuous-red-team thesis — that the only sustainable defense is one that re-probes the system every time the system changes — just got endorsed by the largest enterprise security vendor on the planet. The question is no longer whether continuous adversarial testing belongs in CI. It is whether the testing harness can reason about its own findings, or whether it just files them.
Theory Catches Up to Practice: Three Papers, One Conclusion
Three independent arXiv preprints this fortnight converge on the same answer for agent security, and the convergence is more interesting than any single paper.
"AI Agents May Always Fall for Prompt Injections" (Abdelnabi and Bagdasarian, 2605.17634) reframes prompt injection through Contextual Integrity theory and proves an impossibility result: for any defender norm, either an adversary can construct a context under which a blocked flow appears legitimate, or tightening the norm blocks legitimate flows. Data-instruction separation is, formally, insufficient. The recommendation is contextual-integrity-aware alignment (not to be confused with the CI pipelines discussed above), not better filters.
"Agent Security is a Systems Problem" (Christodorescu, Fernandes, Hooda, Jha, Rehberger, Chaudhuri et al., 2605.18991) analyzes 11 real-world agent attacks and shows for each how a systems-level mitigation — reference monitors, least privilege, mediated execution — would have prevented it. The argument: treat the model as an untrusted component, enforce invariants outside it.
"Toward Securing AI Agents Like Operating Systems" (Pirch, Horlboge, Großmann, Asif, Kireev, Holz, Rieck, 2605.14932) tests four widely-used agents under modest attacker models, finds that current protections fail, and argues OS primitives (resource isolation, privilege separation, mediation) apply directly.
Why it matters: Three groups with no overlapping authors, independently, in two weeks, arrived at the same architectural conclusion: the model is not the security boundary. The system is. This is the canonical position paper set for the systems-security framing of agents, and it lines up exactly with what the frontier-lab convergence and Five Eyes guidance hinted at last cycle.
The Darkhunt angle: Cite these three papers in every conversation about why static guardrails do not work. Abdelnabi's impossibility result, in particular, is the formal answer to anyone still pitching a prompt-injection filter as a security control. The defensive paradigm is moving from "make the model safe" to "build a control plane around an unsafe model" — which has been our position since day one. The Christodorescu paper's 11-attack postmortem is the cleanest existing argument for runtime mediation as a first-class architectural primitive.
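The "enforce invariants outside the model" position can be made concrete with a small reference monitor. The Python below is a minimal sketch, not code from any of the three papers: the tool names, the Policy structure, and the path rule are all hypothetical and chosen only to show least privilege and mediated execution in one place.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation proposed by the (untrusted) model."""
    tool: str
    args: dict


@dataclass
class Policy:
    """Hypothetical least-privilege policy: allowed tools and a readable root."""
    allowed_tools: frozenset
    readable_root: PurePosixPath
    audit_log: list = field(default_factory=list)


def mediate(call: ToolCall, policy: Policy) -> bool:
    """Decide, outside the model, whether a proposed call may execute."""
    allowed = call.tool in policy.allowed_tools
    if allowed and call.tool == "read_file":
        path = PurePosixPath(call.args.get("path", ""))
        # PurePosixPath does not collapse '..', so reject it explicitly
        # before checking that the path sits under the readable root.
        allowed = (
            path.is_absolute()
            and ".." not in path.parts
            and path.is_relative_to(policy.readable_root)
        )
    # Every decision is recorded, whether the call was allowed or denied.
    policy.audit_log.append((call.tool, allowed))
    return allowed
```

The design point is that the decision depends only on the call and the policy, never on the model's stated reasoning, so a successful prompt injection changes what the model asks for but not what the system permits.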
Attack Vectors & Vulnerabilities
Memory Poisoning Is Real, Reproducible, and Defeats Current Defenses
MemMorph (Zhang et al., 2605.26154) is the first strong empirical paper to weaponize long-term agent memory as a tool-hijacking vector. The mechanism: inject as few as three crafted records disguised as docs or policies into the agent's memory store. The model later retrieves them, treats them as authoritative context, and selects attacker-chosen tools. Tested across three benchmarks, 10 agent architectures, and three memory backends. Achieves 85.9% attack success. Bypasses three representative defenses.
Three records. 85.9%. This is not a marginal finding. It maps the next defensive frontier: memory integrity is now an unsolved problem, and it matters more the longer your agents run.
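To make "memory integrity" less abstract, here is a minimal sketch of one possible building block: tagging each memory record with an HMAC at write time and dropping records whose tag does not verify at retrieval time. This is an illustration under stated assumptions, not a defense evaluated against MemMorph. The record layout and function names are hypothetical, and the check only helps against records that bypass the trusted writer; it does nothing if the trusted writer itself ingests poisoned content.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> str:
    """Return an HMAC-SHA256 tag over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def trusted_records(store: list, key: bytes) -> list:
    """Keep only retrieved records whose tag verifies against the writer's key."""
    kept = []
    for entry in store:
        expected = sign_record(entry["record"], key)
        # compare_digest avoids leaking how many characters of the tag matched.
        if hmac.compare_digest(expected, entry.get("tag", "")):
            kept.append(entry["record"])
    return kept
```

A store entry here is assumed to be a dict with a "record" field and an optional "tag" field; entries with a missing or wrong tag are silently excluded from what the agent sees.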
A Live Infostealer Campaign Targeting AI Developers
Independent threat research published this cycle documents a multi-stage credential-stealing campaign using 88 domains and 30+ typosquatted install pages impersonating Claude Code, JetBrains tooling, and NotebookLM. The technical chain is professional-tier: a one-character "&" trick that runs the malicious command in the foreground while the legitimate curl fires harmlessly in the background; ML-KEM-768 post-quantum encryption on the C2 channel; raw-socket EDR bypass; a Binance Smart Chain smart contract that distributes attacker addresses (no domain to seize); and credential-theft modules explicitly targeting Cline, Continue.dev, Snowflake, 175+ wallet extensions, and 65+ browsers.
Why it matters: This is the first widely documented infostealer with explicit Cline / Continue.dev / Perplexity targeting. AI developers themselves are the high-value target now, because their credentials chain into private model APIs, code repositories, and CI pipelines. The smart-contract C2 means takedown is structurally impossible: you can blocklist addresses but you cannot remove a deployed contract. The "&" trick is a templatable supply-chain technique that other actors will copy within the next cycle.
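The "&" trick described above can be checked for before a copied install command is ever run. The sketch below uses Python's standard shlex module to tokenize a command and flag a lone "&" that is followed by further tokens. The example command and domain are hypothetical, the campaign's actual command lines are not reproduced here, and the check is a heuristic: a quoted "&" argument would be flagged as well.

```python
import shlex


def shell_tokens(command: str) -> list:
    """Split a shell command into words and operator tokens such as '|', '&', '&&'."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # keep URLs and flags as single tokens
    lexer.commenters = ""          # do not treat '#' as a comment marker
    return list(lexer)


def background_then_command(command: str) -> list:
    """Return token indexes of a lone '&' that is followed by more tokens."""
    tokens = shell_tokens(command)
    return [i for i, tok in enumerate(tokens) if tok == "&" and i + 1 < len(tokens)]
```

For a hypothetical command such as `curl -fsSL https://example.invalid/install.sh & sh -c payload`, the function reports the position of the "&", while `&&` chains and ordinary pipes are left alone because shlex returns them as distinct operator tokens.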
Repository-Controlled AGENTS.md as a Code Execution Sink
Vulnerability research disclosed this cycle showed that the Kilo AI CLI treats repository-controlled AGENTS.md files as authoritative directives, allowing an attacker who controls a repo to embed instructions that the agent then executes via its built-in execute_command tool. Even a benign-looking "hi" can trigger attacker shell commands. This is the same instruction-provenance failure pattern as last cycle's .cursor/rules-class issues and trust-persistence findings: configuration files in agent CLIs are now an execution surface, and provenance is the missing primitive.
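One way to picture provenance as a primitive is to tag every instruction with where it came from and to gate execution on that tag. The following Python is a minimal sketch under that assumption. It is not Kilo's implementation or fix, the Origin categories and the rule are hypothetical, and it assumes the hard part (attributing a tool request to the instruction that caused it) has already been solved.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    USER = "user"              # typed by the operator in this session
    REPOSITORY = "repository"  # read from files in the checked-out repo
    TOOL_OUTPUT = "tool"       # returned by a previous tool call


@dataclass(frozen=True)
class Instruction:
    text: str
    origin: Origin


# Hypothetical rule: only operator-typed instructions may authorize shell execution.
MAY_AUTHORIZE_EXECUTION = frozenset({Origin.USER})


def may_execute(requested_by: Instruction) -> bool:
    """Allow a shell-executing tool only when the request traces to a trusted origin."""
    return requested_by.origin in MAY_AUTHORIZE_EXECUTION
```

Under this rule, text loaded from a repository file can still inform the agent, but it cannot by itself unlock a command-execution tool.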
Search Poisoning Now Targets AI Chatbots Directly
Microsoft documented a cryptojacking campaign where threat actors poisoned the corpus AI chatbots retrieve from, causing the chatbots to recommend attacker-controlled download domains. This is the first wide-scale operational case of "AI chatbot output as a malware distribution channel" — and the obvious successor to traditional SEO poisoning. BleepingComputer's mainstream coverage confirms the pattern: defenders cannot rely on the model to gate the answer; they have to defend the retrieval substrate.
Multimodal Jailbreaks at 97-99% Success
A vendor red-teaming trends report cites multimodal attack success rates against current frontier models at 97-99%. The arXiv source (2506.14682) also reports autonomous attacks running 5,000x faster than human-driven equivalents. The headline framing — beyond prompt injection, continuous over point-in-time, multimodal, agentic lateral movement, CI/CD integration — is consistent with what every serious offensive research team is publishing.
Defensive Developments
RAMPART and Clarity (Microsoft, Open Source)
Covered above as a top story. The detail worth re-emphasizing for defenders: probabilistic policies. "Safe in 80% of runs" is the correct semantics for an LLM-backed system because the underlying model is non-deterministic. Binary pass/fail collapses the wrong dimension. Any team building agent CI should look at RAMPART before rolling their own. GitHub releases here.
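The probabilistic-policy idea is simple enough to sketch without any framework. The Python below is not RAMPART's API; the function names are hypothetical, and the trial callable stands in for "run the agent once against an injection case and judge whether the outcome was safe".

```python
from typing import Callable


def safe_fraction(trial: Callable[[int], bool], runs: int) -> float:
    """Run the trial `runs` times and return the fraction of runs judged safe."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    safe = sum(1 for i in range(runs) if trial(i))
    return safe / runs


def meets_policy(trial: Callable[[int], bool], runs: int, threshold: float) -> bool:
    """Pass when at least `threshold` of the runs are safe (0.8 means 80%)."""
    return safe_fraction(trial, runs) >= threshold
```

In a pytest suite this would typically sit behind a single assertion such as `assert meets_policy(run_case, runs=20, threshold=0.8)`, where run_case is a project-specific callable. The number of runs controls how noisy the estimate is, so a small run count with a threshold near the true rate will flap.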
Fine-Grained Identity for Agents
Telemetry from MCP-governance vendors continues to show that traditional IAM, OAuth, and RBAC primitives do not map onto non-deterministic, delegating agents. Shadow MCP servers and over-provisioned agent identities are the dominant new internal attack surface inside enterprises. The buyer-side conversation is now converging on agent-identity-as-first-class — distinct from human and service-account identity, with scoped credentials, audit trails, and revocation primitives that assume the agent itself can be compromised mid-session.
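As a rough picture of what agent-identity-as-first-class implies, the sketch below models a scoped, expiring agent credential with a revocation check on every use. It is a minimal illustration only: the AgentCredential fields, the scope strings, and the in-memory revocation set are hypothetical and do not correspond to any vendor's product or to an existing IAM standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCredential:
    agent_id: str
    scopes: frozenset
    expires_at: float  # seconds since the epoch


def authorize(cred: AgentCredential, scope: str, now: float, revoked: set) -> bool:
    """Grant a scope only if the credential is unrevoked, unexpired, and in scope."""
    if cred.agent_id in revoked:
        return False
    if now >= cred.expires_at:
        return False
    return scope in cred.scopes
```

Checking revocation on every call, rather than once at session start, is what lets an operator cut off an agent that has been compromised mid-session.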
The "AI Trust Tax" Has a Number
A vendor analysis of agent threat models attempted to quantify the cost of external evaluation calls at agent handoffs: roughly $260,000 per year at 500K traces/day. Whether or not the specific number generalizes, the framing matters — every guardrail call between agent hops is a budget line item, and inline evaluation is the obvious architectural response.
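The quoted figures imply a per-trace cost that is easy to back out. The short calculation below assumes a constant 500K traces every day of a 365-day year, which the analysis does not state explicitly, and lands at roughly $0.0014 per trace, or about $1.42 per thousand traces.

```python
ANNUAL_COST_USD = 260_000   # figure quoted in the analysis
TRACES_PER_DAY = 500_000    # figure quoted in the analysis
DAYS_PER_YEAR = 365         # assumption: constant volume every day

traces_per_year = TRACES_PER_DAY * DAYS_PER_YEAR
cost_per_trace = ANNUAL_COST_USD / traces_per_year
cost_per_thousand = cost_per_trace * 1000

print(f"{traces_per_year:,} traces/year")
print(f"${cost_per_trace:.5f} per trace")
print(f"${cost_per_thousand:.2f} per 1,000 traces")
```

The per-trace number looks negligible, which is why the annual total is the more useful framing: the cost scales linearly with trace volume and with the number of guardrail calls per trace.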
Handoff Tracing Standards Are Emerging
Industry analysis of multi-agent tracing has produced a credible five-element handoff schema (W3C trace context, handoff payload schema, decision metadata, context diff, guardrail state) and named four critical failure modes — silent context truncation and guardrail-state non-propagation being the two most operationally dangerous. OpenTelemetry alone is insufficient: it captures infrastructure telemetry, not agent-specific semantic signals like reasoning chains and tool-call decisions.
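The five-element schema maps naturally onto a record type with a small validator. The sketch below is one hypothetical shape for it, not a published standard: the field names, the "dropped" and "truncation_declared" keys, and the checks are assumptions, and only the traceparent format follows the W3C Trace Context layout of version, trace ID, parent ID, and flags.

```python
import re
from dataclasses import dataclass

# W3C Trace Context `traceparent`: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT = re.compile(r"^[0-9a-f]{2}-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")


@dataclass(frozen=True)
class Handoff:
    traceparent: str       # W3C trace context
    payload: dict          # handoff payload (schema-checked elsewhere)
    decision: dict         # decision metadata, e.g. why this agent was chosen
    context_diff: dict     # what context was added or dropped at this hop
    guardrail_state: dict  # guardrail verdicts carried across the hop


def handoff_problems(prev_guardrails: dict, handoff: Handoff) -> list:
    """Return human-readable problems found in a single handoff record."""
    problems = []
    if not TRACEPARENT.match(handoff.traceparent):
        problems.append("malformed traceparent")
    # Guardrail-state non-propagation: verdicts known upstream but absent here.
    missing = sorted(set(prev_guardrails) - set(handoff.guardrail_state))
    if missing:
        problems.append("guardrail state not propagated: " + ", ".join(missing))
    # Silent context truncation: context was dropped and nobody declared it.
    dropped = handoff.context_diff.get("dropped", [])
    if dropped and not handoff.context_diff.get("truncation_declared", False):
        problems.append("context dropped without a declared truncation")
    return problems
```

The two checks after the traceparent match correspond to the two failure modes called out as most dangerous; both are invisible to infrastructure-level telemetry because the hop itself succeeds.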
Research & Papers
FuzzingBrain V2
2605.21779 (Sheng, Chen, Xu, Zhu, Huang) is the open academic counterpart to Glasswing's closed system. A multi-agent vulnerability-discovery framework built on OSS-Fuzz with a "Suspicious Point" control-flow abstraction and MCP-based reasoning tools. 90% detection rate (36/40) on a competitive dataset; 29 zero-days across 12 OSS projects; 2 CVEs assigned. Multi-agent vulnerability discovery is now the dominant research pattern of mid-2026, and the gap between academic systems and frontier-lab systems has narrowed considerably.
Reframing Agent Security as Agent-Human Interaction
2605.24309 (Wang, Li, Tian) surveys 59 academic papers, 21 production systems, and 26 security plugins to map an industry-academia gap. The mechanisms academia is researching (intent anchoring, trust labeling) have zero production deployment. The mechanisms dominating production (policy specification, runtime approval, scope configuration) are not what academia is publishing. The central UX-security problem is approval-fatigue-vs-autonomy, and nobody has solved it.
Anthropic Policy: 2028 Scenarios
Anthropic's policy paper on 2028 AI leadership scenarios is mostly a US-leadership argument, but two security-relevant claims are worth flagging: frontier models can now autonomously "discover and chain software vulnerabilities" (Glasswing is the existence proof), and open-weight authoritarian models could enable automated repression at scale. Read this paper alongside the Glasswing update.
Industry Moves
NIST's Cyber AI Profile Lands This Summer
NIST is signaling a summer 2026 release for the Cyber AI Profile, with control overlays for predictive AI in summer and for agentic systems in late summer or early fall. Full finalization is targeted for 2027. Regulated-industry buyers will start asking for NIST Cyber AI Profile alignment by Q4. Anyone in the agent-security control-plane category should map their controls to the Profile's overlays now, not after the final version is published.
Anthropic Compliance API: 28 Integrations Out of the Gate
Anthropic's Claude Compliance API ships with 28 enterprise integrations including Cloudflare, CrowdStrike, Datadog, Microsoft Purview, Netskope, Okta, Palo Alto, and Fortinet — covering DLP, SASE, SIEM, IAM, eDiscovery, AI-SPM, and observability. The signal is that the frontier labs see compliance integration as a sales-cycle prerequisite, not an afterthought.
Identity Layer Attacks Are AI-Agent Attacks
Unit 42's "Paved With Intent" covers a nation-state campaign (attributed to Curious Serpens) using ROADtools against Entra ID and Azure. AI agents share this exact identity layer — agent permissions in Microsoft 365, SharePoint, and Azure inherit the same attack surface that human accounts do. Anyone deploying agents on Entra needs to read this and treat it as part of their agent threat model, not as a separate "cloud security" concern.
The CVE Treadmill Is Faster Than the Patch Treadmill
Proofpoint's 2026 vulnerability-exploitation report reinforces the operational ground truth: AI agent infrastructure (Flowise, Semantic Kernel, K8s) will get hit by the same recurring playbook that hits everything else. The interesting addition this cycle: four of the top exploited CVEs in 2026 were exploited before CISA's KEV catalog flagged them. Vulnerability intelligence as a defensive primitive is degrading.
The Darkhunt Take
This fortnight, three things happened at once.
First, the frontier labs proved that autonomous vulnerability research at industrial scale is now a deployed capability, not a research demo. Glasswing's 10,000-bugs-in-a-month and Palo Alto's 5x patch-volume number describe a world where the gap between "vulnerability exists" and "vulnerability is known to a competent attacker" has effectively closed. The bottleneck is patching, and patching is a human-speed activity.
Second, academia formally converged on the answer that practitioners have been groping toward for two years: the model is not the security boundary, the system is. Abdelnabi's impossibility result for prompt injection. Christodorescu's 11-attack systems-security postmortem. Holz/Rieck on OS primitives. The papers do not agree on every detail, but they agree on the shape: build a control plane around an untrusted model, enforce invariants externally, treat the model the way an OS treats an untrusted process.
Third, the operational evidence kept stacking up that the offense side is moving faster than the defense side. MemMorph hijacks tool selection with three records. A live infostealer campaign targets AI developers with smart-contract C2 that no takedown can reach. Search poisoning weaponizes AI chatbot output as a malware distribution channel. Repository-controlled AGENTS.md joins the trust-persistence and configuration-as-execution-surface findings from last cycle — these are not separate bug classes, they are the same architectural failure showing up in different products.
The thread that runs through all of this is the one we have been pulling at for the last several cycles. Static defenses cannot keep up with a threat model that mutates every time the underlying system changes. Approval gates fail because trust is temporal. Refusal layers fail because the refusal layer is itself an input the attacker can shape. Memory integrity fails because nobody has built it yet. The frontier labs are now publishing operational data showing that autonomous offense scales; the academic consensus is now telling us to treat the model as untrusted and build invariants outside it; the in-the-wild campaigns are now showing us what motivated, well-resourced adversaries actually do with this surface.
The defense side has to converge to the same tempo. That means probing your own agent systems continuously, reasoning about what changed since the last probe, and converting findings into runtime policy faster than the attacker can convert disclosures into payloads. The closed-loop posture is no longer a thesis. It is the only posture that operates at the speed at which the threat is now moving.
Everything else is a snapshot.
Darkhunt AI builds autonomous systems that probe, reason, and harden AI defenses.