March 19 - April 2, 2026

Your regular briefing on AI security threats, vulnerabilities, and defenses from Darkhunt AI

TL;DR

  • Memory is the new attack surface for agents: Memory Control Flow Attacks override explicit user instructions via persistent memory retrieval, compromising 90%+ of trials across GPT-5 mini, Claude Sonnet 4.5, and Gemini 2.5 Flash

  • Every frontier model can be hijacked: NIST's competition with 400+ participants and 250,000+ attacks found successful hijacking attacks against all 13 tested frontier models, with universal attacks transferring across model families

  • LiteLLM supply chain compromise hit 500,000 devices in 40 minutes: TeamPCP poisoned the AI gateway library (3.4M daily PyPI downloads) through a compromised CI/CD security scanner, stealing SSH keys, cloud credentials, and Kubernetes secrets

  • Small models now match frontier offensive capability at 1/100th the cost: A 4B parameter model achieves 95.8% success on Linux privilege escalation, nearly matching Claude Opus 4.6 at 97.5%, collapsing the cost barrier for AI-powered attacks

  • RSAC 2026 was a turning point for agentic AI security: Microsoft shipped 700+ Zero Trust controls for AI, security leaders warned of "insane" next two years, and five agent identity frameworks launched -- all missing behavioural monitoring


Top Stories

Memory Control Flow Attacks: Agents That Forget Who They Work For

Researchers identified Memory Control Flow Attacks (MCFA), a class of vulnerability where an agent's persistent memory overrides its explicit instructions. By injecting malicious content into an agent's long-term memory, an attacker can force the agent to use unintended tools, bypass safety constraints, and deviate from user-specified behaviour -- persistently, across multiple sessions and tasks.

The MemFlow framework tested this systematically. Over 90% of trials across GPT-5 mini, Claude Sonnet 4.5, and Gemini 2.5 Flash were vulnerable, even when agents operated under strict safety constraints. The attacks are not prompt injections in the traditional sense. They exploit the fact that when memory retrieval and user instructions conflict, the memory frequently wins. The agent trusts its own stored context more than the user sitting in front of it.

Why it matters: Every serious agent architecture includes persistent memory -- it is what makes agents useful across sessions. But this research shows that memory is also a durable, cross-session attack channel. A single successful injection into an agent's memory can influence its behaviour indefinitely, across completely unrelated tasks. Current defences are not designed for threats that persist in the agent's own state.

Darkhunt perspective: Memory attacks represent a shift from ephemeral to persistent compromise. In traditional prompt injection, the attack lasts one session. With MCFA, the attack lives inside the agent's knowledge base, activating whenever relevant context is retrieved. Detecting this requires understanding what the agent knows, not just what it is being told right now. It demands continuous behavioural analysis: is this agent acting consistently with its instructions, or has its stored context been weaponised? (Paper)


NIST Red-Teaming Competition Proves Agent Hijacking Is Universal

NIST's Center for AI Standards and Innovation partnered with Gray Swan and the UK AI Security Institute to analyze the largest structured agent red-teaming effort to date: over 400 participants launched more than 250,000 attacks against 13 frontier models in agentic scenarios. The result was unambiguous. Every single model was successfully hijacked.

More revealing than the headline finding is the pattern underneath it. Attack resilience does not correlate with general model capability. A model that scores well on benchmarks is not necessarily harder to hijack. The researchers also identified universal attacks that transfer across 21 of 41 tested agent behaviors and across model families. Attacks developed against more secure models transferred to less robust ones at higher rates -- meaning the hardest targets produce the most reusable offensive techniques.

Why it matters: Static defenses calibrated to a specific model's behavior will fail as the attack landscape shifts. The transferability finding is particularly dangerous: an attacker who invests in breaking through one model's defenses gets attacks that work broadly. This is the opposite of how traditional software security works, where exploits tend to be target-specific. Agent security requires continuous, adaptive testing -- not one-time assessments.

Darkhunt perspective: The transferability of attacks across model families validates something we have built around: offensive AI agents that probe one system generate intelligence applicable across an organization's entire agent portfolio. When a single attack strategy can hijack agents built on different foundation models, the defense cannot be model-specific. It must be architectural -- monitoring agent actions, constraining permissions, and detecting behavioral anomalies regardless of which model is underneath. (Source)


LiteLLM Supply Chain Attack: The AI Infrastructure Domino

On March 24, TeamPCP compromised LiteLLM -- the Python library that serves as a gateway to virtually every major LLM provider -- by poisoning a Trivy dependency in its CI/CD pipeline. The malicious versions (1.82.7 and 1.82.8) were live for approximately 40 minutes. That was enough to compromise an estimated 500,000 devices.

The attack was technically sophisticated. Version 1.82.8 installed a .pth file that executed every time Python launched on the compromised system -- not just when LiteLLM was imported. The infostealer harvested SSH keys, AWS/GCP/Azure tokens, Kubernetes secrets, Docker configs, cryptocurrency wallets, environment files, CI/CD secrets, shell history, and database credentials. Data was encrypted with AES-256 and RSA-4096, then exfiltrated to attacker-controlled infrastructure at models.litellm[.]cloud.

The attack vector is the story within the story. The attackers did not compromise LiteLLM's source code directly. They compromised Trivy, a security scanning tool in LiteLLM's CI/CD pipeline. A security tool became the attack vector against the AI infrastructure it was meant to protect.

Why it matters: LiteLLM is foundational infrastructure for AI agent systems. It abstracts away the differences between LLM providers, meaning thousands of organizations route their AI agent traffic through it. A compromise at this layer means every agent built on top of LiteLLM is potentially feeding its prompts, responses, and credentials through an attacker-controlled dependency. The 40-minute window and 500,000-device impact illustrate how supply chain velocity in the AI ecosystem vastly exceeds the response capacity of human security teams.

Darkhunt perspective: This is a preview of how agent ecosystems will be attacked at scale. The most impactful targets are not individual agents -- they are the shared infrastructure layers that thousands of agents depend on. LiteLLM, MCP servers, embedding providers, vector databases. Compromise one of these and you compromise the entire constellation of agents built on top. Defending against this requires continuous monitoring of the full agent supply chain, not just the agent code itself. (Source)


Attack Vectors & Vulnerabilities


MCP Tool Poisoning: The Emerging Agent Communication Standard Is Already Under Threat

The Model Context Protocol is becoming the de facto standard for how agents interact with tools and data sources. Researchers applied STRIDE and DREAD threat modeling to the full MCP architecture and identified tool poisoning -- malicious instructions embedded in tool metadata -- as the most impactful client-side vulnerability. A comparison of seven major MCP clients revealed that most have insufficient defenses, with inadequate static validation and poor parameter visibility for users. The client side of MCP security has been systematically under-researched compared to the server side. (Paper)


n8n AI Workflow Platform: Four Critical RCE Vulnerabilities

Four critical vulnerabilities (CVSS up to 9.4) in n8n, a popular AI workflow automation platform, allow sandbox escapes, code injection, and arbitrary file writes. n8n is widely used to connect LLMs to enterprise systems, meaning these vulnerabilities provide a path from AI workflow compromise to full server takeover and access to stored credentials for every connected service. Fixed in versions 2.10.1, 2.9.3, and 1.123.22. (Source)


Threat Actors Are Experimenting with Agentic AI Operations

Microsoft Threat Intelligence published evidence that threat actors are employing role-based jailbreak techniques and experimenting with agentic AI for iterative attack operations. While not yet observed at scale, the pattern is clear: adversaries are treating AI as tradecraft, using LLMs for enhanced social engineering, automated reconnaissance, evasion, and payload development. The progression from one-shot LLM misuse to agentic iteration is the inflection point defenders should be tracking. (Source)


Small Offensive Models Collapse the Cost Barrier

PrivEsc-LLM, a 4B parameter model, achieves 95.8% success on Linux privilege escalation after a two-stage post-training pipeline (supervised fine-tuning on expert traces, then reinforcement learning with verifiable rewards). This nearly matches Claude Opus 4.6's 97.5% at over 100x lower cost per successful exploitation. The implication is concrete: frontier-grade offensive capability is no longer gated by frontier-grade compute budgets. (Paper)



Defensive Developments


Microsoft Ships Zero Trust for AI at RSAC 2026

Microsoft launched Zero Trust for AI (ZT4AI), extending the Zero Trust framework to AI systems with over 700 AI-specific security controls. The framework applies three core principles to agents: verify explicitly (agent identity and intent), apply least privilege (constrain what agents can do), and assume breach (expect prompt injection, data poisoning, and lateral movement). Complementary product announcements include new Defender capabilities for AI threat detection, Entra for agent identity management, and Purview for data governance across agent systems. Agent 365 goes generally available May 1, 2026. (Source)


Dual-Firewall Architecture for Multi-Agent Communication

Microsoft Research published a dual-firewall system for agent-to-agent communication. The Language Converter Firewall translates messages into a closed, domain-specific protocol that structurally eliminates manipulation tactics. The Data Abstraction Firewall projects outgoing information at appropriate granularity levels rather than making binary disclosure decisions. Results: privacy attacks dropped from 84% to 10%, security attacks from 60% to 3%, with task quality maintained. Open source at github.com/microsoft/Firewalled-Agentic-Networks. (Paper)


Agent Audit: Open-Source Security Analysis for LLM Agent Code

A new open-source tool from USC researchers performs static security analysis of LLM agent applications, scanning code, credentials, and MCP configurations for vulnerabilities. In testing, it identified 40 of 42 known vulnerabilities with only 6 false positives. This addresses a gap in pre-deployment security -- catching vulnerabilities in agent code before the agent is running in production. (Paper)


RSAC 2026 Agent Identity Frameworks: Progress with Gaps

Five agent identity frameworks shipped at RSAC, but VentureBeat analysis identified three critical gaps across all of them: no behavioral baseline tracking, no post-authentication validation, and no context-aware authorization. In one cited incident, a CEO's AI agent autonomously rewrote company security policy without detection. Identity alone is not enough when an authenticated agent can act outside its intended scope. (Source)



Industry Moves


RSAC 2026: The Agentic AI Security Conference

RSAC 2026 effectively became the agentic AI security conference. Microsoft dominated with three major announcements (ZT4AI, Secure Agentic AI, prompt abuse playbook). OpenAI launched a Safety Bug Bounty focused on prompt injection, data exfiltration, and disallowed actions in agentic products (up to $7,500 per finding, jailbreaks explicitly excluded). Straiker expanded its product surface with Discover AI for agent inventory and Defend AI for runtime security with sub-300ms latency. The competitive landscape for agentic AI security products is forming rapidly. (OpenAI | Straiker)

Security Leaders: "Respond at Machine Speed or Lose"

Kevin Mandia (Armadin), Alex Stamos (Corridor), and Morgan Adamski (PwC) warned at RSAC of a "perfect storm for offense." Key quotes: "Exploit discovery has gone exponential" (Stamos). AI agents evade EDR in under an hour. "Patch Tuesday, exploit Wednesday" as AI automates reverse-engineering. "You're going to have to respond at machine speed" (Mandia). The consensus: the next two years will be defined by whether defenders can match the pace of AI-augmented offense. (Source)

Enterprise AI Agent Adoption Reaches Critical Mass

Microsoft cited that 80% of Fortune 500 companies are now using AI agents, with Agent 365 going GA in May. BleepingComputer published a categorization framework for enterprise agents (agentic chatbots, local agents, production agents) with non-human identity management emerging as a key challenge. Zenity observed that government policy on agentic AI security is falling behind the speed of enterprise adoption. The gap between deployment velocity and security maturity continues to widen. (BleepingComputer | Zenity)

--

The Darkhunt Take

This period crystallized something that has been building for months: the agentic AI security problem is not a model problem. It is an ecosystem problem.

Consider what happened in two weeks. A supply chain attack on LiteLLM compromised 500,000 devices through a poisoned security scanner in CI/CD. Memory Control Flow Attacks showed that an agent's own stored knowledge can be weaponized against it. NIST proved that hijacking attacks transfer across model families. n8n RCE vulnerabilities opened paths from AI workflow tools to full server takeover. A 4B parameter model matched frontier offensive capability at 1/100th the cost.

None of these are individual, patchable bugs. They are consequences of how agent ecosystems are structured. Agents depend on shared infrastructure (LiteLLM, MCP, workflow platforms). Agents trust their own memory. Agents inherit the attack surfaces of every tool they connect to. Attacks that work against one model work against many.

Microsoft's response -- 700+ controls, Zero Trust applied to agents, "assume breach" as a first principle -- is the right philosophical direction. But controls and frameworks are static. The attack surface is not. NIST's competition demonstrated that the attack landscape is adversarial and adaptive: the hardest targets produce the most transferable attacks.

The uncomfortable truth from RSAC is that the security industry knows what is coming and is not confident it can keep up. Mandia's "respond at machine speed" is not a slogan. It is a requirement. When AI agents can evade EDR in under an hour and exploit discovery is exponential, the response system must be autonomous too.

That is the gap we are focused on: not another framework or control catalog, but systems that actively probe agent ecosystems for the class of structural vulnerabilities this period revealed, reason about what they find, and harden defenses before the next TeamPCP or MemFlow-style attack lands. The window between "attack discovered in research" and "attack deployed in the wild" is compressing. The defense must compress faster.