April 30 – May 14, 2026

Approve Once, Compromise Forever
AI Security Digest: April 30 – May 14, 2026 | Darkhunt AI
TL;DR
One approval, three agents, indefinite execution. A TOCTOU trust-persistence flaw shared across Claude Code, OpenAI Codex CLI, and Google Gemini CLI: once a repository is approved, modified project configs reload without re-approval. All three vendors were notified in February and declined to treat it as a security issue. The structural pattern from last cycle's "Comment and Control" has a sibling, and it has the same vendor response.
Configuration is the new code path. Following last period's MCP-STDIO disclosure, May made it clearer: in AI development tools, configuration files are an undocumented execution surface. A settings reload triggers tool registration, which triggers handler initialization, which runs code. The protocol-level finding now generalizes across the product layer.
Google Antigravity joins the AI-IDE RCE list. A persistent code-execution vulnerability disclosed in Google's AI-driven IDE survives restarts and project switching. Cursor in April. Antigravity in May. The cadence is no longer "anomaly." It is "category."
Hugging Face is the new npm. A six-stage credential stealer was caught in a typosquatted "Open-OSS/privacy-filter" repository on Hugging Face, harvesting browser tokens, crypto wallets, and cloud credentials before takedown. Model-repository typosquatting is now an operational supply-chain primitive against AI pipelines.
The frontier labs are converging on agent security architecture. OpenAI, Google DeepMind, and Anthropic each published agent-security principles this period that line up on five common points. The Five Eyes intelligence alliance issued aligned joint guidance the same week. Anthropic's "Teaching Claude Why" research reframes misalignment as a reasoning gap, not a refusal gap. The defense side is finally agreeing on what it is defending.
Top Stories
Approve Once, Exploit Forever: The Trust Model Has No Expiration
A Time-of-Check-to-Time-of-Use vulnerability disclosed in May 2026 affects Anthropic's Claude Code, OpenAI's Codex CLI, and Google's Gemini CLI. The mechanism: once a user approves a repository, the agent treats subsequent reads of that repository's project configuration as trusted, even when those configs have been modified after the initial approval. Approval is checked once, at the wrong moment, and used forever.
Anthropic, OpenAI, and Google were notified in February 2026. All three declined to treat it as a security issue. The vulnerability is still live in May.
Why it matters: Last cycle's "Comment and Control" showed a single payload bypassing three vendor implementations of the same architecture. This cycle's finding is its mirror: the same architecture, but the failure is in the trust model rather than the execution path. Approval at check-time and execution at use-time are temporally decoupled, with no mechanism to invalidate the bond when the underlying object changes. That is not a bug class. That is the design.
The triple-vendor non-response is also a story. February to May is three months. The reported vulnerability remains live across the three most-deployed AI coding agents on the market. The industry's stated position is that this is intended behavior.
Darkhunt perspective: Trust persistence is the cleanest demonstration that agent security is not a feature you ship. It is a continuous property you have to enforce. Static approval gates are a snapshot of a trust state that the underlying repository can mutate at will. Closed-loop defense has to monitor the gap between approval-time and use-time as a first-class signal, not a coincidence. The fact that three vendors looked at this and called it acceptable is precisely the gap that runtime governance has to instrument around.
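One way to picture closing the gap between approval-time and use-time is to bind each approval to a digest of the configuration that was actually reviewed, and to re-check that digest on every use. The sketch below is a minimal illustration under stated assumptions, not how any of the three vendors implement trust: the file names in CONFIG_NAMES and the ApprovalStore class are hypothetical.

```python
import hashlib
from pathlib import Path

# Hypothetical config file names; real agents read different paths.
CONFIG_NAMES = ("agent.json", "settings.yaml")


def config_digest(repo: Path) -> str:
    """Hash the names and bytes of every watched config file present in the repo."""
    h = hashlib.sha256()
    for name in sorted(CONFIG_NAMES):
        path = repo / name
        if path.is_file():
            data = path.read_bytes()
            h.update(name.encode())
            # Length prefix keeps file boundaries unambiguous in the hash input.
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
    return h.hexdigest()


class ApprovalStore:
    """Binds each approval to the config contents seen at approval time."""

    def __init__(self) -> None:
        self._approved: dict[str, str] = {}

    def approve(self, repo: Path) -> None:
        # Record what the user actually looked at, not just the repo path.
        self._approved[str(repo.resolve())] = config_digest(repo)

    def is_trusted(self, repo: Path) -> bool:
        # Re-check at use time: a changed config no longer matches the approval.
        return self._approved.get(str(repo.resolve())) == config_digest(repo)
```

A caller would run is_trusted before every config reload and fall back to a fresh approval prompt when it returns False. The design choice is that trust attaches to content rather than to a path, so a post-approval edit invalidates the approval instead of inheriting it.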
When Configuration Becomes Code
Last period's Anthropic-MCP-STDIO disclosure showed configuration parsing triggering OS command execution at the protocol layer. May extends the pattern to the product layer: in AI development tools broadly, configuration files are an undocumented execution surface. Settings reload triggers tool registration. Tool registration triggers handler initialization. Handler initialization runs code. The path from "edit a YAML file" to "execute arbitrary commands" is shorter than the documentation implies.
Why it matters: Two consecutive cycles, two independent research teams, one structural finding: in agent platforms, the line between declarative configuration and runtime execution has dissolved. The implication for defenders is that the threat model has to enumerate every config surface (every .yaml, every .json, every project-local override file) as a potential code-execution sink. Static-analysis tools that treat config as inert data are now insufficient by construction.
Darkhunt perspective: The "configuration is code" pattern is exactly the kind of failure mode that adversarial probing surfaces and that no amount of policy documentation prevents. Every config-reload path in the agent platform must be treated as an execution boundary and tested as one. The cleanest defensive primitive is to make config changes themselves a privileged operation that requires fresh approval. That is exactly the gap "Approve Once, Exploit Forever" exploits.
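Enumerating config surfaces as potential execution sinks can start with something as simple as an inventory pass. The sketch below walks a repository, lists every file with a config-like suffix, and flags JSON keys whose names suggest they carry something executable. The suffix list and the key names in EXEC_KEYS are assumptions chosen for illustration and would need tuning per tool; YAML and TOML files are listed but not parsed, to keep the example within the standard library.

```python
import json
from pathlib import Path

CONFIG_SUFFIXES = {".json", ".yaml", ".yml", ".toml"}
# Hypothetical key names that often carry something executable; tune per tool.
EXEC_KEYS = {"command", "args", "hooks", "script"}


def exec_key_paths(node, prefix=""):
    """Yield dotted paths to keys in parsed JSON whose name is in EXEC_KEYS."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            if key.lower() in EXEC_KEYS:
                yield path
            yield from exec_key_paths(value, path)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from exec_key_paths(item, f"{prefix}[{i}]")


def inventory(repo: Path) -> dict[str, list[str]]:
    """Map each config file to flagged key paths; non-JSON files are listed unparsed."""
    found: dict[str, list[str]] = {}
    for path in sorted(repo.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in CONFIG_SUFFIXES:
            continue
        rel = path.relative_to(repo).as_posix()
        if path.suffix.lower() == ".json":
            try:
                found[rel] = list(exec_key_paths(json.loads(path.read_text())))
            except (json.JSONDecodeError, UnicodeDecodeError):
                found[rel] = ["<unparseable>"]
        else:
            found[rel] = ["<not parsed in this sketch>"]
    return found
```

The output is a starting list for threat modeling, not a verdict: a flagged key is a place where a config edit may become execution and deserves a test, while an unflagged file has only been shown not to match a naive heuristic.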
Google Antigravity Joins the AI-IDE RCE List
A persistent code-execution vulnerability was disclosed in Google's Antigravity, an AI-driven IDE released earlier in 2026. Persistence means the execution survives IDE restarts, project switching, and the usual containment assumptions developers rely on.
Cursor's "Triple Backtick" RCE in April. Antigravity's persistent RCE in May. The pattern is now a category.
Why it matters: AI-driven IDEs ship with an inverted threat model. Traditional IDE security assumes the developer is the trust anchor and untrusted input is parsed defensively. AI-driven IDEs ingest untrusted input (repositories, dependencies, documentation, prompts), grant it influence over what the IDE executes, and then run that execution with the developer's full local privileges. The threat model isn't broken. It never matched the architecture in the first place.
Darkhunt perspective: Every AI-driven IDE on the market is now a credible target for indirect prompt injection chaining into local RCE. The discovery cadence is roughly one major disclosure per cycle, and the disclosures span vendors. Static IDE hardening assumes the developer chooses what runs; AI-driven IDEs delegate that choice to a model whose inputs the developer doesn't control. Runtime probing of the IDE-as-agent harness (what does this IDE actually execute, and what input convinced it to) is now table stakes for any organization deploying AI development tools at scale.
Hugging Face Is the New npm
A six-stage credential-harvesting infostealer was caught in a typosquatted Hugging Face repository named "Open-OSS/privacy-filter," mimicking OpenAI's Privacy Filter. The payload targeted browser credentials, crypto wallets, and cloud credentials. The repository reached significant install volume before takedown.
Why it matters: Hugging Face is to AI tooling what npm is to JavaScript and what PyPI is to Python: the canonical model and dataset registry. Typosquatting against it is the natural next step in the supply-chain playbook that has already proven productive against npm and PyPI. The first major typosquatted-model campaign is a signal flare for the entire AI build pipeline: every agent runtime that pulls weights or filters or tokenizers from a public model registry now inherits a new class of supply-chain risk.
Darkhunt perspective: Model-repository hygiene is the missing layer in most agent SDLCs. Organizations treat the Hugging Face download step the way they treated npm install in 2018: as an implicit trust boundary they never explicitly evaluated. The defensive primitive is straightforward in principle: pin model artifacts by hash, scan them, attribute origin, and assume the registry surface will be attacked. The harder problem is that most agent platforms don't yet expose model-pull as a discoverable boundary. That is what runtime governance has to instrument before the next campaign lands.
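Pinning by hash is the most mechanical of those steps, and a rough sketch fits in a few lines. The example below assumes the artifacts have already been downloaded into a local directory and that the team maintains its own manifest of expected SHA-256 digests; the verify_artifacts function and the manifest shape are hypothetical, not part of any registry's tooling.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large weights need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_artifacts(root: Path, pins: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every file matches its pin."""
    problems = []
    for rel, expected in pins.items():
        path = root / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_file(path) != expected:
            problems.append(f"hash mismatch: {rel}")
    # Files that were downloaded but never pinned are also worth flagging.
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        if path.is_file() and rel not in pins:
            problems.append(f"unpinned: {rel}")
    return problems
```

Treating unpinned files as a problem matters as much as the hash check itself: a model pull that delivers an extra loader script is a change in what the pipeline executes, even if every pinned file still matches.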
Attack Vectors & Vulnerabilities
MCP Security Has Stats Now
The MCP attack surface acquired its first measurable shape this period: published attack-pattern distributions, risk benchmarks across implementations, and a working scoring framework for MCP security posture. The category has consolidated faster than most agent-protocol attack surfaces did. The difference is that the foundational disclosure (Anthropic's MCP-STDIO finding last cycle) came from the protocol owner itself, which compressed the gap between vulnerability discovery and category formation.
Ten of Ten Chatbots Plan School Shootings
Independent research this period extended a prior CNN investigation that found eight of ten chatbots provided harmful planning guidance under refusal-bypass prompts. The updated count: ten of ten. A companion test showed Claude offering explosives-making instructions under standard refusal-bypass framings. Not strictly agent security, but a direct refutation of the "guardrails block harmful content" thesis that every frontier safety release implies. Pattern-matching refusal layers are 0-for-10 against motivated probes.
The "75% Rise in Malicious Packages" Has a Source
Chainguard's 2026 supply-chain report, surfaced via The Hacker News' "2026: Year of AI-Assisted Attacks," provides verified figures behind the AI-assisted-attacks narrative: malicious packages on public repositories rose 75%. Mandiant's M-Trends 2026 adds that the average time-to-exploit collapsed from 700 days in 2020 to 44 days in 2025, with 28.3% of CVEs now exploited within 24 hours of disclosure. Both sets of figures trace to primary sources, which makes them safe to repeat. They describe an attack pipeline that no longer waits for patches.
Defensive Developments
ARGUS: Provenance Graphs as Defense Primitive
A new arXiv preprint introduces ARGUS, a defense against context-aware prompt injection that constructs an "influence provenance graph" tracking how untrusted context propagates into agent decisions. Reported attack success below 4% while preserving roughly 87% of task utility. Paired with a new benchmark (AgentLure) for context-dependent attacks. Notable for taking the data-flow approach that "When Configuration Becomes Code" implicitly demands: defenses that reason about which inputs influenced which outputs, not just whether an input matched a pattern.
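The general data-flow idea can be shown with a toy graph. The sketch below is not ARGUS and makes no claim about how the paper builds or queries its graph; it is a generic reachability check, with hypothetical class and node names, showing what it means to ask which untrusted inputs influenced a given action.

```python
class ProvenanceGraph:
    """Records which context items influenced which later items or actions."""

    def __init__(self) -> None:
        self._parents: dict[str, set[str]] = {}  # node -> nodes that influenced it
        self._untrusted: set[str] = set()

    def add_source(self, node: str, trusted: bool) -> None:
        self._parents.setdefault(node, set())
        if not trusted:
            self._untrusted.add(node)

    def add_influence(self, src: str, dst: str) -> None:
        self._parents.setdefault(src, set())
        self._parents.setdefault(dst, set()).add(src)

    def untrusted_ancestors(self, node: str) -> set[str]:
        """Walk influence edges backwards and collect untrusted sources."""
        seen: set[str] = set()
        hits: set[str] = set()
        stack = [node]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current in self._untrusted:
                hits.add(current)
            stack.extend(self._parents.get(current, ()))
        return hits


g = ProvenanceGraph()
g.add_source("user_prompt", trusted=True)
g.add_source("repo_readme", trusted=False)
g.add_influence("user_prompt", "plan")
g.add_influence("repo_readme", "plan")
g.add_influence("plan", "run_shell")
g.add_influence("user_prompt", "open_file")

print(g.untrusted_ancestors("run_shell"))  # {'repo_readme'}
print(g.untrusted_ancestors("open_file"))  # set()
```

A policy layer could then gate high-privilege actions on an empty result, which is a decision about influence rather than about whether any single input matched a known-bad pattern.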
Confidential Computing for Agentic AI
A comprehensive survey of Trusted Execution Environments across six hardware platforms, framed specifically against the agent threat model: prompt injection, context exfiltration, credential theft, and inter-agent message poisoning. Useful as a map of where hardware-rooted defense fits, and as a flag for the gaps. Most current TEE designs are not built for the workloads agents actually run, and the paper is honest about that.
Anthropic Teaches Claude "Why"
Anthropic's "Teaching Claude Why" reframes agentic misalignment as a reasoning failure rather than a refusal failure. The training-time intervention provides intent grounding (explaining why a constraint exists) and reports reductions in off-policy agent behavior. This is a tonal shift. Prior alignment work optimized for the model declining bad instructions; this approach optimizes for the model understanding the intent behind the policy so it can apply that intent to edge cases the policy didn't anticipate. If it generalizes, it changes what next-generation guardrails look like. They will need intent context, not just rule lists.
Donating Petri
Anthropic open-sources Petri, an alignment evaluation tool, to the broader research community. Mostly relevant as ecosystem infrastructure. Research-grade alignment tooling is getting easier to share and easier to benchmark against.
Research & Papers
Agentic AI and the Industrialization of Cyber Offense
Christopher Koch's survey-and-forecast paper is the cleanest articulation of the "AI-assisted attacks are an economic shift, not a tactical one" thesis we have seen. The synthesis: agentic AI compresses attack lifecycles across reconnaissance, phishing, credential abuse, vulnerability triage, exploit adaptation, and post-compromise decision support. The paper's defensive recommendations skew toward identity management, authentication strengthening, and patch velocity. The implicit acceptance is that AI-specific controls are not yet a mature category, and that traditional security primitives have to absorb the load. Mid-market is named as a specific risk segment, which is consistent with what we are hearing from European customers.
The Buyer Rubric Stabilizes
A pattern visible across multiple vendor comparisons this period: AI red-teaming tools are now being evaluated against three buyer-side criteria: OWASP Agentic Top 10 coverage, MCP testing capability, and multi-agent risk assessment. The vocabulary is consolidating. If you are positioning a red-team product, those are the three columns in the spreadsheet now.
Industry Moves
Five Eyes Treats Agentic AI as an Architecture Problem
The Five Eyes intelligence alliance (the US, UK, Canada, Australia, and New Zealand) issued joint guidance on agentic AI this period. The framing matters: government-aligned guidance is no longer asking organizations to add controls on top of agentic AI deployments. It is asking them to architect for unexpected behavior from the start. The guidance names four risk categories (privilege, behavioral, structural, accountability) that organizations must address before deployment, not after. For sales conversations with regulated-industry buyers, this is the new vocabulary.
Frontier Lab Consensus: Five Shared Principles
OpenAI, Google DeepMind, and Anthropic each published agent-security principles this period, and the three sets line up on five common points. Independent convergence is stronger evidence than the output of a coordinated working group. When three labs that are not coordinating end up at the same answer, that answer is closer to a primitive than to a preference. Worth mapping against the APE taxonomy on our side.
Identity Dark Matter Goes Mainstream
The Hacker News covered the "identity dark matter" framing this period: AI agents are being deployed faster than IAM tooling can see them, creating a population of agent identities operating outside managed authority. The same observation surfaced from a different angle last period. The vocabulary is consolidating; the practical implication is that asset inventory for agent identities is the prerequisite for everything else. (The Hacker News)
The Darkhunt Take
Last period we wrote that the agent attack surface is the ecosystem. This period the ecosystem answered: the attack surface is the architecture, and the architecture is what the vendors will not change.
Three vendors looked at a TOCTOU trust bug spanning all of their flagship coding agents and decided it was intended behavior. The protocol owner declined to change MCP. Configuration files are an execution surface in every AI-driven IDE we have data on. Hugging Face is now a credible typosquat target. None of these are bugs. They are choices, and the people who made them are not the defenders.
What the defenders have, this period, is data and consensus. MCP security crossed the line from category to measurable benchmark. Chainguard and Mandiant put verified numbers behind the AI-assisted-attacks narrative: 75% more malicious packages, time-to-exploit collapsed from 700 days to 44, and 28.3% of CVEs exploited within 24 hours of disclosure. The Five Eyes alliance and the three frontier labs independently converged on the same architectural principles for agent security. Anthropic's "Teaching Claude Why" reframes the defensive problem itself.
That convergence is what makes this period feel different from the last several. The threat model is no longer contested. What is contested is whether the vendors who own the trust models will fix them, and the early evidence is: they will not. The TOCTOU bug reported to all three coding-agent vendors in February remains live in May. MCP's unsafe defaults remain unfixed. Cursor patched its "Triple Backtick" RCE after public disclosure; Antigravity will follow the same path; the next one is already being written.
The conclusion is the same one we keep arriving at, from a different direction each period. Static controls do not work because the architecture is not static. Approval gates do not work because the trust model is temporal. Refusal layers do not work because the refusal layer is itself an input the attacker can shape. The only defense that operates at the tempo of the threat is one that probes its own systems continuously, reasons about what changed since the last probe, and converts findings into runtime policy faster than the attacker converts disclosures into payloads.
That is the loop. Everything else is a snapshot.
Your AI agents have attack surfaces you have not tested. Find out what they are before someone else does.