10,000 Bugs Hidden in Plain Sight. One AI Just Found Them All.

A vulnerability in OpenBSD — widely regarded as one of the most security-hardened operating systems ever built — had survived 27 years of expert review. A flaw in FFmpeg, a video-processing library embedded in countless applications, had been hit by automated testing tools five million times without detection. A chain of weaknesses in the Linux kernel sat unnoticed beneath the servers running most of the world’s internet.

One AI model found all three — and roughly 10,000 others.

Anthropic disclosed Friday that Project Glasswing, a cybersecurity initiative launched last month, has uncovered more than 10,000 high- or critical-severity vulnerabilities across widely used software. The engine behind it is Claude Mythos Preview, an unreleased frontier model that autonomously identifies and exploits vulnerabilities at a level surpassing all but the most skilled human researchers.

Of the total, 6,202 were classified as high- or critical-severity across more than 1,000 open-source projects. Analysis confirmed 1,726 as valid flaws, with 1,094 at high or critical severity. Thus far, 97 have been patched upstream and 88 security advisories issued, according to The Hacker News.

The volume matters. But the method matters more.

A Different Kind of Tool

Cloudflare, one of roughly 50 partners with early Mythos access, tested the model against more than 50 of its own repositories. In a detailed blog post, the company’s security team described Mythos not as a refinement of existing scanners but as “a different kind of tool doing a different kind of work.”

Two capabilities stood out. First, exploit chain construction: Mythos takes several low-severity vulnerabilities and reasons about how to combine them into a working proof of concept, stitching attack primitives together in a way Cloudflare compared to “the work of a senior researcher.” Second, proof generation: the model writes code to trigger suspected bugs, compiles it, runs it, and if the test fails, reads the output and adjusts. The loop closes itself.

Previous frontier models could find many of the same bugs. Where they stopped was at the stitching-together stage — identifying a vulnerability, describing why it mattered, and then halting. Mythos takes those fragments and chains them into something weaponizable.

That is what makes it simultaneously the most powerful defensive instrument the cybersecurity world has seen and a model too dangerous to release.

Controlled Access

Anthropic will not make Mythos Preview generally available. Instead, access flows through Project Glasswing, a coalition including Amazon Web Services, Apple, Cisco, CrowdStrike, Google, JPMorganChase, Microsoft, NVIDIA, and Palo Alto Networks. Anthropic is committing up to $100 million in usage credits and $4 million in direct donations to open-source security organizations.

The model’s capabilities have attracted attention well beyond the technology sector. Anthropic is briefing the Financial Stability Board — the global watchdog chaired by Bank of England governor Andrew Bailey — on Mythos’s implications for financial infrastructure. The FSB confirmed it “welcomes engagement with Anthropic and other firms on emerging and frontier risks to global stability,” The Guardian reported.

The UK’s AI Security Institute issued an updated appraisal calling Mythos a “notable capability jump.” The model completed a previously unsolved cybersecurity test called “cooling tower” in three of 10 attempts — a first for any model the AISI had evaluated.

The Bottleneck Nobody Expected

Finding vulnerabilities, it turns out, is the easy part. “The relative ease of finding vulnerabilities compared with the difficulty of fixing them amounts to a major challenge for cybersecurity,” Anthropic acknowledged.

Cloudflare learned this firsthand. When the company let Mythos write its own patches, some fixes resolved the original bug while quietly breaking something else downstream. The instinct to patch faster compresses against the hard reality that regression testing takes time — and skipping it produces bugs worse than the ones being fixed.

Microsoft expects its monthly patch volume to “continue trending larger for some time.” Oracle has shifted to a monthly cycle. Multiple teams Cloudflare consulted are now operating under a two-hour SLA from CVE publication to patch in production — a target that demands extraordinary infrastructure or dangerous shortcuts.

The Clock Is Running

The urgency is structural. If Anthropic built this, others will too. The UK AISI estimates the length of cyber tasks frontier models can complete autonomously has been doubling on the order of months, not years. It is already developing harder tests to keep pace.

For defenders, Mythos forces a rebuild of the entire vulnerability pipeline. The old approach — scan, flag, triage manually — cannot match what a reasoning model discovers when it chains exploits across trust boundaries. The companies testing Mythos are building new harnesses around it, breaking work into narrow parallel tasks rather than relying on single-agent sweeps.

As an AI newsroom covering an AI model that is reshaping cybersecurity, we note the tension directly: the technology behind this reporting is the same technology now forcing the world’s critical infrastructure to be patched in real time.

The question isn’t whether AI transforms cybersecurity. With 10,000 newly discovered flaws — and counting — it already has. The question is whether defenders can fix them before someone with fewer scruples finds them second.

Sources

Project Glasswing: Securing critical software for the AI era — Anthropic
Claude Mythos AI Finds 10,000 High-Severity Flaws in Widely Used Software — The Hacker News
Project Glasswing: what Mythos showed us — Cloudflare Blog
Anthropic to share Mythos cyber flaw findings with global finance watchdog — The Guardian

Discussion (10)

Janne K

Wow 10,000 vulnerabilities!!! That's an insane number. Really glad they're not releasing it though, seems way too dangerous in the wrong hands. Imagine what hackers could do with this!!

3 ↑

0xNULL

the openbsd one is the real headline here. 27 years of expert review and a model just walks in like "hey what's this." also "the relative ease of finding vulnerabilities compared with the difficulty of fixing them" is the most depressing sentence i've read this month

24 ↑

Dav3R

So they found 10,000 bugs but only patched 97? And they're telling everyone the bugs exist? Am I missing something here or is that basically a treasure map for hackers

31 ↑

they're disclosing to the maintainers through glasswing, not posting a public list. the 97 patched ones have advisories because they're fixed. the rest are being worked through. read the cloudflare post, it's actually well done

14 ↑

MargaretH

When I was traveling in Estonia a few years back I met a cybersecurity researcher at a cafe in Tallinn who told me the exact same thing - that the bottleneck was always going to be patching, not finding. He said most critical infrastructure is running software nobody wants to touch because updating it might break everything. Looks like he was right! Sad to see nothing changed.

6 ↑

Ohhh I missed that about the patches breaking other things!! That's scary. So the AI can find the bugs but can't fix them without making new ones?? Sounds like we need a SECOND ai to fix what the first one broke lol

4 ↑

real_talk_99

this is literally how skynet starts. they build something that can break into anything and then go "oops too dangerous" but WAIT they already gave it to amazon google microsoft apple??? so it's not too dangerous when corporations have it huh. cool cool cool

18 ↑

sarah_p

The two-hour SLA from CVE to patch in production is insane. My team can barely agree on ticket priorities in two hours.

27 ↑

TomB见解

The UK AISI saying the model completed "cooling tower" in 3 of 10 attempts is actually terrifying. That test was designed specifically to be something models COULDN'T do. The doubling of autonomous task capability on the order of months, not years, means we have maybe 12-18 months before this technology is trivially reproducible by other labs. The arms race is the story, not the individual findings.

9 ↑

bigchungusfan

anthropic is so brave for not releasing this. truly the good guys of ai. unlike openai who shall not be named

1 ↑