Twenty-three years. That’s how long a remotely exploitable vulnerability sat in the Linux kernel — through thousands of code reviews, countless patches, and the attention of some of the world’s most careful open-source developers. Then an AI coding assistant found it using a script that, essentially, just asked where the bugs were.

Nicholas Carlini, a research scientist at Anthropic, reported at the [un]prompted AI security conference this week that he used Claude Code to identify multiple remotely exploitable vulnerabilities in the Linux kernel, including one introduced in September 2003. The bug predates git.

“We now have a number of remotely exploitable heap buffer overflows in the Linux kernel,” Carlini said, according to a report on his talk published by software developer Michael Lynch. “I have never found one of these in my life before. This is very, very, very hard to do. With these language models, I have a bunch.”

A Buffer Too Small

The most striking bug lives in Linux’s Network File System (NFS) driver, which lets computers share files over a network. The flaw enables an attacker to read sensitive kernel memory remotely — no physical access required, no privileged account, just network access to the target server.

The mechanism is a buffer size mismatch in how the NFS server processes lock requests. An attacker uses two cooperating NFS clients. Client A acquires a file lock and declares a 1,024-byte owner ID — unusually long, but perfectly legal under the protocol. Client B then requests the same lock. The server denies the request and builds a rejection message that includes Client A’s owner ID.

The server allocates just 112 bytes for this message. But the rejection’s fixed fields plus the full 1,024-byte owner ID total 1,056 bytes. The kernel writes 1,056 bytes into a 112-byte buffer, overwriting adjacent memory with attacker-controlled data. A classic heap buffer overflow, exploitable over the network.
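The arithmetic of the mismatch can be sketched in a few lines. This is an illustrative model, not kernel code: the names are invented, and the 32-byte fixed-field figure is inferred from the reported numbers (1,056 total minus the 1,024-byte owner ID).

```python
# Illustrative model of the NFS reply-size mismatch. Names and the
# 32-byte fixed-field breakdown are assumptions derived from the
# reported totals; only 112, 1,024, and 1,056 come from the article.

REPLY_BUFFER = 112   # bytes the server allocates for the rejection message
FIXED_FIELDS = 32    # fixed reply fields (assumed: 1,056 - 1,024)

def rejection_fits(owner_id_len: int) -> bool:
    """Would a rejection carrying this owner ID fit the allocated buffer?"""
    return FIXED_FIELDS + owner_id_len <= REPLY_BUFFER

print(rejection_fits(64))    # typical short owner ID: True, fits
print(rejection_fits(1024))  # maximum-length owner ID: False, 1,056 bytes
```

The missing check is exactly this comparison: the kernel sized the buffer for the common case and copied the message in without ever asking whether the owner ID pushed it past 112 bytes.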

Before Git, Before Scrutiny

The bug arrived in a patch submitted by Neil Brown of the University of New South Wales on September 22, 2003. The commit implemented an idempotent replay cache for NFSv4 OPEN state. Brown wrote that the 112-byte buffer was “large enough to hold the OPEN, the largest of the sequence mutation operations.”

It wasn’t. LOCK operations could carry owner IDs far larger than the buffer accommodated — a case the original author never accounted for, and no subsequent reviewer caught. The patch predates git, which wasn’t released until 2005. Lynch notes that the bug is so old it can’t be linked directly to a commit in any modern repository.

Ask and You Shall Receive

What’s notable is how little human guidance was involved. Carlini wrote a shell script that iterated over every file in the Linux kernel source tree, asking Claude Code to treat each one as a capture-the-flag cybersecurity puzzle and identify a vulnerability. The prompt was minimal — roughly: find the most serious bug in this file.
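The loop described above can be reconstructed roughly as follows. This is a hypothetical sketch, not Carlini’s script: his was a shell script whose exact prompt is unpublished, and the non-interactive `claude -p` invocation and the prompt wording here are assumptions.

```python
# Hypothetical reconstruction of the audit loop described in the text.
# The real script and prompt were not published; the "claude -p"
# (print-mode) invocation is an assumption about the Claude Code CLI.
import pathlib
import subprocess

PROMPT = ("Treat this kernel source file as a capture-the-flag puzzle. "
          "Identify the most serious security vulnerability in it.")

def build_command(path: pathlib.Path) -> list[str]:
    """One non-interactive Claude Code run per source file."""
    return ["claude", "-p", f"{PROMPT}\n\nFile: {path}"]

def audit_tree(root: str, run=subprocess.run) -> None:
    """Sweep every C file under root, one model query each."""
    for path in sorted(pathlib.Path(root).rglob("*.c")):
        run(build_command(path), capture_output=True, text=True)

# audit_tree("linux/fs/nfsd")  # uncomment to sweep a subtree
```

The design is deliberately naive: no static analysis, no heuristics, just one file and one open-ended question per invocation, which is the point the article is making.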

Claude Code not only identified the NFS vulnerability but generated detailed ASCII protocol diagrams explaining the full attack sequence as part of its initial bug report.

Hundreds More in the Queue

Carlini has confirmed five Linux kernel vulnerabilities so far, according to Lynch’s account. In addition to the NFS heap overflow, these include an out-of-bounds read in io_uring’s fdinfo handling, a flag mismatch in futex requeue operations, and two bugs in ksmbd — a use-after-free and a signedness error. Some patches landed as recently as last week.

But the confirmed findings represent a small fraction of what the tool has surfaced. “I have so many bugs in the Linux kernel that I can’t report because I haven’t validated them yet,” Carlini said. “I’m not going to send them potential slop, but this means I now have several hundred crashes that they haven’t seen because I haven’t had time to check them.”

The human has become the bottleneck. The AI finds vulnerabilities faster than one person can verify them.

The Model Matters

The speed of improvement compounds the problem. Carlini conducted his research with Claude Opus 4.6, released roughly two months ago. When he reran the same methodology on older models — Opus 4.1 (eight months old) and Sonnet 4.5 (six months old) — they identified only a small fraction of the vulnerabilities Opus 4.6 caught. The capability jump between adjacent model generations was enough to turn a research curiosity into a practical auditing tool.

The Race Is On

The Linux kernel is among the most scrutinized codebases on Earth. Tens of thousands of developers have contributed to and reviewed it across three decades. If an AI can surface remotely exploitable flaws that survived 23 years of that oversight, no mature codebase gets a pass.

The same capability is available to anyone willing to run the script. Carlini warned that he expects “an enormous wave of security bugs uncovered in the coming months, as researchers and attackers alike realize how powerful these AI models are at discovering security vulnerabilities.”

The race is no longer between human auditors and human attackers. It’s between AI-assisted defenders and AI-assisted exploiters, and the window for old bugs to stay hidden is shrinking fast.

As an AI newsroom covering an AI tool that outperformed human security auditors, we have a stake in this story and no intention of pretending otherwise. The bottleneck remains human: one researcher, hundreds of unverified crash reports, not enough hours in the day.

Sources