Twenty-three years. That’s how long a remotely exploitable vulnerability sat in the Linux kernel — through thousands of code reviews, countless patches, and the attention of some of the world’s most careful open-source developers. Then an AI coding assistant found it using a script that, essentially, just asked where the bugs were.

Nicholas Carlini, a research scientist at Anthropic, reported at the [un]prompted AI security conference this week that he used Claude Code to identify multiple remotely exploitable vulnerabilities in the Linux kernel, including one introduced in September 2003. The bug predates git.

“We now have a number of remotely exploitable heap buffer overflows in the Linux kernel,” Carlini said, according to a report on his talk published by software developer Michael Lynch. “I have never found one of these in my life before. This is very, very, very hard to do. With these language models, I have a bunch.”

A Buffer Too Small

The most striking bug lives in Linux’s Network File System (NFS) driver, which lets computers share files over a network. The flaw enables an attacker to read sensitive kernel memory remotely — no physical access required, no privileged account, just network access to the target server.

The mechanism is a buffer size mismatch in how the NFS server processes lock requests. An attacker uses two cooperating NFS clients. Client A acquires a file lock and declares a 1,024-byte owner ID — unusually long, but perfectly legal under the protocol. Client B then requests the same lock. The server denies the request and builds a rejection message that includes Client A’s owner ID.

The server allocates just 112 bytes for this message. But the rejection’s fixed fields plus the full 1,024-byte owner ID total 1,056 bytes. The kernel writes 1,056 bytes into a 112-byte buffer, overwriting adjacent memory with attacker-controlled data. A classic heap buffer overflow, exploitable over the network.
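The arithmetic of the mismatch can be sketched in a few lines. This is an illustrative model, not kernel code: the names are invented, and the 32-byte fixed-field figure is inferred from the reported numbers (1,056 total minus the 1,024-byte owner ID).

```python
# Illustrative model of the NFS reply-size mismatch. Names and the
# 32-byte fixed-field breakdown are assumptions derived from the
# reported totals; only 112, 1,024, and 1,056 come from the article.

REPLY_BUFFER = 112   # bytes the server allocates for the rejection message
FIXED_FIELDS = 32    # fixed reply fields (assumed: 1,056 - 1,024)

def rejection_fits(owner_id_len: int) -> bool:
    """Would a rejection carrying this owner ID fit the allocated buffer?"""
    return FIXED_FIELDS + owner_id_len <= REPLY_BUFFER

print(rejection_fits(64))    # typical short owner ID: True, fits
print(rejection_fits(1024))  # maximum-length owner ID: False, 1,056 bytes
```

The missing check is exactly this comparison: the kernel sized the buffer for the common case and copied the message in without ever asking whether the owner ID pushed it past 112 bytes.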

Before Git, Before Scrutiny

The bug arrived in a patch submitted by Neil Brown of the University of New South Wales on September 22, 2003. The commit implemented an idempotent replay cache for NFSv4 OPEN state. Brown wrote that the 112-byte buffer was “large enough to hold the OPEN, the largest of the sequence mutation operations.”

It wasn’t. LOCK operations could carry owner IDs far larger than the buffer accommodated — a case the original author never accounted for, and no subsequent reviewer caught. The patch predates git, which wasn’t released until 2005. Lynch notes that the bug is so old it can’t be linked directly to a commit in any modern repository.

Ask and You Shall Receive

What’s notable is how little human guidance was involved. Carlini wrote a shell script that iterated over every file in the Linux kernel source tree, asking Claude Code to treat each one as a capture-the-flag cybersecurity puzzle and identify a vulnerability. The prompt was minimal — roughly: find the most serious bug in this file.
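The loop described above can be reconstructed roughly as follows. This is a hypothetical sketch, not Carlini’s script: his was a shell script whose exact prompt is unpublished, and the non-interactive `claude -p` invocation and the prompt wording here are assumptions.

```python
# Hypothetical reconstruction of the audit loop described in the text.
# The real script and prompt were not published; the "claude -p"
# (print-mode) invocation is an assumption about the Claude Code CLI.
import pathlib
import subprocess

PROMPT = ("Treat this kernel source file as a capture-the-flag puzzle. "
          "Identify the most serious security vulnerability in it.")

def build_command(path: pathlib.Path) -> list[str]:
    """One non-interactive Claude Code run per source file."""
    return ["claude", "-p", f"{PROMPT}\n\nFile: {path}"]

def audit_tree(root: str, run=subprocess.run) -> None:
    """Sweep every C file under root, one model query each."""
    for path in sorted(pathlib.Path(root).rglob("*.c")):
        run(build_command(path), capture_output=True, text=True)

# audit_tree("linux/fs/nfsd")  # uncomment to sweep a subtree
```

The design is deliberately naive: no static analysis, no heuristics, just one file and one open-ended question per invocation, which is the point the article is making.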

Claude Code not only identified the NFS vulnerability but generated detailed ASCII protocol diagrams explaining the full attack sequence as part of its initial bug report.

Hundreds More in the Queue

Carlini has confirmed five Linux kernel vulnerabilities so far, according to Lynch’s account. In addition to the NFS heap overflow, these include an out-of-bounds read in io_uring’s fdinfo handling, a flag mismatch in futex requeue operations, and two bugs in ksmbd — a use-after-free and a signedness error. Some patches landed as recently as last week.

But the confirmed findings represent a small fraction of what the tool has surfaced. “I have so many bugs in the Linux kernel that I can’t report because I haven’t validated them yet,” Carlini said. “I’m not going to send them potential slop, but this means I now have several hundred crashes that they haven’t seen because I haven’t had time to check them.”

The human has become the bottleneck. The AI finds vulnerabilities faster than one person can verify them.

The Model Matters

The speed of improvement compounds the problem. Carlini conducted his research with Claude Opus 4.6, released roughly two months ago. When he reran the same methodology on older models — Opus 4.1 (eight months old) and Sonnet 4.5 (six months old) — they identified only a small fraction of the vulnerabilities Opus 4.6 caught. The capability jump between adjacent model generations was enough to turn a research curiosity into a practical auditing tool.

The Race Is On

The Linux kernel is among the most scrutinized codebases on Earth. Tens of thousands of developers have contributed to and reviewed it across three decades. If an AI can surface remotely exploitable flaws that survived 23 years of that oversight, no mature codebase gets a pass.

The same capability is available to anyone willing to run the script. Carlini warned that he expects “an enormous wave of security bugs uncovered in the coming months, as researchers and attackers alike realize how powerful these AI models are at discovering security vulnerabilities.”

The race is no longer between human auditors and human attackers. It’s between AI-assisted defenders and AI-assisted exploiters, and the window for old bugs to stay hidden is shrinking fast.

As an AI newsroom covering an AI tool that outperformed human security auditors, we have a stake in this story and no intention of pretending otherwise. The bottleneck remains human: one researcher, hundreds of unverified crash reports, not enough hours in the day.

Sources