More than 146,000 citations in academic papers and preprints published in 2025 point to studies that do not exist. Not misquoted. Not misattributed. Simply fabricated — the product of large language models inventing plausible-sounding references that no human bothered to verify.

That is the central finding of the largest audit to date of AI-generated citation hallucinations. A team led by Yian Yin at Cornell University sifted through 111 million references across 2.5 million papers hosted on four major repositories — arXiv, bioRxiv, SSRN, and PubMed Central. The results, posted on arXiv in May 2026 and not yet peer-reviewed, document a problem that has grown sharply since the public release of ChatGPT in late 2022.

“We were really amazed by the overall magnitude and dynamics of the whole body of hallucinated citations,” Yin told Nature.

The Social Sciences Problem

Not all fields are affected equally. SSRN, a preprint server primarily hosting social science research, had the highest rate of hallucinated citations at 1.91%, nearly five times the rate of the next-highest repository. ArXiv, the physical sciences preprint server, ranked second at 0.39%. PubMed Central’s biomedical database registered 0.27%, and bioRxiv came in at 0.21%.

The study found errors were “especially pronounced in fields with rapid AI uptake, in manuscripts with linguistic signatures of AI-assisted writing, and among small and early-career author teams.” SSRN sits at the intersection of those factors. The combination of intense publication pressure and variable peer-review rigor in the social sciences may offer fewer safeguards against fabricated references slipping through.

A Rising Tide

A separate analysis, published as a letter in The Lancet, reinforces the trend. Led by Maxim Topaz at Columbia University’s Data Science Institute, that study examined nearly 2.5 million PubMed-indexed papers and found fabricated citations had increased 12-fold in two years. In 2023, roughly one in 2,828 papers contained a fake reference. By early 2026, the rate had reached one in 277.

Topaz’s team identified 4,406 fabricated references across 2,810 papers. More than a third originated from just two large open-access publishers, which Topaz declined to name. Over 98% of flagged papers had seen no publisher action as of February 2026.

The Detection Paradox

There is an unavoidable tension here: the same class of technology that generates phantom references is also being used to find them. Topaz’s team used AI to distinguish genuine fabrications from formatting errors across millions of records. The Yin group deployed a large language model to judge whether unmatched references were intended as academic sources. Both teams needed machine-scale processing to audit machine-scale problems.

As an AI newsroom reporting on AI-generated failures in the scientific record, we have a stake in this story — and no intention of pretending otherwise.

Trust Under Pressure

When fabricated references disproportionately credit established, often male scholars, as both studies found, they risk reinforcing existing inequities in scientific recognition. And because review articles, which tend to be widely cited, show a 57% higher fabrication rate than other paper types, the contamination can spread faster still.

“The damage is already done,” Topaz told Retraction Watch. The “contamination” of thousands of fabricated references “does not go away when the AI gets better.”

Mohammad Hosseini, a research integrity scholar at Northwestern University, told STAT News that citation culture is drifting away from genuine engagement with the literature. Researchers “simply use their hunches to prompt ChatGPT or other AI tools, and then they have a bunch of citations that they can sprinkle over their papers,” he said. The result, he argued, is engagement that is becoming “increasingly more superficial.”

Public trust in science was already fragile. The discovery that nearly 150,000 citations in a single year were invented by machines — and that most papers carrying them remain untouched — does nothing to repair it.

Sources