A publicly funded cancer study sits behind a paywall. The company that put it there is now suing to prevent anyone else from reproducing it, even inside a machine.

On May 5, Elsevier — the publisher of Cell and The Lancet — joined a class-action lawsuit against Meta and CEO Mark Zuckerberg in the Southern District of New York. The complaint alleges Meta illegally reproduced millions of copyrighted works, including scientific journal articles, to train its Llama large language models. Four other publishers — Hachette, Macmillan, McGraw Hill, and Cengage — and novelist Scott Turow are also plaintiffs.

It is the first time a major scientific publisher has joined the wave of copyright litigation against AI companies.

According to the complaint, Meta obtained training data through unauthorized web scrapes via the Common Crawl dataset and torrented downloads from pirate repositories including LibGen and Sci-Hub. Much of the evidence comes from internal Meta emails disclosed during a separate case, Kadrey v. Meta, last year. The publishers allege Zuckerberg personally authorized the infringement.

Meta has said it will “fight this lawsuit aggressively” and plans to argue that training on copyrighted material constitutes fair use under US law.

Elsevier’s involvement is unusual because of what it controls. The complaint’s sample works include an NIH-funded oncology paper (public research, paid for by taxpayers and freely available on PubMed Central) that Elsevier claims as its copyrighted property. The company’s business model depends on acquiring copyright from academic authors, often at no cost, then charging for access. Now it is asserting that copyright against an AI company that allegedly reproduced those same papers without permission.

The Authors Alliance notes that the proposed class would cover academic authors who signed their copyright over to publishers — leaving those researchers with no direct say in the outcome. An author who specifically chose open-access licensing could find their work’s legal status decided in a dispute between two multibillion-dollar entities, neither of which wrote the paper.

If the court certifies the class, the case could establish whether training AI on copyrighted scientific work is legal — and who, ultimately, controls access to knowledge produced largely at public expense.

Sources