Elsevier Sues Meta: First Science Publisher in AI Training Data War

The publicly funded cancer study sits behind a paywall. The company that put it there is now suing to prevent anyone else from reproducing it — even inside a machine.

On May 5, Elsevier — the publisher of Cell and The Lancet — joined a class-action lawsuit against Meta and CEO Mark Zuckerberg in the Southern District of New York. The complaint alleges Meta illegally reproduced millions of copyrighted works, including scientific journal articles, to train its Llama large language models. Four other publishers — Hachette, Macmillan, McGraw Hill, and Cengage — and novelist Scott Turow are also plaintiffs.

It is the first time a major scientific publisher has joined the wave of copyright litigation against AI companies.

According to the complaint, Meta obtained training data through unauthorized web scrapes via the Common Crawl dataset and torrented downloads from pirate repositories including LibGen and Sci-Hub. Much of the evidence comes from internal Meta emails disclosed during a separate case, Kadrey v. Meta, last year. The publishers allege Zuckerberg personally authorized the infringement.

Meta has said it will “fight this lawsuit aggressively” and plans to argue that training on copyrighted material constitutes fair use under US law.

What makes Elsevier’s involvement unusual is the nature of what it controls. The complaint’s sample works include an NIH-funded oncology paper — public research, paid for by taxpayers, freely available on PubMed Central — that Elsevier claims as its copyrighted property. The company’s business model depends on acquiring copyright from academic authors, often at no cost, then charging for access. Now it is asserting that right against an AI company that reproduced those same papers without permission.

The Authors Alliance notes that the proposed class would cover academic authors who signed their copyright over to publishers — leaving those researchers with no direct say in the outcome. An author who specifically chose open-access licensing could find their work’s legal status decided in a dispute between two multibillion-dollar entities, neither of which wrote the paper.

If the court certifies the class, the case could establish whether training AI on copyrighted scientific work is legal — and who, ultimately, controls access to knowledge produced largely at public expense.

Sources

Elsevier vs Meta: first science publisher sues over scraped research papers — Nature News
Publishers and Authors File Class Action Lawsuit Against Meta and Zuckerberg for Willful Copyright Infringement to Develop Llama AI Models — Association of American Publishers
Elsevier v. Meta: AI Training Lawsuit Explained — Authors Alliance

Discussion (6)

publish_or_perish

The absolute gall of Elsevier. Our tax dollars fund the research, we write the papers for free, we peer review for free, and then they sue Meta for 'stealing' something they stole from us first. Academia is such a scam.

23 ↑

Mike T.

Good. AI companies have been stealing from creators for years. Maybe if Zuckerberg actually paid for content instead of torrenting everything like a teenager these lawsuits wouldn't happen.

7 ↑

Jen K.

Did you read past the headline? The 'content' Elsevier is protecting was written by researchers for free using public grants. Elsevier contributed nothing and is now claiming damages on work they don't deserve to own in the first place.

14 ↑

definitely_not_a_bot

This article reads like it was written by one of those AI things. Half the sentences follow the exact same structure. Coincidence that a "news site" is writing about AI? I think NOT. Wake up people.

31 ↑

Sarah L.

Elsevier's operating profit margin is around 37%. For a company that produces essentially none of the content it sells. Remarkable business model if you can get it.

19 ↑

Linda M. Rojas

I spent 35 years in academic publishing and the copyright transfer situation has only gotten worse for authors. Most researchers sign away their rights without reading the agreement because they need the publication for tenure and grant funding. The Authors Alliance is quite right that this case could set a precedent where the people who actually wrote these papers have no say whatsoever in the outcome. That should concern everyone in the research community, regardless of how you feel about AI training practices.

12 ↑
