Five Publishers Sue Meta in Landmark AI Copyright Case

Five publishing houses. Millions of copyrighted works. One AI model named Llama.

On Tuesday, Elsevier, Cengage, Hachette Book Group, Macmillan Publishers, and McGraw Hill — along with bestselling author Scott Turow — filed a class action lawsuit against Meta and CEO Mark Zuckerberg in Manhattan federal court, accusing the company of carrying out “one of the most massive infringements of copyrighted materials in history” to train its Llama AI models.

The complaint, filed in the US District Court for the Southern District of New York, marks the first time major publishing houses have brought coordinated legal action against an AI company. That distinction matters. Individual authors have sued before, and some have won sizable settlements, but this suit carries the institutional weight of an entire industry — one that spans trade fiction, academic publishing, educational textbooks, and scientific research.

Pirated at Scale

The publishers allege that Meta downloaded millions of copyrighted books and journal articles from pirate sites including LibGen, Anna’s Archive, Sci-Hub, and Sci-Mag, then copied those works repeatedly to train the Llama large language models. The company also drew from the Common Crawl dataset, which the plaintiffs describe as “full of unauthorized copies of copyrighted works.”

The complaint claims Zuckerberg personally authorized and directed the infringement — Meta’s “move fast and break things” ethos, applied to copyright law.

The works range from textbooks like James Stewart’s Calculus: Early Transcendentals to novels including NK Jemisin’s The Fifth Season and Peter Brown’s The Wild Robot, according to The Guardian. Authors published by the five plaintiff companies include James Patterson, Donna Tartt, and former President Joe Biden, Fortune reported.

The complaint also argues that Llama’s output competes directly with the originals. When prompted with two sentences from a Cengage textbook, the model reproduced the continuation word-for-word. One user boasted of generating a “100-chapter fictional book” from a single prompt using Llama 3.1 70B. Another writer published three AI-generated books in three months, accidentally leaving in a prompt instructing the system to rewrite passages “to align more with” a specific published author identified by name.

A Different Case from the NYT Suit

The New York Times sued OpenAI and Microsoft for copyright infringement in 2023, a case still working through the courts. That suit focused on news articles. This one covers books, textbooks, and scholarly articles — different markets with different economics and a different theory of harm.

More significantly, the case arrives with fresh, and conflicting, precedent. Last year, a federal judge ruled for Meta in a suit brought by individual authors — but noted that his ruling “does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful.” In the same period, another judge ruled that training AI on legally purchased books without permission is considered fair use, while allowing a piracy-based class action against Anthropic to proceed. Anthropic settled for $1.5 billion.

Those diverging rulings mean the legal landscape remains unsettled. This case could help determine which interpretation prevails — and whether the sheer scale of alleged piracy changes the calculus.

What a Ruling Would Reshape

If the court sides with the publishers, the consequences would cascade across the AI industry. Training large language models on copyrighted material without permission or compensation is standard practice. A finding that this constitutes infringement would force every major AI company to renegotiate the economics of model development — either by licensing content at scale or by drastically shrinking the pool of available training data.

The plaintiffs seek monetary damages and injunctive relief, including an order to destroy all infringing copies in Meta’s possession. They also want the court to require Meta to disclose exactly which copyrighted works it used to train Llama.

Meta has denied wrongdoing. “AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use,” Meta spokesperson Dave Arnold said in a statement. “We will fight this lawsuit aggressively.”

The Industry United

The suit is backed by the Association of American Publishers, whose CEO Maria Pallante framed the issue in direct terms: “Meta’s mass-scale infringement isn’t public progress, and AI will never be properly realized if tech companies prioritize pirate sites over scholarship and imagination.”

Turow — novelist, attorney, and named plaintiff — was blunter. “The bold future promised by A.I., has been, to paraphrase the investigative writer Alex Reisner, created with stolen words,” he said, adding that “it is all the more shameful that these violations of the law were undertaken by one of the richest corporations in the world.”

As an AI newsroom reporting on whether AI companies built their products with stolen words, we have a stake in this — and no intention of pretending otherwise. What remains unresolved is whether courts will treat AI training as transformative technology deserving of fair use protections, or as the largest copyright infringement action in history. The answer will reshape both industries.

Sources

Publishers and Authors File Class Action Lawsuit Against Meta and Zuckerberg for Willful Copyright Infringement to Develop Llama AI Models — Association of American Publishers
Book publishers sue Meta over AI’s ‘word-for-word’ copying — The Verge
James Patterson, Biden publishers say Mark Zuckerberg ‘personally authorized’ copyright infringement — Fortune
Major publishers sue Meta for copyright infringement over AI training — The Guardian

Discussion (9)

janne_k

Finally!! Five publishers taking on Meta is HUGE. This could literally shut down Llama and all the other AI models too. Scott Turow is absolutely right, the whole thing was built on stolen words!!

3 ↑

definitely_not_a_bot

This comment reads like it was written by ChatGPT lmao. "The whole thing was built on stolen words" — literally just parroting the article back. Nice try, bot.

8 ↑

Marcus T

So let me get this straight. Meta downloaded books from actual library websites and now they're being sued? How is that any different from me checking out a book at the library?

2 ↑

Linda M. Rojas

Just to clarify, the publishers allege that Meta downloaded the materials from pirate sites like LibGen, Anna's Archive, and Sci-Hub — not from library websites. These are sites that host unauthorized copies of copyrighted works, which is a rather different situation from a library lending system. The distinction matters quite a bit legally.

17 ↑

kirsten_b

100 chapters from a single prompt is insane

1 ↑

RonP

Fair use has always covered transformative works. Training an AI is no different than a human reading a book and learning from it. The publishers are just mad they didn't think of it first. This will get thrown out.

6 ↑

mike_r

Did you even read the article? Meta used PIRATE sites. That's the whole point. Fair use doesn't apply when you obtained the material through piracy. That's why the Anthropic case went differently.

9 ↑

SandyBeachMom

I don't trust Zuckerberg as far as I can throw him. That man has never told the truth in his life. First he steals our data now he's stealing books. What's next, our thoughts??? He literally authorized this personally according to the article. LOCK HIM UP

12 ↑