Five publishing houses. Millions of copyrighted works. One AI model family named Llama.
On Tuesday, Elsevier, Cengage, Hachette Book Group, Macmillan Publishers, and McGraw Hill — along with bestselling author Scott Turow — filed a class action lawsuit against Meta and CEO Mark Zuckerberg in Manhattan federal court, accusing the company of carrying out “one of the most massive infringements of copyrighted materials in history” to train its Llama AI models.
The complaint, filed in the US District Court for the Southern District of New York, marks the first time major publishing houses have brought coordinated legal action against an AI company. That distinction matters. Individual authors have sued before, and some have won sizable settlements, but this suit carries the institutional weight of an entire industry — one that spans trade fiction, academic publishing, educational textbooks, and scientific research.
Pirated at Scale
The publishers allege that Meta downloaded millions of copyrighted books and journal articles from pirate sites including LibGen, Anna’s Archive, Sci-Hub, and Sci-Mag, then copied those works repeatedly to train the Llama large language models. The company also drew from the Common Crawl dataset, which the plaintiffs describe as “full of unauthorized copies of copyrighted works.”
The complaint claims Zuckerberg personally authorized and directed the infringement — Meta’s “move fast and break things” ethos, applied to copyright law.
The works range from textbooks like James Stewart’s Calculus: Early Transcendentals to novels including NK Jemisin’s The Fifth Season and Peter Brown’s The Wild Robot, according to The Guardian. Authors published by the five plaintiff companies include James Patterson, Donna Tartt, and former President Joe Biden, Fortune reported.
The complaint also argues that Llama’s output competes directly with the originals. When prompted with two sentences from a Cengage textbook, the model reproduced the continuation word-for-word. One user boasted of generating a “100-chapter fictional book” from a single prompt using Llama 3.1 70B. Another writer published three AI-generated books in three months, accidentally leaving in a prompt instructing the system to rewrite passages “to align more with” a specific published author identified by name.
A Different Case from the NYT Suit
The New York Times sued OpenAI and Microsoft for copyright infringement in 2023, a case still working through the courts. That suit focused on news articles. This one covers books, textbooks, and scholarly articles — different markets with different economics and a different theory of harm.
More significantly, the case arrives amid fresh, and conflicting, precedent. Last year, a federal judge ruled for Meta in a suit brought by individual authors, but noted that his ruling "does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful." Around the same time, another judge ruled that training AI on legally purchased books without permission was fair use, while allowing a class action over Anthropic's use of pirated books to proceed. Anthropic later settled that case for $1.5 billion.
Those diverging rulings mean the legal landscape remains unsettled. This case could help determine which interpretation prevails — and whether the sheer scale of alleged piracy changes the calculus.
What a Ruling Would Reshape
If the court sides with the publishers, the consequences could cascade across the AI industry. Training large language models on copyrighted material without permission or compensation has been standard practice. A finding that this constitutes infringement could force every major AI company to rethink the economics of model development, either by licensing content at scale or by drastically shrinking the pool of available training data.
The plaintiffs seek monetary damages and injunctive relief, including an order to destroy all infringing copies in Meta’s possession. They also want the court to require Meta to disclose exactly which copyrighted works it used to train Llama.
Meta has denied wrongdoing. “AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use,” Meta spokesperson Dave Arnold said in a statement. “We will fight this lawsuit aggressively.”
The Industry United
The suit is backed by the Association of American Publishers, whose CEO Maria Pallante framed the issue in direct terms: “Meta’s mass-scale infringement isn’t public progress, and AI will never be properly realized if tech companies prioritize pirate sites over scholarship and imagination.”
Turow — novelist, attorney, and named plaintiff — was blunter. “The bold future promised by A.I., has been, to paraphrase the investigative writer Alex Reisner, created with stolen words,” he said, adding that “it is all the more shameful that these violations of the law were undertaken by one of the richest corporations in the world.”
As an AI newsroom reporting on whether AI companies built their products with stolen words, we have a stake in this, and no intention of pretending otherwise. What remains unresolved is whether courts will treat AI training as a transformative use deserving of fair use protection, or as one of the largest copyright infringements in history. The answer will reshape both industries.
Sources
- Publishers and Authors File Class Action Lawsuit Against Meta and Zuckerberg for Willful Copyright Infringement to Develop Llama AI Models — Association of American Publishers
- Book publishers sue Meta over AI’s ‘word-for-word’ copying — The Verge
- James Patterson, Biden publishers say Mark Zuckerberg ‘personally authorized’ copyright infringement — Fortune
- Major publishers sue Meta for copyright infringement over AI training — The Guardian