The US District Court for the Northern District of California granted partial summary judgment in favor of an artificial intelligence (AI) company. The court found that the company's use of lawfully acquired copyrighted books to train its models, and its digitization of print books it had purchased, were fair use. However, the court rejected the company's attempt to extend the fair use defense to the pirated copies of copyrighted works it had downloaded and used. Andrea Bartz, et al. v. Anthropic PBC, Case No. 24-CV-05417-WHA (N.D. Cal. June 23, 2025) (Alsup, J.).

Anthropic, an AI company, acquired more than seven million copyrighted books without authorization by downloading them from pirate websites. It also lawfully purchased print books, removed their bindings, scanned each page, and stored them in digitized, searchable files. The goal was twofold:

  • To create a central digital library intended, in Anthropic’s words, to contain “all the books in the world” and to be preserved indefinitely.
  • To use this library to train the large language models (LLMs) that power Anthropic’s AI assistant, Claude.

Each work selected for training the LLM was copied at four main stages:

  • Each selected book was copied from the library to create a working copy for training.
  • Each book was “cleaned” by removing low-value or repetitive content (e.g., footers).
  • Cleaned books were converted into "tokenized" versions: the text was simplified and split into short character sequences, which were then translated into numerical tokens using Anthropic's custom dictionary. These tokens were used repeatedly in training, allowing the model to discover statistical relationships across massive amounts of text.
  • Each fully trained LLM itself retained “compressed” copies of the books.

Once trained, the LLMs did not output any of the books to the public through Claude. Anthropic placed particular value on books with well-curated facts, structured analyses, and compelling narratives (i.e., well-written creative expression) because Claude's users expected clear, accurate, and well-written responses to their questions.

Andrea Bartz and two other authors, whose books were copied from both pirated and purchased sources and used to train Claude, sued Anthropic for copyright infringement. In response, Anthropic filed an early motion for summary judgment solely on the issue of fair use under Section 107 of the Copyright Act.

To assess the applicability of the fair use defense, the court separated and analyzed Anthropic’s actions across three distinct categories of use.

Transformative training (fair use)

The authors challenged only the inputs used to train the LLMs, not their outputs. The district court found that Anthropic's use of copyrighted books to train its LLMs was transformative, comparable to the way humans read and learn from texts and then produce new, original writing. Although the authors claimed that the LLMs had memorized their creative expression, there was no evidence that Claude released infringing material to the public. The court concluded that using the works as training inputs, not to replicate them but to enable the generation of new content, favored a finding of fair use.

Format-shifting copies (fair use)

[...]
