Copyright & IP

Last reviewed: 2026-05-11

AI training-data copyright is arguably the most economically significant legal question facing AI developers in 2026. The largest US copyright settlement in history, the $1.5 billion resolution of Bartz v. Anthropic in September 2025, closed one major case, but the underlying legal questions remain unsettled across jurisdictions. This chapter surveys the cases, the statutory developments (notably California AB 2013 and the EU AI Act's copyright requirements), and the practical implications for AI governance.

Bartz v. Anthropic — the landmark fair-use ruling and settlement

On 23 June 2025, Judge William Alsup of the Northern District of California issued a summary-judgment ruling in Bartz v. Anthropic that became the most influential US training-data decision to date.[1] The ruling contained two distinct holdings:

  1. Training on legally acquired books — including books Anthropic had purchased or licensed — constituted transformative fair use under 17 U.S.C. § 107.
  2. Training on pirated copies — specifically the LibGen and PiLiMi datasets that Anthropic had used — was not fair use because the underlying acquisition was infringing.

The class was certified in August 2025. On 5 September 2025, Anthropic announced a $1.5 billion settlement, roughly $3,000 per work across approximately 482,460 works.[2] It is the largest settlement in US copyright history.

Practical lessons from Bartz:

  1. Acquisition matters as much as use: training on purchased or licensed copies was held to be transformative fair use, while the same training on pirated copies was not.
  2. Exposure scales per work: at roughly $3,000 per work, a corpus of several hundred thousand pirated books produced a $1.5 billion settlement.
  3. A fair-use win on lawfully acquired data does not cure infringing acquisition elsewhere in the corpus.

Kadrey v. Meta

On 25 June 2025, Judge Vince Chhabria ruled in Kadrey v. Meta, a parallel case challenging Meta's Llama training data.[3] Meta prevailed, but on the narrow ground that the plaintiffs had not produced the evidence needed to defeat its motion; the court did not endorse Meta's fair-use position. Kadrey therefore establishes no broad fair-use precedent for training on pirated copies and should not be read as overturning Bartz.

Other ongoing US cases (status May 2026)

EU AI Act copyright obligations

The EU AI Act addresses training-data copyright in two ways:

  1. Article 53(1)(c) requires GPAI providers to put in place a policy to comply with Union law on copyright and related rights, in particular to identify and respect text-and-data-mining (TDM) opt-outs reserved under Article 4(3) of Directive (EU) 2019/790 (the CDSM Directive).
  2. Article 53(1)(d) requires GPAI providers to publish a sufficiently detailed summary of the content used for training.
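A TDM reservation under Article 4(3) CDSM is commonly expressed as a machine-readable signal such as robots.txt. The check below is a minimal sketch of honouring such an opt-out before crawling, using Python's standard urllib.robotparser; the user-agent string, URLs, and robots.txt content are illustrative assumptions, not real crawler or site names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt reserving the site against an AI crawler.
# "ExampleAIBot" is an illustrative user-agent, not a real crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True only if the robots.txt rules permit fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# The agent named in the reservation is blocked; a generic agent is not.
print(may_crawl(ROBOTS_TXT, "ExampleAIBot", "https://example.com/book.html"))  # False
print(may_crawl(ROBOTS_TXT, "GenericBot", "https://example.com/book.html"))    # True
```

In practice a crawler would fetch each site's live robots.txt (and any emerging signals such as ai.txt) rather than a hard-coded string, and log the decision for the provenance record.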

The EU GPAI Code of Practice (July 2025) Copyright chapter operationalises these obligations (see Frontier Models for the Code's structure). Practically, the Code requires signatories to:

  1. Reproduce and extract only lawfully accessible content when crawling the web.
  2. Identify and comply with TDM rights reservations, including machine-readable signals such as robots.txt.
  3. Mitigate the risk that models produce copyright-infringing outputs.
  4. Designate a point of contact and complaint mechanism for rightholders.

US state-level training-data laws

California AB 2013, effective 1 January 2026, requires developers of generative AI systems made available in California to publish documentation of the datasets used in training, including their sources and whether they contain data protected by copyright.

What’s still unsettled

Several core questions remain open as of May 2026, among them whether the Bartz fair-use holdings will survive appeal and be adopted in other circuits, and how far the US and EU approaches will diverge.

Regardless of how the unsettled questions resolve, the following baseline practices are increasingly expected:

  1. Catalogue training data with provenance, licensing, and acquisition method documented per dataset.
  2. Avoid known pirated corpora (LibGen, PiLiMi, Anna's Archive, Sci-Hub) for any production model. The Bartz settlement makes the enforcement cost of ignoring this rule unambiguous.
  3. Honour TDM opt-outs (robots.txt, ai.txt, Spawning) and publish a clear opt-out endpoint.
  4. Publish training-data summaries to the AB 2013 / EU AI Act Article 53 standard.
  5. Maintain a rightholder contact for copyright concerns.
  6. Document a copyright policy as part of the model card.
  7. Implement output filters for known copyrighted content where commercially feasible.
  8. Retain counsel for jurisdiction-specific analysis of any deployed model trained on contested data.
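Baseline practices 1 and 2 can be sketched as a minimal dataset catalogue with an automated audit against known pirated corpora. The record fields, source labels, and dataset names below are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass

# Corpora named in the Bartz record; treat any overlap as disqualifying.
KNOWN_PIRATED_SOURCES = {"libgen", "pilimi", "annas-archive", "sci-hub"}

@dataclass
class DatasetRecord:
    """One catalogue entry: provenance, licensing, and acquisition method."""
    name: str
    source: str        # e.g. "publisher-licence", "libgen" (illustrative labels)
    licence: str       # e.g. "purchased", "CC-BY-4.0", "unknown"
    acquisition: str   # e.g. "bulk purchase", "web crawl"

def audit(catalogue: list[DatasetRecord]) -> list[str]:
    """Return names of datasets whose source matches a known pirated corpus."""
    return [d.name for d in catalogue
            if d.source.lower() in KNOWN_PIRATED_SOURCES]

# Hypothetical catalogue: one licensed dataset, one flagged dataset.
catalogue = [
    DatasetRecord("books-main", "publisher-licence", "purchased", "bulk purchase"),
    DatasetRecord("books-extra", "libgen", "unknown", "bulk download"),
]
print(audit(catalogue))  # ['books-extra']
```

A real catalogue would live in a queryable store with per-dataset evidence (receipts, licence texts, crawl logs), but even this minimal shape makes the acquisition-method question, which Bartz turned on, answerable per dataset.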

  1. ArentFox Schiff. Landmark Ruling on AI Copyright: Fair Use vs. Infringement in Bartz v. Anthropic. ↩︎

  2. Authors Guild. What Authors Need to Know About the Anthropic Settlement. ↩︎

  3. Kadrey et al. v. Meta Platforms, Inc., N.D. Cal. (2025). ↩︎