Copyright & IP
Last reviewed: 2026-05-11

AI training-data copyright is the most economically significant legal question facing AI developers in 2026. The largest US copyright settlement in history — Bartz v. Anthropic for $1.5 billion — resolved one major case in September 2025, but the underlying legal questions remain unsettled across jurisdictions. This chapter surveys the cases, the statutory developments (notably California AB 2013 and the EU AI Act’s copyright requirements), and the practical implications for AI governance.
Bartz v. Anthropic — the landmark fair-use ruling and settlement
On 23 June 2025, Judge William Alsup of the Northern District of California issued a summary-judgment ruling in Bartz v. Anthropic that became the most influential US training-data decision to date.[1] The ruling contained two distinct holdings:
- Training on legally acquired books — including books Anthropic had purchased or licensed — constituted transformative fair use under 17 U.S.C. § 107.
- Training on pirated copies — specifically the LibGen and PiLiMi datasets that Anthropic had used — was not fair use because the underlying acquisition was infringing.
The class was certified in August 2025. On 5 September 2025, Anthropic announced a $1.5 billion settlement — approximately $3,000 per book for approximately 482,460 works.[2] The settlement is the largest in US copyright history.
Practical lessons from Bartz:
- Acquisition matters. A “fair use” defence is materially weaker when the underlying acquisition is unlawful. Documentary evidence that training corpora were licensed or otherwise lawfully obtained is now a baseline expectation.
- Transformativeness still holds for legal acquisition. The transformative-use analysis remains favourable to AI training; Bartz did not narrow the prior line of cases (e.g., Authors Guild v. Google on book scanning).
- Settlement, not Supreme Court. Because Bartz settled, it does not establish binding precedent at the appellate level. Other district courts may follow it; appellate courts have not yet weighed in.
Kadrey v. Meta
On 25 June 2025, Judge Vince Chhabria ruled in Kadrey v. Meta, a parallel case concerning Meta’s Llama training data.[3] Meta prevailed, but on the narrow ground that the plaintiffs had not produced the evidence needed to defeat Meta’s motion; the court did not endorse Meta’s fair-use position. The ruling therefore establishes no broad fair-use precedent for training on pirated copies, and readers should not interpret Kadrey as contradicting or displacing Bartz.
Other ongoing US cases (status May 2026)
- The New York Times v. OpenAI, S.D.N.Y. — ongoing; discovery extended through 2026.
- Andersen v. Stability AI, N.D. Cal. — image generation case; partial dismissal earlier, claims continue.
- Getty Images v. Stability AI, S.D.N.Y. and UK High Court — parallel actions in two jurisdictions; UK trial commenced 2025.
- Concord Music v. Anthropic, M.D. Tenn. — music-lyrics case, distinct from Bartz.
- Multiple class actions against OpenAI, Microsoft, Google, and Meta at various procedural stages.
EU AI Act copyright obligations
The EU AI Act addresses training-data copyright in two ways:
- Article 53(1)(c) requires GPAI providers to publish a sufficiently detailed summary of the content used for training.
- Article 53(1)(d) requires GPAI providers to put in place a policy to comply with Union law on copyright and related rights, in particular to identify and respect text-and-data-mining (TDM) opt-outs reserved under Article 4(3) of Directive (EU) 2019/790 (the CDSM Directive).
The EU GPAI Code of Practice (July 2025) Copyright chapter operationalises these obligations — see Frontier Models for the Code’s structure. Practically, the Code requires:
- A documented copyright policy.
- A mechanism for honouring TDM reservations (e.g., robots.txt-based, ai.txt, content provenance signals).
- A point of contact for rightholders.
- Public-facing transparency about copyright compliance.
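A minimal sketch of the TDM-reservation mechanism above, using Python’s standard `urllib.robotparser` to honour a robots.txt-based opt-out. The crawler token `ExampleAIBot` and the robots.txt content are illustrative assumptions; real crawler tokens and reservation signals (ai.txt, provenance metadata) vary by provider and are not standardised here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in which a rightholder reserves TDM rights
# against an AI training crawler while allowing ordinary crawlers.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

def may_collect(user_agent: str, url: str) -> bool:
    """Return True only if robots.txt permits this crawler to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

# The AI crawler is blocked; a generic crawler is not.
print(may_collect("ExampleAIBot", "https://example.com/article"))   # False
print(may_collect("SomeSearchBot", "https://example.com/article"))  # True
```

In production this check would fetch each site’s live robots.txt (and any ai.txt or provenance signal) rather than a hard-coded string, and log the decision for the public-facing transparency record.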
US state-level training-data laws
- California AB 2013 (effective 1 January 2026) — requires GenAI developers to publish a summary of training-data sources, types, ownership, presence of personal information, and presence of copyrighted material.
- Several states have considered similar transparency bills during 2025-2026; AB 2013 is the most concrete model in force.
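The AB 2013 disclosure categories can be captured in a simple structured record. The following sketch is a hypothetical schema mirroring the statutory fields listed above (sources, types, ownership, personal information, copyrighted material); the field names and the sample dataset are assumptions, not the statute’s required format.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class TrainingDataDisclosure:
    """One dataset's entry in an AB 2013-style training-data summary."""
    dataset_name: str
    sources: list = field(default_factory=list)   # where the data came from
    data_types: list = field(default_factory=list)  # e.g. "text", "images"
    developer_owned_or_licensed: bool = False
    contains_personal_info: bool = False
    contains_copyrighted_material: bool = False

disclosure = TrainingDataDisclosure(
    dataset_name="books-licensed-2025",
    sources=["publisher licensing agreements"],
    data_types=["text"],
    developer_owned_or_licensed=True,
    contains_personal_info=False,
    contains_copyrighted_material=True,
)

# Serialise for publication on the developer's website.
print(json.dumps(asdict(disclosure), indent=2))
```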
What’s still unsettled
Several core questions remain open as of May 2026:
- Memorisation and output infringement. When a model can reproduce significant portions of its training data on demand, courts are wrestling with whether that constitutes direct copying. The Supreme Court has not addressed this; results in district courts have varied.
- Derivative-works analysis for outputs. When a generated output resembles a copyrighted work, the analysis (substantial similarity, access) is unsettled in the AI context.
- DMCA Section 1202(b) — removal of copyright management information — is increasingly tested in training-data cases.
- International jurisdiction. Where models are trained in one jurisdiction and deployed in another with different copyright rules, conflict-of-laws questions arise.
- AI-generated works’ copyrightability. The US Copyright Office position (most recently the March 2025 Part 2 Report) is that purely AI-generated works are not copyrightable; the precise threshold of human contribution required is being litigated.
Practical AI governance steps on copyright
Regardless of how the unsettled questions resolve, the following baseline practices are increasingly expected:
- Catalogue training data with provenance, licensing, and acquisition method documented per dataset.
- Avoid known pirated corpora (LibGen, PiLiMi, Anna’s Archive, Sci-Hub) for any production model. The Bartz settlement makes this unambiguous from an enforcement-cost perspective.
- Honour TDM opt-outs (robots.txt, ai.txt, Spawning) and publish a clear opt-out endpoint.
- Publish training-data summaries to AB 2013 / EU AI Act Article 53 standard.
- Maintain a rightholder contact for copyright concerns.
- Document a copyright policy as part of the model card.
- Implement output filters for known copyrighted content where commercially feasible.
- Retain counsel for jurisdiction-specific analysis of any deployed model trained on contested data.
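The first two steps above can be automated as a catalogue audit. This is a hypothetical sketch: the deny-list names come from this chapter, but the function signature and issue format are assumptions about how a given organisation might structure its checks.

```python
from typing import Optional

# Corpora named in this chapter as known pirated sources.
KNOWN_PIRATED_CORPORA = {"libgen", "pilimi", "anna's archive", "sci-hub"}

def audit_dataset(name: str, acquisition_source: str,
                  license_doc: Optional[str]) -> list:
    """Return a list of governance issues for one catalogued dataset."""
    issues = []
    if acquisition_source.lower() in KNOWN_PIRATED_CORPORA:
        issues.append(f"{name}: acquired from known pirated corpus "
                      f"'{acquisition_source}'")
    if license_doc is None:
        issues.append(f"{name}: no licensing/acquisition documentation on file")
    return issues

# A dataset from a pirated corpus with no paperwork fails both checks;
# a licensed dataset with documentation passes.
print(audit_dataset("fiction-corpus", "LibGen", None))
print(audit_dataset("news-corpus", "publisher licence",
                    "agreements/acme-2025.pdf"))
```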
1. ArentFox Schiff. Landmark Ruling on AI Copyright: Fair Use vs. Infringement in Bartz v. Anthropic. ↩︎
2. Authors Guild. What Authors Need to Know About the Anthropic Settlement. ↩︎
3. Kadrey et al. v. Meta Platforms, Inc., N.D. Cal. (2025). ↩︎