In early 2023, the US Copyright Office (CO) initiated an examination of copyright law and policy issues raised by artificial intelligence (AI), including the scope of copyright in AI-generated works and the use of copyrighted materials in AI training. Since then, the CO has issued the first two installments of a three-part report: part one on digital replicas, and part two on copyrightability.
On May 9, 2025, the CO released a pre-publication version of the third and final part of its report on Generative AI (GenAI) training. The report addresses stakeholder concerns and offers the CO’s interpretation of copyright’s fair use doctrine in the context of GenAI.
GenAI training involves using algorithms to train models on large datasets to generate new content. This process allows models to learn patterns and structures from existing data and then create new text, images, audio, or other forms of content. The use of copyrighted materials to train GenAI models raises complex copyright issues, particularly under the “fair use” doctrine. The key question is whether using copyrighted works to train AI without explicit permission from the rights holders is fair use, and therefore not an infringement, or whether such use violates copyright.
The 107-page report provides a thorough technical and legal overview and takes a carefully calibrated approach to the legal issues underlying fair use in GenAI. The report suggests that each case is context-specific and requires a thorough evaluation of the four factors outlined in Section 107 of the Copyright Act:
- The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes
- The nature of the copyrighted work
- The amount and substantiality of the portion used in relation to the copyrighted work as a whole
- The effect of the use upon the potential market for or value of the copyrighted work
With regard to the first factor, the report concludes that GenAI training run on large, diverse datasets “will often be transformative.” However, the use of copyright-protected materials for AI model training is not, on its own, sufficient to establish fair use. The report states that “transformativeness is a matter of degree” and depends on the model and how it is deployed.
The report notes that training a model is most transformative where “the purpose is to deploy it for research, or in a closed system that constrains it to a non-substitutive task,” as opposed to instances where the AI output closely tracks the creative intent of the input (e.g., generating art, music, or writing in a similar style or substance to the original source materials).
As to the commercial nature of the use, which Section 107 treats as part of the first factor, the report notes that a GenAI model is often the product of efforts by multiple distinct actors, some of which are commercial entities and some of which are not, and that it is typically difficult to discern attribution and definitively determine that a model is the product of a commercial or a noncommercial [...]