In 2023, the world realized the extraordinary power of AI to transform the way content is created. But what happens when that content looks suspiciously like someone else’s work?
On December 27, the New York Times filed a lawsuit against OpenAI, alleging copyright infringement based on outputs generated by ChatGPT. Pitting the newspaper against the Microsoft-backed startup, the case could prove decisive in shaping how intellectual property law responds to AI-generated content.
According to the complaint, the GPT large language models (LLMs) that power ChatGPT were trained on millions of New York Times articles without the publisher’s consent. The NYT claims ChatGPT outputs often provide users with “near-verbatim” excerpts from articles that would otherwise require a paid subscription to view.
Naming OpenAI and Microsoft as defendants, the complaint accuses them of seeking to “free-ride on The Times’s massive investment in its journalism” and of “using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it.”
Historically, a series of legal precedents has helped mark the boundary between intellectual property theft and the fair use of copyrighted materials. If AI-generated content is subject to the same standards, existing case law should normally be sufficient to establish when a given text, image or other form of media has crossed the line into copyright infringement.
In the latest lawsuit, however, the NYT’s objection isn’t limited to specific instances of ChatGPT responses plagiarizing its articles. It extends to the wholesale use of its journalism to train AI models.
The NYT has not made an exact monetary demand, but it says the defendants should be held responsible for “billions of dollars in statutory and actual damages.” It has also called for the companies to destroy chatbot models and training data that incorporate NYT copyrighted material.
Following the generative AI boom of recent years, legal systems are wrestling with the technology’s implications for copyright law. In the US, several important cases currently underway should test the limits of AI developers’ use of copyright-protected materials as training data.
Alongside the Times’s lawsuit, OpenAI is facing a number of similar legal challenges. For example, a class action lawsuit organized by the Authors Guild has accused the firm of using works by authors including John Grisham, Jodi Picoult and George R.R. Martin without their permission. Authors Sarah Silverman, Christopher Golden, and Richard Kadrey have also brought a similar case.
Meanwhile, last year, three visual artists sued Stability AI, DeviantArt and Midjourney, claiming their artworks were used without consent to train image-generating AI models.
In the latter case – Andersen v. Stability AI – all but one of the claims were ultimately dismissed. Largely rejecting the notion that an AI model itself could be subject to copyright infringement claims, the judge suggested that only instances of actual unauthorized reproduction could give rise to such claims.
At first glance, the decision doesn’t bode well for the New York Times. Like the artists, the Gray Lady is seeking an expansive ruling against the unauthorized use of copyrighted materials as AI training data.
However, one key difference between the lawsuits is that the NYT appears to have more evidence that ChatGPT is repeatedly breaching its intellectual property rights.
The publisher may not succeed in forcing OpenAI to decommission existing models. However, proof that ChatGPT has been regurgitating New York Times articles without permission should strengthen its negotiating position if the two parties enter settlement talks.
Ultimately, even if attempts to prevent AI developers from using publicly available content to train their models end up failing, they could still transform the landscape of copyright law. This should, hopefully, ensure publishers, artists and other creators get a more equitable share of the value their work generates.