Journalism Fights Back – Why the New York Times’ AI Lawsuit Matters

Last Updated January 3, 2024 2:32 PM

By James Morales

The New York Times argues that OpenAI's use of its journalism amounts to copyright infringement.

Key Takeaways

The New York Times is suing OpenAI and Microsoft.
The publisher argues that copyright law should prevent the unauthorized use of its articles as AI training data.
In the last year, content creators of various stripes have brought similar litigation against OpenAI and other AI developers.

In 2023, the world realized the extraordinary power of AI to transform the way some people create content. But what happens when that content looks suspiciously like someone else’s work?

On December 27, the New York Times filed a lawsuit against OpenAI and Microsoft, alleging copyright infringement over both the training of ChatGPT and the outputs it generates. Pitting the NYT against the Microsoft-backed startup, the case could prove decisive in shaping how intellectual property law responds to AI-generated content.

Can Chatbots Cheat? Generative AI and Plagiarism

According to the complaint, the GPT large language models (LLMs) that power ChatGPT were trained using millions of New York Times articles without the publisher’s consent. The NYT claims ChatGPT outputs often provide users with “near-verbatim” excerpts from articles that would otherwise require a paid subscription to view.

Naming OpenAI and Microsoft as defendants, the complaint accuses them of seeking to “free-ride on The Times’s massive investment in its journalism”. It also says they are “using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it.”

In the New York Times OpenAI lawsuit, you can see how complex the relationship of training data to output can be. On one hand, they find that you can induce ChatGPT to produce exact content from famous Times articles, on the other, they show it also hallucinates false articles.

— Ethan Mollick (@emollick) December 27, 2023

Historically, a series of legal precedents has helped mark the boundary between intellectual property theft and the fair use of copyrighted materials. If AI-generated content is subject to the same standards, existing case law should normally be sufficient to establish when a given text, image or other form of media has crossed the line into copyright infringement.

In the latest lawsuit, however, the NYT doesn’t only object to specific instances of ChatGPT responses plagiarizing its articles. It also objects to the wholesale use of its journalism to train AI models.

The NYT has not made an exact monetary demand. However, it said the defendants should be held responsible for “billions of dollars in statutory and actual damages.” It has also called for the companies to destroy chatbot models and training data that use NYT copyrighted material.

Copyright Law and AI: The Story So Far

Following the generative AI boom of recent years, legal systems are wrestling with the technology’s implications for copyright law. In the US, several important cases are currently underway that should test the limits of AI developers’ use of copyright-protected materials as training data.

Alongside the Times’ lawsuit, OpenAI is facing a number of similar legal challenges. For example, a class action lawsuit organized by the Authors Guild has accused the firm of using works by authors including John Grisham, Jodi Picoult and George R.R. Martin without their permission. Authors Sarah Silverman, Christopher Golden and Richard Kadrey have also brought a similar case.

Meanwhile, last year, three visual artists sued Stability AI, DeviantArt and Midjourney, claiming their artworks were used to train image-generating AI models.

In the latter case, Andersen v. Stability AI, all but one of the claims were dismissed, although the plaintiffs were given leave to amend their complaint. Largely dismissing the notion that an AI model itself could be the basis of a copyright infringement claim, the judge suggested that only actual instances of unauthorized reproduction could support such claims.

“Finding that the Complaint is defective in numerous respects” Judge William Orrick granted the motion to dismiss Anderson’s claims:

1. All 3 defendants were granted the motion to dismiss direct copyright infringement claims.

Anderson is allowed to amend & file again.

— Franklin Graves 🚀 (@franklingraves) October 30, 2023

Why the New York Times Could Be Different

At first glance, the decision doesn’t bode well for the New York Times. Like the artists, the Gray Lady is seeking an expansive ruling against the unauthorized use of copyrighted materials as AI training data.

However, one key difference between the lawsuits is that the NYT appears to have more evidence that ChatGPT is repeatedly breaching its intellectual property rights.

The publisher may not succeed in forcing OpenAI to decommission existing models. However, proof that ChatGPT has been regurgitating New York Times articles without proper citations should strengthen its negotiating position if the parties enter settlement talks.

Ultimately, even if attempts to prevent AI developers from using publicly available content to train their models fail, they could still transform the landscape of copyright law. Hopefully, that would help ensure that publishers, artists and other creators receive a more equitable share of the value their work generates.
