Key Takeaways
The legal battle between The New York Times and OpenAI has taken a new twist after the publisher accused the defendant of accidentally destroying evidence.
Specifically, the Times said OpenAI engineers deleted data its lawyers had collected while reviewing ChatGPT training data in a court-ordered "sandbox" environment.
For the discovery phase of The New York Times v. OpenAI, the court ordered OpenAI to create a virtual test environment where the plaintiffs can search its training data for instances of copyrighted material.
The sandbox was a compromise designed to let the Times' lawyers identify where copyrighted articles had been used, without OpenAI having to hand over entire training datasets.
Given the size of the dataset and the vast number of articles it may contain (including Times articles published elsewhere), lawyers for the plaintiffs said they spent more than 150 hours combing through the training data in the first two weeks of November, work they have now had to repeat.
According to a letter submitted to the court:
“On November 14, all of News Plaintiffs’ programs and search result data stored on one of the dedicated virtual machines was erased by OpenAI engineers.”
Although OpenAI managed to recover some of the data, the letter said the folder structure and file names were "irretrievably lost," a major setback for the investigation.
The lawyers further allege that OpenAI has been uncooperative in performing requested searches or providing timely updates on progress.
While they insisted they “have no reason to believe [the erasure] was intentional,” the Times’ legal team is clearly frustrated.
The incident adds to concerns about OpenAI's lack of transparency and raises questions about how it handles evidence.
Whether or not the deletion was intentional, losing evidence is never a good look for a defendant.
Courts often scrutinize patterns of non-cooperation. If a judge perceives OpenAI’s actions as obstructive, sanctions could follow, potentially setting a precedent for how AI companies handle copyright challenges in training data.
As a landmark case in the emerging field of AI copyright law, the legal battle between OpenAI and The New York Times could have important consequences for other ongoing litigation. Precedents established now could set the tone for years to come.