OpenAI Faces Backlash After Deleting Key Evidence in NY Times Lawsuit – Shocking Developments Unfold!
2024-11-23
Author: Jia
Introduction
OpenAI is currently embroiled in a legal battle with The New York Times and Daily News, which have accused the tech giant of scraping their content to train its AI models without consent. In a troubling turn of events, lawyers representing the publishers have revealed that OpenAI engineers mistakenly deleted crucial data that could have been pivotal in the case.
Background Information
Earlier this fall, OpenAI agreed to provide two virtual machines so that the legal teams of The Times and Daily News could search OpenAI's AI training datasets for their copyrighted material. A virtual machine is a software-based computer that runs inside another computer's operating system; such machines are commonly used for tasks like testing and data management.
The Incident
Since November 1, the publishers’ legal teams and experts have spent more than 150 hours searching OpenAI’s training data. On November 14, however, they discovered that all of the search data on one of the virtual machines had been inadvertently erased. The deletion was documented in a recent letter filed in the U.S. District Court for the Southern District of New York.
Legal Implications
OpenAI attempted to recover the lost data and had some success, but the restoration left the folder structure and file names permanently altered. As a result, the recovered data cannot be used to pinpoint where the plaintiffs' articles contributed to the training of OpenAI’s models. The legal representatives for The Times and Daily News wrote, "News plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time," a serious setback for their case.
Plaintiffs' Position
The plaintiffs made clear that they have no reason to believe the deletion was intentional. Nevertheless, they argued that the incident shows OpenAI itself is best placed to search its own datasets for potentially infringing content.
OpenAI's Response
An OpenAI spokesperson declined to comment publicly. On November 22, however, OpenAI’s legal counsel filed a rebuttal categorically denying any wrongdoing. They attributed the problem to a configuration change that the plaintiffs themselves had requested, asserting that the change affected a temporary cache of the data and that no evidence was deliberately deleted.
Fair Use Argument
OpenAI claims that using publicly available data for model training, including articles from The Times and Daily News, falls under fair use. Under this controversial stance, OpenAI would not need to license or pay for the data it uses, even as it earns revenue from its AI models.
Recent Developments
Even so, OpenAI has entered into licensing agreements with a growing number of publishers, including well-known names like the Associated Press and Financial Times. Although the financial terms of these contracts remain undisclosed, one partnership reportedly guarantees Dotdash Meredith, the parent company of People magazine, at least $16 million annually.
Conclusion
OpenAI has consistently declined to confirm or deny whether it trained its AI models on any particular copyrighted content without permission, leaving many questions unanswered. Stay tuned as this story develops! Will The New York Times and Daily News secure a victory, or will OpenAI’s defense prove too formidable?