I mean, I’m not sure why this conversation even needs to get this far. If I write an article about the history of Disney movies, and make it very clear the way I got all of those movies was to pirate them, this conversation is over pretty quick. OpenAI and most of the LLMs aren’t doing anything different. The Times isn’t Wikipedia, most of their stuff is behind a paywall with pretty clear terms of service and nothing entitles OpenAI to that content. OpenAI’s argument is “well, we’re pirating everything so it’s okay.” The output honestly seems irrelevant to me, they never should have had the content to begin with.
That’s not the claim that they’re making. They’re arguing that OpenAI retains their work they made publicly available, which OpenAI claims is fair use because it’s wholly transformative in the form of nodes, weights and biases, and that they don’t store those articles in a database for reuse. But their other argument is that they created a system that threatens their business which is just ludicrous.