In evidence for the suit against OpenAI, the plaintiffs claim ChatGPT violates copyright law by producing a “derivative” version of copyrighted work when prompted to summarize the source.
Both filings make a broader case against AI, claiming that by definition, the models are a risk to the Copyright Act because they are trained on huge datasets that contain potentially copyrighted information.
They’ve got a point.
If you ask AI to summarize something, it needs to know what it’s summarizing. Reading other summaries might be legal, but then why not just read those summaries first?
If the AI “reads” the work first, then it would have needed to pay for it. And how do you deal with that? Is a chatbot treated like one user? Or does it need to pay for a copy for each human that asks for a summary?
I think if they’d paid for a single ebook library subscription they’d be fine. However, the article says they used pirate libraries so it could read anything on the fly.
Pointing an AI at pirated media is going to be hard to defend in court. And a class action full of authors and celebrities isn’t going to be a cakewalk. They’ve got a lot of money to fight, and have lots of contacts for copyright laws. I’m sure all the publishers are pissed too.
Everyone is going after AI money these days; this seems like the rare case where it’s justified.
If the AI “reads” the work first, then it would have needed to pay for it
That’s not actually true. Copyright applies to distribution, not consumption. You violate no law when I create an unauthorized copy of a work, and you read that copy. Copyright law prohibits you from distributing further copies, but it does not prohibit you from possessing the copy I provided you, nor are you prohibited from speaking about the copy you have acquired.
Unless the AI is regurgitating substantial parts of the original work, its output is a “transformative derivation”, which is not subject to the protections of the original copyright. The AI is doing what English teachers ask of every school-age child: create a book report.
Copyright applies to distribution, not consumption. You violate no law when I create an unauthorized copy of a work
This is completely untrue. Making any unauthorised copy is an infringement of copyright. Hell, the UK determined that merely loading a pirated game into RAM was unauthorised copying, making the act of playing a pirated game unlawful. Thankfully this ruling is only the case in the UK; however, the basic principles of copyright are the same all over the world.
When you buy something, you get a limited license to make copies for the purpose of viewing the material. That license does not extend to making backup copies. However, in a practical sense, it is very unlikely you will be prosecuted for most kinds of infringement like this - particularly when no money is involved. It’s still infringement, though.
Edit: I will say though: you violate no law when you view a copy I create. However I would still be infringing for making and showing you the copy.
In the case of making a book report, that is educational, and thus fair use. ChatGPT is not educational - you might use it for education, but ChatGPT’s use of copyrighted material is for commercial enterprise.
The uploader is the person creating the copy. Downloading is not creating a copy; downloading is receiving a copy.
I would love to see a citation on that UK precedent, but as you said: “thankfully this is only the case in the UK” and does not apply in the rest of the world.
Making any unauthorised copy is an infringement of copyright.
The exceptions to that are so numerous that the statement is closer to false than truth. “Fair Use” blows the absolute nature of that statement out of the water.
There has never been a successful prosecution for downloading only.
There was still copyright infringement because the company probably downloaded the text (which created another copy) and modified it (alteration is also protected by copyright) before using it as training data. If you write an original novel and admit that you had pirated a bunch of novels to use for reference, those novels were still downloaded illegally even if you’ve deleted them by now. The AI isn’t copyright infringement itself, it’s proof that copyright infringement has happened.
But personally I don’t think the actual laws will matter so much as which side has the better case for why they will lead to more innovation and growth for the economy.
There was still copyright infringement because the company probably downloaded the text (which created another copy)
Sure, someone likely infringed on copyright for that copy to be created, but the person/entity committing that infringement is the sender, not the receiver. The uploader is the infringing party, not the downloader.
If you write an original novel and admit that you had pirated a bunch of novels to use for reference, those novels were still downloaded illegally even if you’ve deleted them by now.
They were uploaded illegally. The people who distributed those copies to me have infringed on copyright, sure. My receiving those copies does not constitute infringement. Uploading is the illegal act, not downloading.
My work does not violate copyright, unless I use a substantial part of the other works. But, if I used substantial parts of those works, my work would be some sort of “derivation” and not the “original novel” you declared it. (Many types of derivation fall within “fair use” and do not constitute infringement.)
Whether I delete the works or not is entirely irrelevant. I am prohibited from creating and distributing additional copies, but I am not prohibited from receiving, possessing, or consuming an unauthorized copy.
The US copyright office says this on their website
Uploading or downloading works protected by copyright without the authority of the copyright owner is an infringement of the copyright owner’s exclusive rights of reproduction and/or distribution.
If the company downloaded books without buying them to train their AI, that’s copyright infringement
The US copyright office says this on their website
Their website has zero value as legal precedent. It is an oversimplification that does not stand up to scrutiny.
The combined act of transmitting the work from uploader to downloader is infringing, but only the uploader’s actions conflict with copyright law. The downloader’s actions do not.
They get people torrenting movies by saying you seed while you leech…
So if they torrented them en masse, they broke it.
Exactly: seeding is uploading, and uploading can be infringement. So, if your torrent client seeded any part of the work to anyone, that could be considered infringement.
But, there is no evidence that ChatGPT received the works in question via torrent, and even if there was, there is no evidence that they actually seeded anything back to the swarm. Hell, there’s no evidence that ChatGPT even actually possesses the works in question.
Can the sources where ChatGPT got its information from be traced? What if it got the information from other summaries?
I think the hardest thing for these companies will be validating the information their AI is using. I can see an encyclopedia-like industry popping up over the next couple years.
Btw I know very little about this topic but I find it fascinating
Yes! They publish the data sources and where they got everything from. Diffusers (Stable Diffusion, Midjourney, etc.) and GPT both use tons of data that was taken in ways that likely violate that data’s usage agreement.
Imo they deserve whatever lawsuits they have coming.
likely violate that data’s usage agreement.
It doesn’t seem to be too common for books to include specific clauses or EULAs that prohibit their use as data in machine learning systems. I’m curious if there are really any aspects that cover this without it being explicitly mentioned. I guess we’ll find out.
It depends on if the summary is an infringing derivative work, doesn’t it? Wikipedia is full of summaries, for example, and it’s not violating copyright.
If they illegally downloaded the works, that feels like a standalone issue to me, not having anything to do with AI.
Wikipedia is a non profit whose primary purpose is education. ChatGPT is a business venture.
A book review published in a newspaper is a commercial venture for the purpose of selling ads. The commercial aspect doesn’t make the review an infringement.
A summary is a “Transformative Derivation”. It is a related work, created for a fundamentally different purpose. It is a discussion about the work, not a copy of the work. Transformative derivations are not infringements, even where they are specifically intended to be used for commercial purposes.
I’ve noticed that the lemmy crowd seems more accepting of AI stuff than the Reddit crowd was
On the flip side, anytime I’ve tried to use it to write Python scripts for me, it always seems to get them slightly wrong. Nothing that a little troubleshooting can’t handle, and it certainly helps to get me in the ballpark of what I’m looking for, but I think it still has a little ways to go for specific coding use cases.
I think the key there is that ChatGPT isn’t able to run its own code, so all it can do is generate code which “looks” right, which in practice is close to functional but not quite. In order for the code it writes to reliably work, I think it would need a builtin interpreter/compiler to actually run the code, and for it to iterate constantly making small modifications until the code runs, then return the final result to the user.
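For what it’s worth, the generate, run, retry loop described above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not how ChatGPT actually works: `fake_model`, `run_candidate`, and `iterate_until_runs` are hypothetical names made up for the example, and the “model” is a stub that corrects a typo once it sees an error message.

```python
import traceback


def run_candidate(source):
    """Execute candidate source in a fresh namespace; return (ok, error_text)."""
    namespace = {}
    try:
        exec(source, namespace)
        return True, ""
    except Exception:
        # Capture the full traceback so it can be fed back to the generator.
        return False, traceback.format_exc()


def iterate_until_runs(generate, max_attempts=5):
    """Ask `generate` for code, run it, and feed any error back until it runs."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        source = generate(feedback)
        ok, feedback = run_candidate(source)
        if ok:
            return source, attempt
    return None, max_attempts


def fake_model(feedback):
    """Hypothetical stand-in for the model: fixes its typo after seeing a NameError."""
    if "NameError" in feedback:
        return "total = sum(range(5))"
    return "total = sum(rnge(5))"  # deliberate typo on the first try


source, attempts = iterate_until_runs(fake_model)
print(attempts, source)
```

Note that “runs without an exception” is a much weaker check than “does what the user asked”, and a real setup would need to run the generated code in a sandbox rather than a bare `exec`.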
I for one welcome our SkyNet overlords. They can’t be much worse than the current global leaders…
I always say “please” and “thank you” when using ChatGPT. When the AI finally takes over and subsequently and inevitably concludes that the world would be a better place without humans, it may remember that I specifically was always friendly. Maybe it’ll then have the courtesy to nuke my house directly instead of making me ultimately succumb to nuclear winter.
OMG. Using it for RegEx searches! How had that not even crossed my mind?
I’ve tried learning RegEx basics and using some websites to point me in the right direction when a specific use comes up, but tuning the search string correctly usually takes longer than it’s been worth. Off to ChatGPT it is!
It’s probably related to the fact that it seems a lot of Lemmy users are in tech, rather than art.
I think generative AI is a great tool, but a lot of people who don’t understand how it works either overestimate (it can do everything and it’s so smart!!) or underestimate it (all it does is steal my work!!)
Personally, I’m a comp sci graduate who did several courses exploring AI, but I actually started out in fine arts and continue to paint, write, and play music to this day. I’m sure I’ll be blending these studies in some way when I move on to my master’s.
I agree that automation is scary. It’s unregulated. But it’s not the tech so much that’s evil, but rather the employers who see it as a reason to get rid of employees. And before, it’d be manual labour that we replaced with machines. People doing mental labour thought they were immune, until now they’re not. Our economic system’s going to need to change in some way.
But generative AI can be very good even for artists. For example, sometimes I suffer from writer’s block (who doesn’t?). Now, I can feed what I’m working on into chatGPT and have it spit out an example of the next paragraph. Sometimes that’s enough to spur me on so I can write the next page.
Artist movements in general are pretty conservative. When digital painting first became a thing, allowing people to use layers and filters so easily, the kneejerk reaction by artists was to consider it cheating.
My hope is that in an ideal world, human-made art becomes valuable in the future precisely because it has the human touch. Live music played on real instruments, paintings on canvas, the sorts of things with quirks and imperfections and a human element that can’t be mass produced. Let the corporations have their algorithmic, soulless advertisements, and let the people focus on true self expression.
But then for people without artistic talent, say those who want to make indie games but can’t hire an artist or a musician because they’re just some kid with a dream and little experience? Hell, why not let them generate some assets with AI?
But we need to make sure that people aren’t afraid of becoming homeless, starving on the streets. I think, we’re not getting rid of AI at this point, it’s too powerful, and I don’t have an answer to our societal problems. For better or worse, we’ll adapt.
Accepting of AI as a concept yes. But we’re not too accepting of the current generation of theft-markov-generators that companies want to try and replace us with.
They’re a lot more than markov generators, but yeah. I don’t really think, in the long run, we’re going to see too many jobs displaced by AI.
I’m not convinced that our statistics-based training methods will lead to true “I, Robot”-style AGI.
And any company (except maybe visual novel shops) that fires people in favor of AI is going to regret it within 2 years.
I like her and I get why creatives are panicking because of all the AI hype.
However:
In evidence for the suit against OpenAI, the plaintiffs claim ChatGPT violates copyright law by producing a “derivative” version of copyrighted work when prompted to summarize the source.
A summary is not a copyright infringement. If there is a case for fair-use it’s a summary.
The comic’s suit questions if AI models can function without training themselves on protected works.
A language model does not need to be trained on the text it is supposed to summarize. She clearly does not know what she is talking about.
IANAL though.
I guess they will get to analyze OpenAI’s dataset during discovery. I bet OpenAI didn’t have authorization to use even 1% of the content they used.
Things might change, but right now you simply don’t need anyone’s authorization.
Hopefully it doesn’t change because only a handful of companies have the data or the funds to buy the data, it would kill any kind of open source or low priced endeavour.
SS is such a tool. Does anybody remember the big anti-gay speech that launched her career in The Way of the Gun? She’ll do anything to get ahead.
Here’s the speech: https://www.youtube.com/watch?v=PAl5xGi7urQ
I feel like when confronted about a “stolen comedy bit” a lot of these people complaining would also argue that “no work is entirely unique, everyone borrows from what already existed before.” But now they’re all coming out of the woodwork for a payday or something… It’s kinda frustrating especially if they kill any private use too…
I’m a teacher and the last half of this school year was a comedy of my colleagues trying to “ban” chat GPT. I’m not so much worried about students using chat GPT to do work. A simple two minute conversation with a student who creates an excellent (but suspected) piece of writing will tell you whether they wrote it themselves or not. What worries me is exactly those moments where you’re asking for a summary or a synopsis of something. You really have no idea what data is being used to create that summary.
The issue isn’t that people are using others’ works for ‘derivative’ content.
The issue is that, for a person to ‘derive’ comedy from Sarah Silverman the ‘analogue’ way, you have to get her works legally, be that streaming her comedy specials, or watching movies/shows she’s written for.
With ChatGPT and other AI, it’s been ‘trained’ on her work (and, presumably, as many others’ works as possible) once, and now there are no ‘views’, or even sources given, for those properties.
And like a lot of digital work, its reach and speed is unprecedented. Like, previously, yeah, of course you could still ‘derive’ from people’s works indirectly, like from a friend that watched it and recounted the ‘good bits’, or through general ‘cultural osmosis’. But that was still limited by the speed of humans, and of culture. With AI, it can happen a functionally infinite number of times, nearly instantly.
Is all that to say Silverman is 100% right here? Probably not. But I do think that the legality of ChatGPT, and other AI that can ‘copy’ artists’ work, is worth questioning. But it’s a sticky enough issue that I’m genuinely not sure what the best route is. Certainly, I think current AI writing and image generation ought to be ineligible for commercial use until the issue has at least been addressed.
The issue is that, for a person to ‘derive’ comedy from Sarah Silverman the ‘analogue’ way, you have to get her works legally, be that streaming her comedy specials, or watching movies/shows she’s written for.
Damn did they already start implanting DRM bio-chips in people?
And like a lot of digital work, its reach and speed is unprecedented. Like, previously, yeah, of course you could still ‘derive’ from people’s works indirectly, like from a friend that watched it and recounted the ‘good bits’, or through general ‘cultural osmosis’.
Please explain why you cannot download a movie/episode/ebook illegally and then directly derive from it.
Please explain why you cannot download a movie/episode/ebook illegally and then directly derive from it.
The law does not prohibit the receiving of an unauthorized copy. The law prohibits the distribution of the unauthorized copy. It is possible to send/transmit/upload a movie/episode/ebook illegally, but the act of receiving/downloading that unauthorized copy is not prohibited and not illegal.
You can’t illegally download a movie/episode/ebook for the same reason that you can’t illegally park your car in your own garage: there is no law making it illegal.
Even if ChatGPT possesses an unauthorized copy of the work, it would only violate copyright law if it created and distributed a new copy of that work. A summary of the work would be considered a “transformative derivation”, and would fall well within the boundaries of fair-use.
I mean, you can do that, but that’s a crime.
Which is exactly what Sarah Silverman is claiming ChatGPT is doing.
And, beyond an individual crime of a person reading a pirated book, again, we’re talking about ChatGPT and other AI magnifying reach and speed, beyond what an individual person ever could do even if they did nothing but read pirated material all day, not unlike websites like The Pirate Bay. Y’know, how those websites constantly get taken down and have to move around the globe to areas where they’re beyond the reach of the law, due to the crimes they’re committing.
I’m not like, anti-piracy or anything. But also, I don’t think companies should be using pirated software, and my big concern about LLMs isn’t really for private use, but for corporate use.
The issue is that, for a person to ‘derive’ comedy from Sarah Silverman the ‘analogue’ way, you have to get her works legally,
That is not actually true.
I would violate copyright by making an unauthorized copy and providing it to you, but you do not violate copyright for simply viewing that unauthorized copy. Sarah can come after me for creating the cop[y|ies], but she can’t come after the people to whom I send them, even if they admit to having willingly viewed a copy they knew to be unauthorized.
Copyright applies to distribution, not consumption.
The issue is that, for a person to ‘derive’ comedy from Sarah Silverman the ‘analogue’ way, you have to get her works legally, be that streaming her comedy specials, or watching movies/shows she’s written for.
I can also talk to a guy in a bar rambling about her work. That guy’s name? ChatGPT.