103 points
*

In evidence for the suit against OpenAI, the plaintiffs claim ChatGPT violates copyright law by producing a “derivative” version of copyrighted work when prompted to summarize the source.

Both filings make a broader case against AI, claiming that by definition, the models are a risk to the Copyright Act because they are trained on huge datasets that contain potentially copyrighted information.

They’ve got a point.

If you ask AI to summarize something, it needs to know what it’s summarizing. Reading other summaries might be legal, but then why not just read those summaries first?

If the AI “reads” the work first, then it would have needed to pay for it. And how do you deal with that? Is a chatbot treated like one user? Or does it need to pay for a copy for each human that asks for a summary?

I think if they had paid for a single ebook library subscription they’d be fine. However, the article says they used pirate libraries so it could read anything on the fly.

Pointing an AI at pirated media is going to be hard to defend in court. And a class action full of authors and celebrities isn’t going to be a cakewalk. They’ve got a lot of money to fight, and have lots of contacts for copyright laws. I’m sure all the publishers are pissed too.

Everyone is going after AI money these days; this seems like the rare case where it’s justified.

37 points

If the AI “reads” the work first, then it would have needed to pay for it

That’s not actually true. Copyright applies to distribution, not consumption. You violate no law when I create an unauthorized copy of a work, and you read that copy. Copyright law prohibits you from distributing further copies, but it does not prohibit you from possessing the copy I provided you, nor are you prohibited from speaking about the copy you have acquired.

Unless the AI is regurgitating substantial parts of the original work, its output is a “transformative derivation”, which is not subject to the protections of the original copyright. The AI is doing what English teachers ask of every school-age child: create a book report.

11 points
*

Copyright applies to distribution, not consumption. You violate no law when I create an unauthorized copy of a work

This is completely untrue. Making any unauthorised copy is an infringement of copyright. Hell, the UK determined that merely loading a pirated game into RAM was unauthorised copying, making the act of playing a pirated game unlawful. Thankfully this ruling is only the case in the UK; however, the basic principles of copyright are the same all over the world.

When you buy something, you get a limited license to make copies for the purpose of viewing the material. That license does not extend to making backup copies. However, in a practical sense, it is very unlikely you will be prosecuted for most kinds of infringement like this - particularly when no money is involved. It’s still infringement, though.

Edit: I will say though: you violate no law when you view a copy I create. However I would still be infringing for making and showing you the copy.

In the case of making a book report, that is educational, and thus fair use. ChatGPT is not educational - you might use it for education, but ChatGPT’s use of copyrighted material is for commercial enterprise.

10 points

The uploader is the person creating the copy. Downloading is not creating a copy; downloading is receiving a copy.

I would love to see a citation on that UK precedent, but as you said: “thankfully this is only the case in the UK” and does not apply in the rest of the world.

Making any unauthorised copy is an infringement of copyright.

The exceptions to that are so numerous that the statement is closer to false than truth. “Fair Use” blows the absolute nature of that statement out of the water.

There has never been a successful prosecution for downloading only.

8 points

There was still copyright infringement because the company probably downloaded the text (which created another copy) and modified it (alteration is also protected by copyright) before using it as training data. If you write an original novel and admit that you had pirated a bunch of novels to use for reference, those novels were still downloaded illegally even if you’ve deleted them by now. The AI isn’t copyright infringement itself, it’s proof that copyright infringement has happened.

But personally I don’t think the actual laws will matter so much as which side has the better case for why they will lead to more innovation and growth for the economy.

0 points

There was still copyright infringement because the company probably downloaded the text (which created another copy)

Sure, someone likely infringed on copyright for that copy to be created, but the person/entity committing that infringement is the sender, not the receiver. The uploader is the infringing party, not the downloader.

If you write an original novel and admit that you had pirated a bunch of novels to use for reference, those novels were still downloaded illegally even if you’ve deleted them by now.

They were uploaded illegally. The people who distributed those copies to me have infringed on copyright, sure. My receiving those copies does not constitute infringement. Uploading is the illegal act, not downloading.

My work does not violate copyright, unless I use a substantial part of the other works. But, if I used substantial parts of those works, my work would be some sort of “derivation” and not the “original novel” you declared it. (Many types of derivation fall within “fair use” and do not constitute infringement.)

Whether I delete the works or not is entirely irrelevant. I am prohibited from creating and distributing additional copies, but I am not prohibited from receiving, possessing, or consuming an unauthorized copy.

1 point

The US copyright office says this on their website

Uploading or downloading works protected by copyright without the authority of the copyright owner is an infringement of the copyright owner’s exclusive rights of reproduction and/or distribution.

If the company downloaded books without buying them to train their AI, that’s copyright infringement.

3 points

The US copyright office says this on their website

Their website has zero legal precedence. It is an oversimplification that does not stand up to scrutiny.

The combined act of transmitting the work from uploader to downloader is infringing, but only the uploader’s actions conflict with copyright law. The downloader’s actions do not.

1 point

They get people torrenting movies by saying you seed while you leech…

So if they torrented them en masse, they broke it.

3 points

Exactly: seeding is uploading, and uploading can be infringement. So, if your torrent client seeded any part of the work to anyone, that could be considered infringement.

But, there is no evidence that ChatGPT received the works in question via torrent, and even if there was, there is no evidence that they actually seeded anything back to the swarm. Hell, there’s no evidence that ChatGPT even actually possesses the works in question.

16 points

Can the sources where ChatGPT got its information from be traced? What if it got the information from other summaries?

I think the hardest thing for these companies will be validating the information their AI is using. I can see an encyclopedia-like industry popping up over the next couple years.

Btw I know very little about this topic but I find it fascinating

5 points

Yes! They publish the data sources and where they got everything from. Diffusers (Stable Diffusion, Midjourney, etc.) and GPT both use tons of data that was taken in ways that likely violate that data’s usage agreement.

Imo they deserve whatever lawsuits they have coming.

1 point

likely violate that data’s usage agreement.

It doesn’t seem to be too common for books to include specific clauses or EULAs that prohibit their use as data in machine learning systems. I’m curious if there are really any aspects that cover this without it being explicitly mentioned. I guess we’ll find out.

16 points

“It was like this when I got it”

15 points

It depends on if the summary is an infringing derivative work, doesn’t it? Wikipedia is full of summaries, for example, and it’s not violating copyright.

If they illegally downloaded the works, that feels like a standalone issue to me, not having anything to do with AI.

5 points

Wikipedia is a non profit whose primary purpose is education. ChatGPT is a business venture.

5 points

A book review published in a newspaper is a commercial venture for the purpose of selling ads. The commercial aspect doesn’t make the review an infringement.

A summary is a “Transformative Derivation”. It is a related work, created for a fundamentally different purpose. It is a discussion about the work, not a copy of the work. Transformative derivations are not infringements, even where they are specifically intended to be used for commercial purposes.

40 points

I’ve noticed that the lemmy crowd seems more accepting of AI stuff than the Reddit crowd was

75 points
Deleted by creator
15 points

On the flip side, anytime I’ve tried to use it to write python scripts for me, it always seems to get them slightly wrong. Nothing that a little troubleshooting can’t handle, and certainly helps to get me in the ballpark of what I’m looking for, but I think it still has a little ways to go for specific coding use cases.

5 points

I think the key there is that ChatGPT isn’t able to run its own code, so all it can do is generate code which “looks” right, which in practice is close to functional but not quite. In order for the code it writes to reliably work, I think it would need a builtin interpreter/compiler to actually run the code, and for it to iterate constantly making small modifications until the code runs, then return the final result to the user.
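The generate, run, and retry loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `fake_model` is a hypothetical stand-in for whatever would actually produce code, and `run_snippet` and `iterate_until_runs` are names made up for this example. Note that a clean exit only shows the code did not crash, not that it does what was asked.

```python
import subprocess
import sys
from typing import Callable, Optional, Tuple


def run_snippet(source: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Run Python source in a separate interpreter; return (succeeded, stderr)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    return proc.returncode == 0, proc.stderr


def iterate_until_runs(
    generate: Callable[[str, str], str],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask for code, run it, and feed any error text back until it runs."""
    feedback = ""
    for _ in range(max_attempts):
        source = generate(prompt, feedback)
        ok, error_text = run_snippet(source)
        if ok:
            return source
        feedback = error_text  # the next attempt sees the traceback
    return None  # gave up after max_attempts


def fake_model(prompt: str, feedback: str) -> str:
    """Hypothetical generator: wrong on the first try, fixed once it sees an error."""
    if not feedback:
        return "print(undefined_name)"  # raises NameError when run
    return "print('hello')"


if __name__ == "__main__":
    print(iterate_until_runs(fake_model, "say hello"))
```

Running untrusted generated code like this would need real sandboxing; the subprocess here is only a convenience for the sketch.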

3 points
*
Deleted by creator
3 points

It can even deal with basic algebra, it’s awesome. I can’t be fucked to work out this 16-var linear system, or even to write out the sympy to do it.

But guess who is?

2 points

I for one welcome our SkyNet overlords. They can’t be much worse than the current global leaders…

2 points

I always say “please” and “thank you” when using ChatGPT. When the AI finally takes over and subsequently and inevitably concludes that the world would be a better place without humans, it may remember that I specifically was always friendly. Maybe it’ll then have the courtesy to nuke my house directly instead of making me ultimately succumb to nuclear winter.

2 points

I use ChatGPT to romanize Farsi script from song texts and such. There is no other tool that works even remotely well and the AI somehow knows how to properly transliterate.

1 point

That’s genius! I’ve been trying to figure out how to incorporate ChatGPT-like bots into my work, but haven’t found it to be that useful. I don’t write a lot of regex, but hate it every time I do, so I’ll definitely be trying this next time I need it.

0 points

OMG. Using it for RegEx searches! How had that not even crossed my mind?

I’ve tried learning RegEx basics and using some websites to point me in the right direction when a specific use comes up, but tuning the search string correctly usually takes longer than it’s been worth. Off to ChatGPT it is!

4 points

I’d use it with caution as there are no small mistakes in regex - any can lead to big problems, and ChatGPT does often give wrong or not entirely correct answers.
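One way to act on that caution is to check any suggested pattern against strings that must match and strings that must not before relying on it. The sketch below is illustrative only: the date patterns and the helper name `check_pattern` are made up for this example, not taken from any ChatGPT output.

```python
import re
from typing import List, Tuple


def check_pattern(
    pattern: str,
    should_match: List[str],
    should_not_match: List[str],
) -> List[Tuple[str, bool]]:
    """Return the failing cases as (text, expected_to_match) pairs."""
    compiled = re.compile(pattern)
    failures = []
    for text in should_match:
        if compiled.fullmatch(text) is None:
            failures.append((text, True))
    for text in should_not_match:
        if compiled.fullmatch(text) is not None:
            failures.append((text, False))
    return failures


# A plausible-looking pattern for YYYY-MM-DD dates (hypothetical example).
candidate = r"\d{4}-\d{2}-\d{2}"
# A stricter variant that also bounds the month and day ranges.
stricter = r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"

good = ["2023-07-10", "1999-12-31"]
bad = ["2023-13-01", "2023-00-10", "23-07-10"]

if __name__ == "__main__":
    print(check_pattern(candidate, good, bad))  # month 13 and month 00 slip through
    print(check_pattern(stricter, good, bad))   # no failures on these cases
```

Even the stricter pattern still accepts impossible dates such as 2023-02-31, which is the kind of small gap the comment above is warning about.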

2 points
*
Deleted by creator
8 points

It’s probably related to the fact that it seems a lot of Lemmy users are in tech, rather than art.

I think generative AI is a great tool, but a lot of people who don’t understand how it works either overestimate (it can do everything and it’s so smart!!) or underestimate it (all it does is steal my work!!)

7 points
*
Deleted by creator
4 points

Personally, I’m a comp sci graduate who did several courses exploring AI, but I actually started out in fine arts and continue to paint, write, and play music to this day. I’m sure I’ll be blending these studies in some way when I move on to my master’s.

I agree that automation is scary. It’s unregulated. But it’s not the tech so much that’s evil, but rather the employers who see it as a reason to get rid of employees. And before, it’d be manual labour that we replaced with machines. People doing mental labour thought they were immune, until now they’re not. Our economic system’s going to need to change in some way.

But generative AI can be very good even for artists. For example, sometimes I suffer from writer’s block (who doesn’t?). Now, I can feed what I’m working on into chatGPT and have it spit out an example of the next paragraph. Sometimes that’s enough to spur me on so I can write the next page.

Artist movements in general are pretty conservative. When digital painting first became a thing, allowing people use layers and filters so easily, the kneejerk reaction by artists was to consider it cheating.

My hope is that in an ideal world, human-made art becomes valuable in the future precisely because it has the human touch. Live music played on real instruments, paintings on canvas, the sorts of things with quirks and imperfections and a human element that can’t be mass produced. Let the corporations have their algorithmic, soulless advertisements, and let the people focus on true self expression.

But then for people without artistic talent, say those who want to make indie games but can’t hire an artist or a musician because they’re just some kid with a dream and little experience? Hell, why not let them generate some assets with AI?

But we need to make sure that people aren’t afraid of becoming homeless, starving on the streets. I think we’re not getting rid of AI at this point; it’s too powerful, and I don’t have an answer to our societal problems. For better or worse, we’ll adapt.

4 points

Accepting of AI as a concept yes. But we’re not too accepting of the current generation of theft-markov-generators that companies want to try and replace us with.

3 points

They’re a lot more than markov generators, but yeah. I don’t really think, in the long run, we’re going to see too many jobs displaced by AI.

I’m not convinced that our statistics-based training methods will lead to true I, Robot-style AGI.

And any company (except maybe visual novel shops) that fires people in favor of AI is going to regret it within 2 years.
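For anyone unsure what a “markov generator” refers to in this exchange, here is a minimal word-level sketch, assuming the simplest first-order form where the next word depends only on the current one. The function names and the toy sentence are invented for illustration; large language models condition on far more context than this.

```python
import random
from collections import defaultdict
from typing import Dict, List


def build_chain(text: str) -> Dict[str, List[str]]:
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain: Dict[str, List[str]], start: str, length: int, seed: int = 0) -> str:
    """Walk the chain from `start`, picking a random observed follower each step."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word was never followed by anything
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


if __name__ == "__main__":
    chain = build_chain("the cat sat on the mat")
    print(generate(chain, "the", 5))
```

Every adjacent word pair such a generator emits already appears somewhere in its input text.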

2 points

Yeah I’m being facetious when I call them markovs. I’m mainly just saying that they are basically regurgitating copyrighted material based on statistics, so I believe they are just automated copyright violations.

Completely agree with your comment.

0 points

I just think it’s awesome technology and that we shouldn’t be holding it back. AI is pandora’s box, and that box can’t be closed now that it’s open.

All these attempts to restrict it remind me of the old efforts to stop people from taping TV shows with their VCRs.

36 points
*

I like her and I get why creatives are panicking because of all the AI hype.

However:

In evidence for the suit against OpenAI, the plaintiffs claim ChatGPT violates copyright law by producing a “derivative” version of copyrighted work when prompted to summarize the source.

A summary is not a copyright infringement. If there is a case for fair-use it’s a summary.

The comic’s suit questions if AI models can function without training themselves on protected works.

A language model does not need to be trained on the text it is supposed to summarize. She clearly does not know what she is talking about.

IANAL though.

25 points

I guess they will get to analyze OpenAI’s dataset during discovery. I bet OpenAI didn’t have authorization to use even 1% of the content they used.

15 points

That’s why they don’t feel they can operate in the EU, as the EU will mandate AI companies to publish what datasets they trained their solutions on.

7 points

Things might change, but right now you simply don’t need anyone’s authorization.

Hopefully it doesn’t change, because only a handful of companies have the data or the funds to buy the data; it would kill any kind of open source or low-priced endeavour.

4 points

FWIW, Common Crawl - a free/open-source dataset of crawled internet pages - was used by OpenAI for GPT-2 and GPT-3 as well as EleutherAI’s GPT-NeoX. Maybe for GPT-3.5/ChatGPT as well, but they’ve been hush about that.

-10 points

SS is such a tool. Does anybody remember the big anti-gay speech that launched her career in The Way of the Gun? She’ll do anything to get ahead.

Here’s the speech: https://www.youtube.com/watch?v=PAl5xGi7urQ

15 points

You hate her because of a part in a shitty movie?

-7 points

Did I say hate? I said she’s a tool.

6 points

Here is an alternative Piped link(s): https://piped.video/watch?v=PAl5xGi7urQ

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I’m open-source, check me out at GitHub.

1 point

Good piped bot

27 points

I feel like when confronted about a “stolen comedy bit” a lot of these people complaining would also argue that “no work is entirely unique, everyone borrows from what already existed before.” But now they’re all coming out of the woodwork for a payday or something… It’s kinda frustrating especially if they kill any private use too…

24 points

I’m a teacher and the last half of this school year was a comedy of my colleagues trying to “ban” chat GPT. I’m not so much worried about students using chat GPT to do work. A simple two minute conversation with a student who creates an excellent (but suspected) piece of writing will tell you whether they wrote it themselves or not. What worries me is exactly those moments where you’re asking for a summary or a synopsis of something. You really have no idea what data is being used to create that summary.

12 points
*

The issue isn’t that people are using others’ works for ‘derivative’ content.

The issue is that, for a person to ‘derive’ comedy from Sarah Silverman the ‘analogue’ way, you have to get her works legally, be that streaming her comedy specials, or watching movies/shows she’s written for.

With ChatGPT and other AI, it’s been ‘trained’ on her work (and, presumably, as many others’ works as possible) once, and now there are no ‘views’, or even sources given, for those properties.

And like a lot of digital work, its reach and speed is unprecedented. Like, previously, yeah, of course you could still ‘derive’ from people’s works indirectly, like from a friend that watched it and recounted the ‘good bits’, or through general ‘cultural osmosis’. But that was still limited by the speed of humans, and of culture. With AI, it can happen a functionally infinite number of times, nearly instantly.

Is all that to say Silverman is 100% right here? Probably not. But I do think that the legality of ChatGPT, and other AI that can ‘copy’ artists’ work, is worth questioning. But it’s a sticky enough issue that I’m genuinely not sure what the best route is. Certainly, I think current AI writing and image generation ought to be ineligible for commercial use until the issue has at least been addressed.

4 points

The issue is that, for a person to ‘derive’ comedy from Sarah Silverman the ‘analogue’ way, you have to get her works legally, be that streaming her comedy specials, or watching movies/shows she’s written for.

Damn did they already start implanting DRM bio-chips in people?

And like a lot of digital work, its reach and speed is unprecedented. Like, previously, yeah, of course you could still ‘derive’ from people’s works indirectly, like from a friend that watched it and recounted the ‘good bits’, or through general ‘cultural osmosis’.

Please explain why you cannot download a movie/episode/ebook illegally and then directly derive from it.

1 point

Please explain why you cannot download a movie/episode/ebook illegally and then directly derive from it.

The law does not prohibit the receiving of an unauthorized copy. The law prohibits the distribution of the unauthorized copy. It is possible to send/transmit/upload a movie/episode/ebook illegally, but the act of receiving/downloading that unauthorized copy is not prohibited and not illegal.

You can’t illegally download a movie/episode/ebook for the same reason that you can’t illegally park your car in your own garage: there is no law making it illegal.

Even if ChatGPT possesses an unauthorized copy of the work, it would only violate copyright law if it created and distributed a new copy of that work. A summary of the work would be considered a “transformative derivation”, and would fall well within the boundaries of fair-use.

1 point
*

I mean, you can do that, but that’s a crime.

Which is exactly what Sarah Silverman is claiming ChatGPT is doing.

And, beyond an individual crime of a person reading a pirated book, again, we’re talking about ChatGPT and other AI magnifying reach and speed beyond what an individual person ever could do even if they did nothing but read pirated material all day, not unlike websites like The Pirate Bay. Y’know, how those websites constantly get taken down and have to move around the globe to areas where they’re beyond the reach of the law, due to the crimes they’re committing.

I’m not, like, anti-piracy or anything. But also, I don’t think companies should be using pirated software, and my big concern about LLMs isn’t really private use, but corporate use.

1 point

The issue is that, for a person to ‘derive’ comedy from Sarah Silverman the ‘analogue’ way, you have to get her works legally,

That is not actually true.

I would violate copyright by making an unauthorized copy and providing it to you, but you do not violate copyright for simply viewing that unauthorized copy. Sarah can come after me for creating the cop[y|ies], but she can’t come after the people to whom I send them, even if they admit to having willingly viewed a copy they knew to be unauthorized.

Copyright applies to distribution, not consumption.

0 points

The issue is that, for a person to ‘derive’ comedy from Sarah Silverman the ‘analogue’ way, you have to get her works legally, be that streaming her comedy specials, or watching movies/shows she’s written for.

I can also talk to a guy in a bar rambling about her work. That guy’s name? ChatGPT.

22 points
*
Deleted by creator
-1 points

I know this is kind of a silly argument but storing protected work in our own human memories to recall later is certainly not reproduction.

I don’t think it’s reproduction for chat GPT to file away that information to call on it later. It’s just better at it than we are.

