Training Generative AI Models on Copyrighted Works Is Fair Use


59 points

9 months ago

I think we should have a rule that says if an LLM company invokes fair use on the training inputs then the outputs are public domain.


Steve@communick.news

26 points

9 months ago


That’s already been ruled on once.

A recent lawsuit challenged the human-authorship requirement in the context of works purportedly “authored” by AI. In June 2022, Stephen Thaler sued the Copyright Office for denying his application to register a visual artwork that he claims was authored “autonomously” by an AI program called the Creativity Machine. Dr. Thaler argued that human authorship is not required by the Copyright Act. On August 18, 2023, a federal district court granted summary judgment in favor of the Copyright Office. The court held that “human authorship is an essential part of a valid copyright claim,” reasoning that only human authors need copyright as an incentive to create works. Dr. Thaler has stated that he plans to appeal the decision.

Why would companies care about copyright of the output? The value is in the tool to create it. The whole issue to me revolves around the AI company profiting from its service, a service built on a massive library of copyrighted works. It seems clear to me that a large portion of their revenue should go equally to the owners of the works in their database.


Even_Adder@lemmy.dbzer0.com

12 points

9 months ago

You can still copyright AI works, you just can’t name an AI as the author.


Steve@communick.news

9 points

9 months ago

That’s just saying you can claim copyright if you lie about authorship. The problem then is, you may step into the realm of fraud.


1 point

9 months ago

Not just the outputs but the models as well


kromem@lemmy.world

1 point

9 months ago

The outputs are not copyrightable.

But something not being copyrightable doesn’t necessarily mean openly distributed.

It does mean OpenAI can’t really restrict or go after other companies training off of GPT-4 outputs though, which is occurring broadly.


NevermindNoMind@lemmy.world

35 points

9 months ago

Google scanned millions of books and made them available online. Courts ruled that was fair use because the purpose and interface didn’t lend themselves to actually reading the books in Google Books, just to searching them for information. If that is fair use, then I don’t see how training an LLM (which doesn’t retain an exact copy of the training data, at least in the vast majority of cases) isn’t fair use. You aren’t going to get an argument from me.

I think most people who will disagree are reflexively anti AI, and that’s fine. But I just haven’t heard a good argument that AI training isn’t fair use.


commie@lemmy.dbzer0.com (OP)

5 points

9 months ago

here’s a sidechannel attack on your position: every use, even an infringing use, is fair use until adjudicated, because what fair use means is that a court has agreed that your infringing use is allowed. so of course ai training (broadly) is always fair use. but particular instances of ai training may be found to not be fair use, and so we can’t be sure that you are always going to be right (for the specific ai models that may come into question legally).


Semperverus@lemmy.world

10 points

9 months ago

“It’s perfectly legal unless you get caught!”


Daxtron2@startrek.website

0 points

9 months ago


Considering most copyright cases come down to the individual judge’s decision, essentially yes


runefehay@kbin.social

3 points

9 months ago

I am no lawyer, but I suspect what will be considered either fair use or infringing will probably depend on how the programmed AI model is used.

For example, if you train it on a book of poetry, asking it questions about the poetry will probably be considered fair use. If you ask the AI to write poetry in the style of the book’s poems and you publish the AI’s poetry, I suspect it might be considered laundering copyright and infringing. Especially if it is substantially similar to specific poems in the book.


commie@lemmy.dbzer0.com (OP)

11 points

9 months ago

If you ask the AI to write poetry in the style of the book’s poems and you publish the AI’s poetry, I suspect it might be considered laundering copyright and infringing.

is the image of a cabin in a snowy landscape copyrighted by Thomas Kinkade? fuck no. That’s an idea. ideas can’t be copyrighted. a style isn’t a discrete work. it is an idea. it can’t be copyrighted. if I produce something in the style of Keats or Stephen King or Rowling, they can’t sue me for copyright unless I make a substantially infringing use of their work. The style isn’t sufficient, because the style can’t be copyrighted.


snooggums@kbin.social

14 points

9 months ago

Selling an AI model (or usage of that model) that allows for producing works that are clearly based upon those copyrighted works and would be considered copyright infringement if a person did the same thing is not fair use.

If a person creating the same thing as generative AI would be infringing, then it isn’t magically not infringing because it is on the internet or done by a program. Basically, AI needs to follow the same rules and restrictions as a person would. That does mean that the AI also needs to be trained to not create copyright infringing works if the use of the AI is being sold.

As a downloadable model that anyone can use at no cost? Sure, whatever is fine. Then it is on the person who uses it and tries to infringe. But if someone pays a company to use their AI to create infringing work, that is on the company and they are just as at fault as if they sold T shirts that infringed on copyright.


commie@lemmy.dbzer0.com (OP)

-12 points

9 months ago

If a person creating the same thing as generative AI would be infringing, then it isn’t magically not infringing because it is on the internet or done by a program

no one is arguing otherwise.


commie@lemmy.dbzer0.com (OP)

-15 points

9 months ago

That does mean that the AI also needs to be trained to not create copyright infringing works if the use of the AI is being sold.

no it doesn’t.


commie@lemmy.dbzer0.com (OP)

-17 points

9 months ago

if someone pays a company to use their AI to create infringing work, that is on the company and they are just as at fault as if they sold T shirts that infringed on copyright.

wrong.


commie@lemmy.dbzer0.com (OP)

-18 points

9 months ago

Selling an AI model (or usage of that model) that allows for producing works that are clearly based upon those copyrighted works and would be considered copyright infringement if a person did the same thing is not fair use

it is.


Aatube@kbin.social

26 points

9 months ago


I think you might want to elaborate instead of making 4 replies in 3 minutes, each averaging 2.75 words.


commie@lemmy.dbzer0.com (OP)

7 points

9 months ago

I don’t see how selling a model or the use of a model infringes on a specific copyright. whose copyright has been infringed? how can you prove that? take AI out of the question. if you wanted to prove that some other author has infringed the copyright on your novel, how would you do that? if you want to prove that some quote unquote artist has infringed on your copyright, how would you do that? if any of your methods for proving that a person has infringed on your copyright is applicable to an AI, then that’s what that is. but if you can’t prove it, if the AI just learned about how style works, if an AI just saw your work but never actually copied it, then it’s not infringing.


commie@lemmy.dbzer0.com (OP)

-18 points

9 months ago


instead of making 4 replies in 3 minutes, each averaging 2.75 words

this is irrelevant to the truth of my claim.


snooggums@kbin.social

8 points

9 months ago

I’m sorry, are you saying that selling a book that has the same characters as a recently released book doing the same things but with wording differences is somehow fair use? Like a book called Harry Potter and the Something Rock with the exact same plot points but worded slightly different is fair use?

Do you even understand what copyright is?


commie@lemmy.dbzer0.com (OP)

5 points

9 months ago

are you saying that selling a book that has the same characters as a recently released book doing the same things but with wording differences is somehow fair use? Like a book called Harry Potter and the Something Rock with the exact same plot points but worded slightly different is fair use?

no. I was saying selling an AI model or access to it that is capable of producing that work is not, itself, copyright infringement.

in fact, do you know what a clean room is? if I provided to a writing team every English-language work except those written by JK Rowling and it produced a work exactly like you’re describing, the resultant work would not be infringing copyright. it should not be any different for AI where you cannot prove what materials it was provided.


MoogleMaestro@kbin.social

7 points

9 months ago


It isn’t fair use. See most of the FAQ @ fairuse faq.

“Fair Use” is often the subject of discussion when talking about online copyright with regards to online video content or music sampling, but it’s notably a flawed defense as it generally has no legal definition for how much of certain content can be used or referenced. The very first line of that faq has the following note:

How do I get permission to use somebody else’s work?
You can ask for it. If you know who the copyright owner is, you may contact the owner directly. If you are not certain about the ownership or have other related questions, you may wish to request that the Copyright Office conduct a search of its records or you may search yourself. See the next question for more details.

All that artists, writers and others are asking LLM producers to do is a) ask for permission or b) attribute the artist’s work in some kind of ledger, respecting the copyright of their work. Every work you make (write/play/draw/whatever) has a copyright that should be respected by companies, is not waived by EULA or TOS (ever), and must be respected in order for author attribution as a concept to work at all. There is plenty of free, permissively licensed content on the internet that can be used instead to train an LLM, but simply asking for permission or giving attribution would at least be a step in the right direction for these companies and for the industry as a whole.

Defenders of AI will note that the “use” of art in LLM is limited and thus protected by fair use, but that is debatable based on the content of the above listed FAQ.

How much of someone else’s work can I use without getting permission?
Under the fair use doctrine of the U.S. copyright statute, it is permissible to use limited portions of a work including quotes, for purposes such as commentary, criticism, news reporting, and scholarly reports. There are no legal rules permitting the use of a specific number of words, a certain number of musical notes, or percentage of a work. Whether a particular use qualifies as fair use depends on all the circumstances. See, Fair Use Index, and Circular 21, Reproductions of Copyrighted Works by Educators and Librarians.

You can see that the use cases above (commentary, criticism, news reporting and scholarly reports) do not qualify LLM companies to use or train their models with copyrighted data for privatized industry. Additionally, you’ll note that “market disruptive” uses cannot be protected by fair use in its definition, meaning that displacing artists with AI automatically makes LLM use of copyrighted material an infraction of copyright that is not protected by the fair use clause.

Regardless, this will need to be proved in court and even if it passes certain criteria, it will not apply to all infractions. Fair use is a defense, not a protection, and thus LLM producers will have to spend time in court in order to defend individual infractions. There’s no way for them to catch all copyright infringement with one ruling, it needs to be proved on a case-by-case basis.

IANAL but this is my 2 cents on the matter.


commie@lemmy.dbzer0.com (OP)

3 points

9 months ago

this will need to be proved in court

this is true of all fair use. this is almost the definition of fair use. Fair use can only exist after a judge has adjudicated it. before that, it is questionable.


Infiltrated_ad8271@kbin.social

0 points

9 months ago


You can see that the use cases above (commentary, criticism, news reporting and scholarly reports) does not qualify LLM companies to use or train their models

Seems quite obvious that the text you quoted refers exclusively to plagiarism. This does not include things like being inspired by it, referencing it, parodying it and of course not training AI either, because what matters is whether the result is protected content.

You can argue that memorizing and sharing training data is a copyright violation, and that’s a fair point, but it’s also worth noting that this happens in only a small minority of cases, is accidental, and is being addressed.


Training Generative AI Models on Copyrighted Works Is Fair Use - Change My Mind (mastodon.lawprofs.org)
