lemm.ee

11 points

1 year ago

Isn’t learning the basic act of reading text? I’m not sure what the AI companies are doing is completely right but also, if your position is that only humans can learn and adapt text, that broadly rules out any AI ever.


BrooklynMan@lemmy.ml

14 points

1 year ago

Isn’t learning the basic act of reading text?

not even close. that’s not how AI training models work, either.

if your position is that only humans can learn and adapt text

nope-- their demands are right at the top of the article and in the summary for this post:

Thousands of authors demand payment from AI companies for use of copyrighted works::Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools

that broadly rules out any AI ever

only if the companies training AI refuse to pay


4 points

1 year ago

Isn’t learning the basic act of reading text?

not even close. that’s not how AI training models work, either.

Of course it is. It’s not a 1:1 comparison, but the way generative AI works and the way we incorporate styles and patterns are more similar than not. Besides, if a tensorflow script more closely emulated a human’s learning process, would that matter for you? I doubt that very much.

Thousands of authors demand payment from AI companies for use of copyrighted works::Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools

Having to individually license each unit of work for an LLM would be as ridiculous as trying to run a university where you have to individually license each student reading each textbook. It would never work.

What we’re broadly talking about is generative work. That is, by absorbing a body of work, the model incorporates it into an overall corpus of learned patterns. That’s not materially different from how anyone learns to write. Even my use of the word “materially” in the last sentence is, surely, based on seeing it used in similar patterns of text.

The difference is that a human’s ability to absorb information is finite and bounded by the constraints of our experience. If I read 100 science fiction books, I can probably write a new science fiction book in a similar style, but I can only do that a handful of times in a lifetime. An LLM can do it almost infinitely and then have that ability reused by any number of other consumers.

There’s a case here that the remuneration process we have for original work doesn’t fit well into the AI training models, and maybe Congress should remedy that, but on its face I don’t think it’s feasible to just shut it all down. Something of a compulsory license model, with the understanding that AI training is automatically fair use, seems more reasonable.


BrooklynMan@lemmy.ml

7 points

1 year ago

Of course it is. It’s not a 1:1 comparison

no, it really isn’t–it’s not a 1000:1 comparison. AI generative models are advanced relational algorithms and databases. they don’t work at all the way the human mind does.

but the way generative AI works and the way we incorporate styles and patterns are more similar than not. Besides, if a tensorflow script more closely emulated a human’s learning process, would that matter for you? I doubt that very much.

no, the results are just designed to be familiar because they’re designed by humans, for humans to be that way, and none of this has anything to do with this discussion.

Having to individually license each unit of work for an LLM would be as ridiculous as trying to run a university where you have to individually license each student reading each textbook. It would never work.

nobody is saying it should be individually-licensed. these companies can get bulk license access to entire libraries from publishers.

That’s not materially different from how anyone learns to write.

yes it is. you’re just framing it in those terms because you don’t understand the cognitive processes behind human learning. but if you want to make a meta comparison between the cognitive processes behind human learning and the training processes behind AI generative models, please start by citing your sources.

The difference is that a human’s ability to absorb information is finite and bounded by the constraints of our experience. If I read 100 science fiction books, I can probably write a new science fiction book in a similar style. The difference is that I can only do that a handful of times in a lifetime. An LLM can do it almost infinitely and then have that ability reused by any number of other consumers.

this is not the difference between humans and AI learning, this is the difference between human and computer lifespans.

There’s a case here that the remuneration process we have for original work doesn’t fit well into the AI training models

no, it’s a case of your lack of imagination and understanding of the subject matter

and maybe Congress should remedy that

yes

but on its face I don’t think it’s feasible to just shut it all down.

nobody is suggesting that

Something of a compulsory license model, with the understanding that AI training is automatically fair use, seems more reasonable.

lmao


4 points

1 year ago

You’re getting lost in the weeds here and completely misunderstanding both copyright law and the technology used here.

First of all, copyright law does not care about the algorithms used and how well they map what a human mind does. That’s irrelevant. There’s nothing in particular about copyright that applies only to humans but not to machines. Either a work is transformative or it isn’t. Either it’s derivative or it isn’t.

What AI is doing is incorporating individual works into a much, much larger corpus of writing style and idioms. If an LLM sees an idiom used a handful of times, it might start using it where the context fits. If a human sees an idiom used a handful of times, they might do the same. That’s true regardless of algorithm and there’s certainly nothing in copyright or common sense that separates one from another. If I read enough Hunter S. Thompson, I might start writing like him. If you feed an LLM enough of the same, it might too.

Where copyright comes into play is in whether the new work produced is derivative or transformative. If an entity writes and publishes a sequel to The Road, Cormac McCarthy’s estate is owed some money. If an entity writes and publishes something vaguely (or even directly) inspired by McCarthy’s writing, no money is owed. How that work came to be (algorithms or human flesh) is completely immaterial.

So it’s really, really hard to make the case that there’s any direct copyright infringement here. Absorbing material and incorporating it into future works is what the act of reading is.

The problem is that as a consumer, if I buy a book for $12, I’m fairly limited in how much use I can get out of it. I can only buy and read so many books in my lifetime, and I can only produce so much content. The same is not true for an LLM, so there is a case that Congress should charge them differently for using copyrighted works, but the idea that OpenAI should have to go to each author and negotiate each book would really just shut the whole project down. (And no, it wouldn’t be directly negotiated with publishers, as authors often retain the rights to deny or approve licensure).


BrooklynMan@lemmy.ml

4 points

1 year ago

You’re getting lost in the weeds here and completely misunderstanding both copyright law and the technology used here.

you’re accusing me of what you are clearly doing after I’ve explained twice how you’re doing that. I’m not going to waste my time doing it again. except:

Where copyright comes into play is in whether the new work produced is derivative or transformative.

except that the contention isn’t necessarily over what work is being produced (although whether it’s derivative work is still a matter for a court to decide anyway), it’s that the source material is used for training without compensation.

The problem is that as a consumer, if I buy a book for $12, I’m fairly limited in how much use I can get out of it.

and, likewise, so are these companies who have been using copyrighted material - without compensating the content creators - to train their AIs.


-1 points

1 year ago

these companies who have been using copyrighted material - without compensating the content creators - to train their AIs.

That wouldn’t be copyright infringement.

It isn’t infringement to use a copyrighted work for whatever purpose you please. What’s infringement is reproducing it.


MirthfulAlembic@lemmy.world


1 point

1 year ago

A key point is that intellectual property law was written to balance the limitations of human memory and intelligence, public interest, and economic incentives. It’s certainly never been in perfect balance. But the possibility of a machine being able to consume enormous amounts of information in a very short period of time has never been a variable for legislators. It throws the balance off completely in another direction.

There’s no good way to resolve this without amending both our common understanding of how intellectual property should work and serve both producers and consumers fairly, as well as our legal framework. The current laws are simply not fit for purpose in this domain.


r1veRRR@feddit.de

0 points

1 year ago

Nothing about today’s iteration of copyright is reasonable or good for us. And in any other context, this (relatively) leftist forum would clamour to hate on copyright. But since it could now hurt a big corporation, suddenly copyright is totally cool and awesome.

(for reference, the true problem here is, as always, capitalism)


2 points

1 year ago

I very much agree.
