Avram Piltch is the editor in chief of Tom’s Hardware, and he’s written a thoroughly researched article breaking down the promises and failures of LLM AIs.
They have the right to ingest data, not because they’re “just learning like a human would”, but because I - a human - have the right to grab any data that’s available on the public internet and process it however I want, including by training statistical models. The only thing I don’t have the right to do is distribute it (or works that resemble it too closely).
If you actually show me people who are extracting books from LLMs and reading them that way, then I’d agree that would be piracy - but that would be such a terrible experience, if it even works, that I can’t see it actually happening.
Two things:
- Many of these LLMs – perhaps all of them – have been trained on datasets that include books that were absolutely NOT released into the public domain.
- Ethically, we would ask any author who parrots the work of others to provide citations to original references. That rarely happens with AI language models, and if they do provide citations, they often do it wrong.
I’m sick and tired of this “parrots the works of others” narrative. Here’s a challenge for you: go to https://huggingface.co/chat/, input some prompt (for example, “Write a three-paragraph scene about Jason and Carol playing hide and seek with some other kids. Jason gets injured, and Carol has to help him.”), and when you get the response, try to find the author that it “parroted”. You won’t be able to - because it doesn’t just reproduce someone else’s already-written scene. It meshes maaany things from all over the training data in such a way that none of them will be even remotely recognizable.
Well, I think these models learn in a way similar to humans, in that it’s basically impossible to tell where parts of the model came from, and as such the copyright claims are ridiculous. We need less copyright, not more. But, on the other hand, LLMs are not humans; they are tools created and owned by corporations, and I hate to see those corporations profiting off of other people’s work without proper compensation.
I am fine with public domain models being trained on anything and being used for noncommercial purposes without being taken down by copyright claims.
Is there a meaningful difference between reproducing the work and giving a summary? Because I’ll absolutely be using AI to filter all the editorial garbage out of the news - set up and trained by me to surface what is meaningful to me, stripped of all advertising, sponsorships, and detectable bias.
I have yet to find an LLM that can summarize a text without errors. I already mentioned this in another post a few days back, but Google’s new search preview is driving me mad with all the hidden factual errors. They make me click only to realize that the LLM told me what I wanted to find, not what is there (wrong names, wrong dates, etc.).
I greatly prefer the old excerpt summaries over the new imaginary ones (they’re currently A/B testing).
You’re making two big, incorrect assumptions:
- Simply seeing something on the internet does not give you any legal or moral right to use that thing in any way other than ways which are, or have previously been, deemed to be “fair use” by a court of law. Individuals have personal rights over their likeness and persona, and copyright holders have rights over their works, whether they are on the internet or not. In other words, there is a big difference between “visible in public” and “public domain”.
- More importantly, something that might be considered “fair use” for a human being to do is not necessarily “fair use” when a computer or “AI” does it. Judgements of what is and is not fair use are made on a case-by-case basis as a legal defense against copyright infringement claims, and multiple factors (purpose of use, nature of the original work, amount and substantiality of use, market effect, etc.) are often taken into consideration. At the very least, AI use has serious implications for substantiality and markets, especially compared to examples of human use.
I know these are really tough pills for AI fans to swallow, but you know what they say… “If it seems too good to be true, it probably is.”
On the contrary - the reason copyright is called that is because it started as the right to make copies. Since then it’s been expanded to include more than just copies, such as distributing derivative works.
But the act of distribution is key. If I wanted to, I could write whatever derivative works I liked in my personal diary.
I also have the right to count the number of occurrences of the letter ‘Q’ in Harry Potter without Rowling’s permission. I can also post my count online for other lovers of ‘Q’, because it’s not derivative (it is ‘derived’, but ‘derivative’ is different - according to Wikipedia it means ‘includes major copyrightable elements’).
Or do more complex statistical analysis.
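To make that concrete, here’s a minimal sketch of that kind of non-derivative analysis (the file name is just a placeholder, not a real dataset):

```python
# Count occurrences of the letter 'Q' in a locally stored text.
# "harry_potter.txt" is a placeholder path for illustration only.
from collections import Counter

with open("harry_potter.txt", encoding="utf-8") as f:
    counts = Counter(f.read())

print("Q appears", counts["Q"] + counts["q"], "times")
```

The output is a single number - a fact about the book, not a reproduction of any part of it.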
I like the point about LLMs interpolating data while humans extrapolate. I think that sums up a key difference in “learning”. It’s also an interesting point that we anthropomorphise ML models by using words such as learning or training, but I wonder if there are other, better words to use. Fitting?
Aren’t interpolation and extrapolation effectively the same thing, given a complex enough system?
No - repeated extrapolation eventually results in making everything that ever could be made, while constant interpolation would result in creating the same “average” work over and over.
The difference is infinite vs zero variety.
Depending on the geometry of the state space, very literally yes. Think about a sphere: there’s a straight line passing from Denver to Guadalajara, roughly hitting Delhi on the way. Is Delhi in between them (interpolation), or behind one from the other (extrapolation)? Kind of both, unless you move the goalposts and add distance limits on interpolation, which could themselves be broken by another geometry.
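If you want to see it numerically, here’s a rough sketch of that sphere example (coordinates are approximate; it just shows how close the extended Denver-Guadalajara great circle comes to Delhi):

```python
# Check how close the great circle through Denver and Guadalajara passes to Delhi.
# Coordinates are rough; this only illustrates the geometry point above.
import numpy as np

def unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude to a unit vector on the sphere."""
    lat, lon = np.radians([lat_deg, lon_deg])
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

denver      = unit_vector(39.7, -105.0)
guadalajara = unit_vector(20.7, -103.3)
delhi       = unit_vector(28.6,   77.2)

# Normal of the plane containing the Denver-Guadalajara great circle.
normal = np.cross(denver, guadalajara)
normal /= np.linalg.norm(normal)

# Angular distance from Delhi to that great circle (0 degrees = exactly on it).
off_circle = np.degrees(np.arcsin(abs(np.dot(delhi, normal))))
print(f"Delhi sits about {off_circle:.1f} degrees off that great circle")
```

The “straight line” between the two cities, continued far enough, really does pass within a few degrees of Delhi - so whether Delhi counts as “between” them is entirely a question of which arc you call the interpolation.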
I also like the point about interpolation vs extrapolation. It’s demonstrated when you look at art history (or the history of any other creative field). Humans don’t look at paintings and create something that’s predictable based on those paintings. They go “what happens when I take that idea and go even further?” An LLM could never have invented Cubism after looking at Paul Cezanne’s paintings, but Pablo Picasso did.
That’s not a limitation of ML, just how it is commonly used. You can take every parameter that a neural network recognizes and tweak it, make it bigger, smaller, recombine it with other stuff and marvel at the results. That’s how we got origami porn, (de)cartoonify AI, QR code art, Balenciaga, dancing statues, or my 5-minute attempt at reinventing cubism (tell the AI to draw cubes over a depth map).
Let’s be clear on where the responsibility belongs, here. LLMs are neither alive nor sapient. They themselves have no more “rights” than a toaster. The question is whether the humans training the AIs have the right to feed them such-and-such data.
The real problem is the way these systems are being anthropomorphized. Keep your attention firmly on the man behind the curtain.
You know, I think ChatGPT is way ahead of a toaster. Maybe it’s more like a small animal of some kind.
One could equally claim that the toaster was ahead, because it does something useful in the physical world. Hmm. Is a robot dog more alive than a Tamagotchi?
There are a lot of subjects where ChatGPT knows more than I do.
Does it know more than someone who has studied that subject their whole life? Of course not. But those people aren’t available to talk to me on a whim. ChatGPT is available, and it’s really useful. Far more useful than a toaster.
As long as you only use it for things where a mistake won’t be a problem - it’s a great tool. And you can also use it for “risky” decisions but take the information it gave you to an expert for verification before acting.
Machines don’t learn like humans yet.
Our brains are a giant electrical/chemical system that somehow creates consciousness. We might be able to create that in a computer. And the day it happens, then what will be the difference between a human and a true AI?
If you read the article, there are “experts” saying that human comprehension is fundamentally computationally intractable, which is basically a religious standpoint. Like, ChatGPT isn’t intelligent yet, partly because it doesn’t really have long-term memory, but yeah, there’s overwhelming evidence the brain is a machine like any other.
fundamentally computationally intractable
…using current AI architecture, and the insight isn’t new - it’s maths. This is currently the best idea we have about the subject. Trigger warning: Cybernetics, and lots of it.
Meanwhile, yes, of course brains are machines like any other; claiming otherwise is claiming you can compute incomputable functions, which is a physical and logical impossibility. And it’s fucking annoying to talk about this topic with people who don’t understand computability. It usually turns into a shouting match of “you’re claiming the existence of something like a soul, some metaphysical origin of the human mind” vs. “no I’m not” vs. “yes you are, but you don’t understand why”.
…using current AI architecture, and the insight isn’t new - it’s maths.
That is not what van Rooij et al. said, which is who was cited here. They published their essay here, which I haven’t really read, but which appears to make an argument about any possible computer. They’re psychologists and I don’t see any LaTeX in there, so they must be missing something.
Unfortunately I can’t open your link, although it sounds interesting. A feedforward network can approximate any computable function if it gets to be arbitrarily large, but depending on how you want to feed an agent inputs from its environment and read its actions, a single function might not be enough.
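For what it’s worth, here’s a toy sketch of that approximation idea - one hidden layer of soft step units, hand-placed rather than trained, tracking sin(x); the target function, layer size, and sharpness are arbitrary choices for illustration:

```python
# Toy universal-approximation sketch: one hidden layer of soft "step" units
# whose weighted sum follows sin(x). Nothing is trained; the output weights
# are set directly from the target's increments, purely to illustrate the idea.
import numpy as np

xs = np.linspace(0, 2 * np.pi, 400)
centers = np.linspace(0, 2 * np.pi, 60)   # hidden-unit step positions
sharpness = 30.0                          # how abrupt each step is

# Hidden layer: soft steps that switch from 0 to 1 as x passes each center.
hidden = 0.5 * (1 + np.tanh(sharpness * (xs[:, None] - centers[None, :])))

# Output weights: each unit contributes the change in sin between neighboring
# centers, so the running sum of steps traces the target curve.
weights = np.diff(np.sin(centers), prepend=0.0)

approx = hidden @ weights
print("max error:", float(np.max(np.abs(approx - np.sin(xs)))))
# Adding more hidden units (and sharper steps) pushes the error toward zero.
```

With more units the error shrinks, which is the hand-wavy version of the approximation result - though, as you say, a single fixed function still isn’t the same thing as an agent interacting with an environment.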
There’s a lot of opinion in here written in as if it’s fact.