lemm.ee


467 points

Leaked Documents Show Nvidia Scraping ‘A Human Lifetime’ of Videos Per Day to Train AI (www.404media.co)

posted 3 months ago by misk@sopuli.xyz in technology@lemmy.world

https://archive.is/2024.08.05-162750/https://www.404media.co/nvidia-ai-scraping-foundational-model-cosmos-project/


R00bot@lemmy.blahaj.zone

42 points

3 months ago


I feel like the amount of training data required for these AIs serves as a pretty compelling argument as to why AI is clearly nowhere near human intelligence. It shouldn’t take thousands of human lifetimes of data to train an AI if it’s truly near human-level intelligence. In fact, I think it’s an argument for them not being intelligent whatsoever. With that much training data, everything that could be asked of them should be in the training data. And yet they still fail at any task not in their data.

Put simply: a human needs less than one lifetime of training data to be more intelligent than AI. If it hasn’t already solved this, I don’t think throwing more training data/compute at the problem will.


rdri@lemmy.world

27 points

3 months ago

There is no “intelligence”; “AI” is a PR word. It’s just a language model that feeds on a lot of data.


R00bot@lemmy.blahaj.zone

7 points

3 months ago

Oh yeah we’re 100% agreed on that. I’m thinking of the AI evangelicals who will argue tooth and nail that LLMs have “emergent properties” of intelligence, and that it’s simply an issue of training data/compute power before we’ll get some digital god being. Unfortunately these people exist, and they’re depressingly common. They’ve definitely reduced in numbers since AI hype has died down though.


Hunter232@programming.dev

12 points

3 months ago

Humans have the advantage of billions of years of evolution.


Cyteseer@lemmy.world

-1 points

3 months ago

“ai” also has the advantage of billions of years of evolution.


noobdoomguy8658@feddit.org

4 points

3 months ago

We’re very proficient at walking, but somehow haven’t produced a walking home or anything like that.

It’s not very linear.


wizardbeard@lemmy.dbzer0.com

3 points

3 months ago

Definitely not the same thing. Just because you can make use of the end result of major efforts does not somehow magically give you access to all the knowledge from those major efforts.

You can use a smartphone easily, but that doesn’t mean you magically know how to make one.


stupidcasey@lemmy.world

5 points

3 months ago

You’ve had the entire history of evolution to get the instinct you have today.

Nature vs. nurture is a huge ongoing debate.

Just because it takes longer to train doesn’t mean it’s not intelligent; kids develop more slowly than chimps.

Also, “intelligent” doesn’t really mean anything. I personally think intelligence is the ability to distill unusable amounts of raw data and intuit a result beneficial to oneself. But very few people agree with me.


Peanut@sopuli.xyz

0 points

3 months ago

I see intelligence as filling areas of concept space within an eco-niche in a way that proves functional for actions within that space. I think we are increasingly discovering that “nature” has little commitment and is just optimizing preparedness for expected levels of entropy within the functional eco-niche.

Most people haven’t even started paying attention to distributed systems building shared enactive models, but these systems are already capable of things that should be considered groundbreaking given the time and money spent on their development.

That being said, localized narrow generative models are just building large individual models of predictive processes that don’t actively update information by default.

People who attack AI for just being prediction machines really need to look into predictive processing, or learn how much we organics just guess and confabulate on top of vestigial social priors.

But no, corpos are using it, so computer bad, human good, even though the main issue here is the humans who have unlimited power and are encouraged into bad actions by flawed social posturing systems and the conflation of wealth with competency.


Todd Bonzalez@lemm.ee

4 points

3 months ago

A human lifetime worth of video is not anywhere close to equalling a human lifetime of actual corporeal existence, even in the perfect scenario where the AI is as capable as a human brain.


R00bot@lemmy.blahaj.zone

3 points

3 months ago

Strange to equate the other senses to performance in intellectual tasks, but sure. Do you think feeding data from smells, touch, taste, etc. into an AI along with the video will suddenly make it intelligent? No, it will just make it more likely to guess what something smells like. I think it’s very clear that our current approach to AI is missing something much more fundamental to thought than that; it’s not just a dataset problem.


Rhaedas@fedia.io

35 points

3 months ago

Humans don’t live that long. That’s only about 1.5 million 30 min videos, which isn’t a huge amount for a whole day’s worth of scraping.
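The “1.5 million 30 min videos” figure is easy to sanity-check. Below is a minimal back-of-the-envelope sketch; the lifespans tried are assumptions for illustration (neither the headline nor the comment states one), and `videos_per_lifetime` / `lifespan_for_videos` are hypothetical helper names.

```python
# Back-of-the-envelope check of the "1.5 million 30 min videos" figure.
# The lifespans used below are ASSUMPTIONS for illustration only;
# the thread does not say which lifespan "a human lifetime" means.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year length, counting leap days


def videos_per_lifetime(lifespan_years: float, minutes_per_video: float = 30) -> float:
    """How many back-to-back videos of the given length fill a lifetime."""
    return lifespan_years * MINUTES_PER_YEAR / minutes_per_video


def lifespan_for_videos(video_count: float, minutes_per_video: float = 30) -> float:
    """Lifespan in years implied by a given number of back-to-back videos."""
    return video_count * minutes_per_video / MINUTES_PER_YEAR


if __name__ == "__main__":
    for years in (70, 80, 85):  # assumed lifespans
        print(f"{years} years = {videos_per_lifetime(years):,.0f} thirty-minute videos")
    print(f"1.5 million videos = {lifespan_for_videos(1.5e6):.1f} years")
```

With an assumed 80-year lifespan this gives 1,402,560 videos; working backwards, 1.5 million thirty-minute videos corresponds to roughly 85.6 years, so the comment’s figure is in the right range.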


Irremarkable@fedia.io

13 points

3 months ago

Yeah, this is honestly an order of magnitude less than I would’ve thought.


Infynis@midwest.social

5 points

3 months ago

Maybe they’re running out


mrfriki@lemmy.world

2 points

3 months ago

I would be lucky if I get to watch more than 10000 videos in my entire lifetime.


4 points

3 months ago

Bro you’re doing it with your eyes, right now!


twei@discuss.tchncs.de

2 points

3 months ago

That’s only about 1.5 million 30 min videos

aka 2 videos from Quinton Reviews


MonkderVierte@lemmy.ml

34 points

3 months ago

Properly following licensing, right?


lemmyvore@feddit.nl

26 points

3 months ago

No, see, because it’s “learning like a human”, and everybody knows that you’re allowed to bypass any licensing for learning. /s

But seriously I don’t know how they make the jump to these conclusions either.


areyouevenreal@lemm.ee

3 points

3 months ago


This is a massive strawman argument. No one is saying you shouldn’t have a license to view the content in order to train an AI on it. Most of the information used to train these models is publicly available and licensed for public viewing.


lemmyvore@feddit.nl

17 points

3 months ago

Just because something is available for public viewing does not mean it’s licensed for anything except personal use.

The strawman here is that since physical people benefit from personal-use exceptions in the law, machine learning software should too. But why should it? Since when is a piece of software run by a corporation equivalent to an individual person?


31337@sh.itjust.works

2 points

3 months ago

Information wants to be free.


MonkderVierte@lemmy.ml

1 point

3 months ago

I mean, I agree, but artists want to eat too.


Kekzkrieger@feddit.org

27 points

3 months ago

Instead of focusing on their products and improving them for everyone, some shitty CEO is pushing their shitty AI agenda down everyone’s throat.


Drewelite@lemmynsfw.com

12 points

3 months ago

Well it sounds like they’re doing something to make their products better, you just disagree that it’s going to be successful.


Zetta@mander.xyz

-10 points

3 months ago

Nvidia’s biggest product is absolutely AI by a massive landslide, I’m pretty sure I read that the point of them downloading these videos and doing the training is to build a pipeline for their AI users to do the same with their own shit. (Can’t be bothered to double-check cuz I really don’t care)

So they aren’t downloading all this video to make a crazy AI model. They’re downloading all this video to make a tool for their AI customers to use, you may not agree but improving their product is exactly what they’re doing.


Agrivar@lemmy.world

8 points

3 months ago

Can’t be bothered to double-check cuz I really don’t care

For FUCK’S SAKE, why do you even bother posting your garbage opinions then? And with such authority too!


Zetta@mander.xyz

0 points

3 months ago


¯\_(ツ)_/¯ great question


SomeGuy69@lemmy.world

21 points

3 months ago


So they use VMs to simulate user accounts. In the future this will be blocked, and whatever new AI startup comes along won’t have the option to do so. Competition blocked. Forever.


Technology

!technology@lemmy.world

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related content.
Be excellent to each other!
Mod-approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below are allowed. To ask whether your bot can be added, please contact us.
Check for duplicates before posting; duplicates may be removed.

Approved Bots

@L4s@lemmy.world
@autotldr@lemmings.world
@PipedLinkBot@feddit.rocks
@wikibot@lemmy.world

Community stats

18K monthly active users
12K posts
553K comments

Community moderators

L3s@lemmy.world
L3s@fry.gs
L4sBot@fry.gs (bot)
L4sBot@lemmy.world (bot)
enu@lemmy.world
