It’s like that painter who kept doing self-portraits through Alzheimer’s.
we have to be very careful about what ends up in our training data
Don’t worry, the big tech companies took a snapshot of the internet before it was poisoned, so they can easily profit from LLMs without allowing competitors into the market. That’s who “we” is, right?
It’s impossible for any of them to have taken a sufficient snapshot. A snapshot of all unique data on the clearnet would probably be on the scale of hundreds to thousands of exabytes, which is (apparently) more storage than any single cloud provider operates.
And that’s before the prohibitively expensive cost of processing all that data for any single model.
The reality is that, like what we’ve done to the natural world, they’re polluting and corrupting the internet without having taken a sufficient snapshot — and just like the natural world, everything that’s lost is lost FOREVER… all in the name of short-term profit!
GOOD.
This “informational incest” is present in many parts of society and needs to be stopped (one of the worst offenders is the intelligence sector).
A few years ago, people assumed that these AIs would continue to get better every year. It seems we’re already hitting some limits, and improving the models keeps getting harder and harder. It’s like the linewidth limits we have in CPU design.
I think that hypothesis still holds, as it always assumed training data of sufficient quality. This study is more saying that the places we’ve traditionally harvested training data from are beginning to be polluted by low-quality, AI-generated data.
It’s almost like we need some kind of flag on AI-generated content to prevent it from ruining things.
If that gets implemented, it would help both AI devs and everyday people online.
No, not really. The improvement gets less noticeable as it approaches the limit, but I’d say the speed at which it improves is still the same, especially for smaller models and context window sizes. There are now models comparable to ChatGPT, or maybe even GPT-4 (I don’t remember which), with a context window of 128k tokens that you can run on a GPU with 16 GB of VRAM. 128k tokens is around 90k words, I think. That’s more than 4 Bee Movie scripts. It can “comprehend” all of that at once.
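A quick back-of-the-envelope check of those numbers (the words-per-token ratio and the script length are rough assumptions of mine, not figures from the thread):

```python
# Rough sanity check: how many words fit in a 128k-token context window,
# and how many Bee Movie scripts that is.
WORDS_PER_TOKEN = 0.7      # assumed average for English text (~0.7 words/token)
BEE_MOVIE_WORDS = 15_000   # assumed rough script length, not an exact count

context_tokens = 128_000
context_words = context_tokens * WORDS_PER_TOKEN   # ≈ 89,600 words, i.e. ~90k
scripts = context_words / BEE_MOVIE_WORDS          # ≈ 6 scripts, so "more than 4" holds

print(f"{context_words:,.0f} words ≈ {scripts:.1f} Bee Movie scripts")
```

Under these assumptions the comment’s claim checks out with room to spare.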
AI like: