Reddit has a new AI training deal to sell user content (www.theverge.com)

posted 7 months ago

L4sBot@lemmy.world

technology@lemmy.world

31 comments

Reddit has reportedly made a deal with an unnamed AI company to allow access to its platform’s content for the purposes of AI model training.

Lmaydev@programming.dev

26 points

7 months ago

I’d be very surprised if people weren’t already scraping Reddit for this.

NoRodent@lemmy.world

20 points

7 months ago

I mean, there’s /r/SubSimulatorGPT2 that’s been running for years… Although that one was at least hilarious to read, because at that stage the AI was in the sweet spot of being coherent while making total lapses in logic.

TexasDrunk@lemmy.world

6 points

7 months ago

Don’t forget: incredibly racist on multiple occasions.

bbkpr@lemmy.world

2 points

7 months ago

The AI is what it was fed 😂

Verserk@lemmy.dbzer0.com

8 points

7 months ago

That was the real reason for the API changes last year; apps just got caught in the crossfire.

fuckwit_mcbumcrumble@lemmy.world

3 points

7 months ago

Yeah, I thought that was pretty much the established consensus on the matter. People questioning it honestly confuses me.

NeatNit@discuss.tchncs.de

6 points

7 months ago

It’s all but guaranteed. Reminds me of this Computerphile video: https://youtu.be/WO2X3oZEJOA?t=874

TL;DW: there were “glitch tokens” in GPT (and therefore ChatGPT) that undeniably came from Reddit usernames.

Note: there’s no proof that these Reddit usernames were in the training data (there are even reasons to assume they weren’t; watch the video for context), but there’s no doubt that OpenAI had already scraped Reddit data at some point before training, probably mixed in with all the rest of their text data. I see no reason to assume they removed all Reddit text before training. The video suggests reasons and evidence that they removed certain subreddits, not all of Reddit.

PipedLinkBot@feddit.rocks

1 point

7 months ago

Here is an alternative Piped link(s):

https://piped.video/WO2X3oZEJOA?t=874

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I’m open-source; check me out at GitHub.

gwildors_gill_slits@lemmy.ca

18 points

7 months ago

Can’t wait for ChatGPT to call me “good sir” and tell me I win the internet.

Buffalox@lemmy.world

12 points

7 months ago

The best answer I can find to your question is “deleted by user.”

comrade19@lemmy.world

18 points

7 months ago

Why is there nothing on Reddit about this, lol?

bobs_monkey@lemm.ee

18 points

7 months ago

Mustn’t spook the ~~product~~ users

FartsWithAnAccent@lemmy.world

3 points

7 months ago

I’d be surprised if there wasn’t; I don’t think Spez and his cohorts are competent enough to suppress all information about it site-wide.

ME5SENGER_24@lemmy.world

17 points

7 months ago

FUCK REDDIT! FUCK U/SPEZ! The Red-exit shall endure, VIVA LA LEMMY!!

Boozilla@lemmy.world

14 points

7 months ago

I bet the fuckers will use “deleted” data, too.

General_Effort@lemmy.world

4 points

7 months ago

Deleted? You mean made unscrapeable. It’s exclusive to Reddit licensees.

RedFox@infosec.pub

3 points

7 months ago

Yeah, the second anyone posts anything to any service, all their house are belong to the evil corp…

I just blended two references…

tinwhiskers@lemmy.world

3 points

7 months ago

What about edited data?

kingthrillgore@lemmy.ml

10 points

7 months ago

When Spez took away API access, he basically shit on the social contract that offered a fair exchange: free access in return for the content we fed into Reddit. After the API change, the new terms are: there is no contract. There are no terms. If you use Reddit now, you are giving away everything you are to be indexed and mangled by statistics. You exist as free labor for statisticians and machines.

You are more than a few cents of bad memes.

I’m going to request in the AM that Lemmy add robots.txt rules disallowing AI crawlers, to at least signal that we’re not interested. We need legislation that tells scrapers what they can access.
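For illustration, robots.txt rules along the lines proposed above might look like the fragment embedded in this Python sketch, which uses the standard library’s `urllib.robotparser` to check which crawlers the rules would turn away. Assumptions: the rule set is hypothetical (not Lemmy’s actual configuration), `GPTBot` (OpenAI) and `CCBot` (Common Crawl) are just two examples of AI-related crawler user agents rather than a complete list, and the URL is a placeholder. Keep in mind that robots.txt is purely advisory: it signals intent but does nothing to stop a crawler that chooses to ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a Lemmy instance (an assumption, not Lemmy's
# real config). GPTBot and CCBot are real crawler user-agent tokens, but this
# list is illustrative, not exhaustive.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the rules in ROBOTS_TXT."""
    parser = RobotFileParser()
    # parse() takes the file as a sequence of lines instead of fetching it.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Placeholder URL; any path on the site is matched against the rules.
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        print(agent, can_crawl(agent, "https://lemmy.world/post/1"))
```

Running it should print `False` for the two AI crawlers and `True` for an ordinary browser user agent, since the catch-all `*` group allows everything else.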

General_Effort@lemmy.world

1 point

7 months ago

> We need legislation that tells scrapers what they can access.

What do you hope that would achieve?

Because I can only see this as benefitting Reddit, Facebook, and the like, while screwing over smaller players.

Technology

!technology@lemmy.world

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related content.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below; to ask if your bot can be added, please contact us.
Check for duplicates before posting; duplicates may be removed.

Approved Bots

Community stats

18K monthly active users
11K posts
518K comments
