lemm.ee

299

Reddit has a new AI training deal to sell user content (www.theverge.com)

posted 9 months ago by L4sBot@lemmy.world in technology@lemmy.world

Reddit has reportedly made a deal with an unnamed AI company to allow access to its platform’s content for the purposes of AI model training.

AutoTL;DR@lemmings.world

1 point

9 months ago

This is the best summary I could come up with:

Reddit will let “an unnamed large AI company” have access to its user-generated content platform in a new licensing deal, according to Bloomberg yesterday.

The deal, “worth about $60 million on an annualized basis,” the outlet writes, could still change as the company’s plans to go public are still in the works.

The news also follows an October story that Reddit had threatened to cut off Google and Bing’s search crawlers if it couldn’t make a training data deal with AI companies.

Last year, it successfully stonewalled its way out of the biggest protest in its history after changes to its third-party API access pricing caused developers of the most popular Reddit apps to shut down.

As Bloomberg writes, Reddit’s year-over-year revenue was up by 20 percent by the end of 2023, but it was still $200 million shy of a $1 billion target it had set two years prior.

The company was reportedly advised to seek a $5 billion valuation when it opens up for public investment, which is expected to happen in March.

The original article contains 346 words, the summary contains 175 words. Saved 49%. I’m a bot and I’m open source!

Lmaydev@programming.dev

26 points

9 months ago

I’d be very surprised if people weren’t already scraping Reddit for this.

NeatNit@discuss.tchncs.de

6 points

9 months ago

It’s all but guaranteed. Reminds me of this Computerphile video: https://youtu.be/WO2X3oZEJOA?t=874 TL;DW: there were “glitch tokens” in GPT (and therefore ChatGPT) which undeniably came from Reddit usernames.

Note, there’s no proof that these Reddit usernames were in the training data (and there are even reasons to assume that they weren’t, watch the video for context), but there’s no doubt that OpenAI had already scraped Reddit data at some point prior to training, probably mixed in with all the rest of their text data. I see no reason to assume they completely removed all Reddit text before training. The video suggests reasons and evidence that they removed certain subreddits, not all of Reddit.

PipedLinkBot@feddit.rocks

1 point

9 months ago

Here is an alternative Piped link(s):

https://piped.video/WO2X3oZEJOA?t=874

Piped is a privacy-respecting open-source alternative frontend to YouTube.

I’m open-source; check me out at GitHub.

NoRodent@lemmy.world

20 points

9 months ago

I mean, there’s /r/SubSimulatorGPT2 that’s been running for years… Although that one was at least hilarious to read because at that stage the AI was in the sweet spot of being simultaneously coherent while making total lapses in logic.

TexasDrunk@lemmy.world

6 points

9 months ago

Don’t forget incredibly racist on multiple occasions.

bbkpr@lemmy.world

2 points

9 months ago

The AI is what was fed into it 😂

Verserk@lemmy.dbzer0.com

8 points

9 months ago

That was the real reason for the API changes last year, apps just got caught in the crossfire.

fuckwit_mcbumcrumble@lemmy.world

3 points

9 months ago

Yeah, I thought that was pretty well the established consensus on the thing. People questioning it confuses me, honestly.

comrade19@lemmy.world

18 points

9 months ago

Why is there nothing on reddit about this lol

bobs_monkey@lemm.ee

18 points

9 months ago

Mustn’t spook the ~~product~~ users

FartsWithAnAccent@lemmy.world

3 points

9 months ago

I’d be surprised if there wasn’t. I don’t think Spez and his cohorts are competent enough to completely suppress all information about it site-wide.

ME5SENGER_24@lemmy.world

17 points

9 months ago

FUCK REDDIT! FUCK U/SPEZ! The Red-exit shall endure, VIVA LA LEMMY!!

Boozilla@lemmy.world

14 points

9 months ago

I bet the fuckers will use “deleted” data, too

General_Effort@lemmy.world

4 points

9 months ago

Deleted? You mean made unscrapeable. It’s exclusive to Reddit licensees.

RedFox@infosec.pub

3 points

9 months ago

Yeah, the second anyone posts anything to any service, all their house are belong to the evil corp…

I just blended two references…

tinwhiskers@lemmy.world

3 points

9 months ago

what about edited?

General_Effort@lemmy.world

2 points

9 months ago

They say it’s $60 million on an annualized basis. I wonder who’d pay that, given that you can probably scrape it for free.

Maybe it’s the AI act in the EU. That might cause trouble in that regard. The US is seeing a lot of rent-seeker PR, too, of course. That might cause some to hedge their bets.

Maybe some people had not realized that yet, but limiting fair use does not just benefit the traditional media corporations but also the likes of Reddit, Facebook, Apple, etc. Making “robots.txt” legally binding would only benefit the tech companies.

Technology

!technology@lemmy.world

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related content.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, to ask if your bot can be added please contact us.
Check for duplicates before posting, duplicates may be removed

Approved Bots

@L4s@lemmy.world
@autotldr@lemmings.world
@PipedLinkBot@feddit.rocks
@wikibot@lemmy.world
