133 points

Remember the whole “if you aren’t paying for the product, you are the product” thing?

It wasn’t enough to turn you into a product. Now they also want to turn you into a resource. Farming your comments and posts to feed to an AI model.

What an economy we’ve built.

24 points

I wonder why I don’t pay for Lemmy.

58 points

The kind of frightening thing is that anyone could start an instance on the Fediverse, collect all the posts and comments coming in as all instances usually do and then use it to do the same thing, and I’m not sure there’s currently anything (legally or otherwise) stopping them.

But at least we have the option to defederate such an instance. If we can find out which ones do it…

90 points

I totally understand your perspective, but I approach this from the opposite direction.

From my perspective, there’s no “at least” here. My Lemmy posts are public. I have no control over what is done with them after I post them. I am comfortable with that.

The difference between Reddit and Lemmy is not that one protects privacy and the other doesn’t. NEITHER is a platform for private discussion.

The difference is that with Lemmy, public means PUBLIC. Reddit, Twitter, and Facebook are also “public” in the sense that there can be no expectation of privacy. But they’re “private” in the corporate sense — a single corporate entity retains control of the data. They can, at will, restrict access to that data, without the consent of the users who created it.

And that’s not just theoretical; all of those companies have literally restricted access to content that users meant to be public. People can’t read the Twitter posts that I made with the intention of them being public, because Twitter now requires an account to read posts and comments. Reddit has restricted access to posts I made with the intention of them being public and readily accessible, because they killed apps and integrations, and implemented onerous access control in an attempt to hoard my data.

They altered the terms, and I, for one, got sick of praying that they would not alter them further.

Lemmy is public. You cannot control who can read it, and you cannot control what they do with it. The difference is that with a truly public platform like Lemmy, my data can benefit the whole world, instead of just some corporation.

If you are looking for a platform for private discussion, Matrix is probably it. But even then, the concept of data privacy only makes sense if you trust all the people that ever have access to the data. If I’m in a Matrix room with hundreds of strangers, I wouldn’t consider that “private” either, regardless of the protocol’s encryption.

Bad actors will always have access to the posts I make public. On Lemmy, good actors do, too, and nobody can take that away from us. THAT’S the difference.

11 points

An instance isn’t required. It’s not like the current generation of generative AI wasn’t trained on web scrapings.

7 points

There are already a few instances that ignore delete requests

7 points

The instance would likely just act as a regular instance and allow normal users on; you couldn’t even tell they were using it to scrape data at that point.

6 points

Free and open information, like Wikipedia, used to be an ideal. I have used Reddit since 2008 or earlier because it got on search engines and shared information consistently on precise topics. Twitter used to also be this way, but now mostly only puts paid subscribers on search engines.

If you are to organize information around topics, such as a Commodore 64 community, and the protocol openly allows copies to be made via federation, I encourage people to have the attitude that information be treated like Wikipedia content. It sucks that so much information from 10 years ago has been entirely lost now that so many people deliberately purged their Reddit comments, etc. Tragedy of the commons. And it drags down the entire planet that people squirrel away discussions on topics that are generally public. It’s like now everyone wants to monetize even their discussions on Commodore 64 or automotive repair, or keep them behind absolute control or paywalls, etc.

5 points

Legally, in the EU, you probably cannot scrape someone else’s instance because of the database copyright law. But I have no idea if that applies to being part of the network, since the other instances send you their content willingly.

Maybe someone should make a license extension to ActivityPub, where instances can communicate what can and what can’t be done with the information they publish. Then at least there would be legal clarity. Whether it can be enforced is another question.

5 points

People can already do that without an instance, the same way Google indexes the site.

4 points

If an instance is defederated, the owners can just spin up a new instance.

I’ve always thought about what you’ve said about Lemmy when people start talking about how Lemmy is more privacy focused than Reddit.

As one of your replies said, many people, in the hundreds or thousands, have a copy of your data on Lemmy: the instance owners. If you decide you’ve shared too much information, then you end up asking every owner to delete that nugget of information. And realistically there is nothing to enforce it. This is one benefit of walled gardens like Reddit: they are legally obligated to delete the information, especially in places like the EU.

16 points

At least for the instance this was posted on: the February 2024 Beehaw Financial Update

13 points

You don’t have to, but the owners of your instance are probably paying out of pocket to keep it online. I’m sure they’re taking donations.

35 points

That’s why I’m on Lemmy. At least when they train AI on my posts here it’s not legitimized by some contract.

29 points

That AI is going to get really racist, really fast, judging by the muck we all saw daily on Reddit.

12 points

Although it’s going to be really good at anime porn too. So there’s that.

1 point

If that’s your thing, then hell yeah brother!

28 points

Damn just 60 mil??

12 points

Like seriously, this must be fake. Add a zero and I’d still find it suspiciously cheap.

3 points

Yeah, the diarrhea of my shitposts over there alone is worth more, it’s what will make the future AI kinda smart & very depressed.

24 points

And that’s why I deleted all my posts and comments before deleting my account. Sure, they could probably go back and restore it if they wanted but, so far, they haven’t.

Glad I landed here on Lemmy.

10 points

I deleted all my comments last year. Recently I got a notification for a response to one of those comments. When I clicked the notification link, my comment and the response were visible. The comment doesn’t show up in my profile.

7 points

Interesting. I’ve specifically searched for some fairly unique content (Python scripts, etc) I posted in my time over there, and it hasn’t shown up at all.

So you left your Reddit account intact?

Edit: Fucking. Cunts. I just searched (had been a few months) and at least some of my data is back. I reckon they’ve done it ahead of the planned AI move and IPO.

Edit 2: joke’s on them - my posts were linked to an alt account I set up on Pastebin years ago. Still had the creds, so I have deleted the pastes. Fuck Reddit. 🤘

6 points

Reddit was aggressively rate limiting tools used to delete and edit content in a funny way when the API pricing was announced. The API wouldn’t return an error, the rate limiting was silent, and the tools would report successful deletion or edits even when the edit or deletion wasn’t made.

I had to modify an existing script to handle the 5-second rate limit and, in lieu of deleting, I just rewrote each comment with a farewell.

Even then I did 3 passes (minor additional edits) in case Reddit was saving previous edits.

My content has stayed edited.
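
For anyone wanting to try something similar, here is a rough sketch of the general idea, not the actual script: wait between calls and read each comment back instead of trusting the success response, since the rate limiting was silent. The `edit_comment` and `fetch_body` callables are hypothetical stand-ins for whatever API client you use, and the passes here act as retries for edits that did not stick, which is a slightly different use than the extra edits described above.

```python
import time
from typing import Callable, Iterable, List


def overwrite_with_verification(
    comment_ids: Iterable[str],
    edit_comment: Callable[[str, str], None],  # hypothetical: performs the edit
    fetch_body: Callable[[str], str],          # hypothetical: re-reads the comment
    new_body: str,
    min_interval: float = 5.0,                 # seconds to wait after each edit
    passes: int = 3,                           # maximum number of attempts per comment
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Overwrite each comment and return the IDs that never verified."""
    pending = list(comment_ids)
    for _ in range(passes):
        still_pending = []
        for cid in pending:
            edit_comment(cid, new_body)
            sleep(min_interval)  # stay under the rate limit
            # Do not trust the edit call: read the comment back and compare.
            if fetch_body(cid) != new_body:
                still_pending.append(cid)
        pending = still_pending
        if not pending:
            break
    return pending
```

Anything still in the returned list after the last pass was never confirmed as edited and is worth checking by hand.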

1 point

Do you still have the Python script available?

I was fine with keeping my comments up before for the future searchers, but I’m not fine with that shithole making profit off of it.

2 points

I’ve had the same experience. Most scripts just erase the comments available directly through your Reddit profile, which is limited to the most recent ~2000 posts that you’ve made. To fully erase anything and everything, you need to request all your data from Reddit, download the .zip, and feed it into an application like shreddit.

2 points

Presumably most of the current AI models have already had access to Reddit data in the past, so I am a bit confused about why they would pay 60 million for it now.

4 points

Yep, used ‘power delete suite’ to delete everything before I left.

3 points

Well, I just discovered a bunch of my stuff had been restored. Says deleted account, but it’s there.

1 point

Deleting your account doesn’t delete your content AFAIK.

2 points

I suspect Reddit holds a perfect copy of every edit you’ve ever made, including the first. For legal reasons if nothing else. Now also to prevent perfectly good AI training content from being deleted.

2 points

Yeah! Here, no one gets paid when someone else wants to profit off of all the free user generated content. Wait, what was our goal again?

1 point

I replaced all of my comments with gibberish using a script. Then I deleted them. Then I deleted my account. They’re welcome to train on my account data, because it’s all garbage, even if they go to all of the trouble to restore my deleted comments. I knew they were going to do this crap.


Technology

!technology@beehaw.org
