I feel like this move has nothing to do with investors and everything to do with setting the standard for big corps like Microsoft and Google to be able to scrape Reddit's massive amount of data to train next-gen AIs. They know they have a HUGE amount of data, from now and going back years and years. Content created by others, then sold for enormous profit.

I mean, AI is already stealing all the art and images on the web without paying anything. They could literally just scrape and pay nothing. Web scraping isn't illegal and they already do it, so why would they pay anyone? Unless the law catches up on the right to produce AI content from ill-gotten data, why would they pay what they don't have to?

Could you please point me to legal definitions, in court or otherwise, that say it does not violate my copyright license to directly use my artwork in any shape or form for a non-fair-use product? As in, a paid service that creates things from training data it has taken from me is not fair use. Or point me to the legal definition under which I lose my copyright by posting things online. Allowing scraping is not the same thing as granting permission to create derivative works. You aren't disagreeing with me, you're disagreeing with my legal rights.

What do you mean by stealing? The data remains; all they do is learn from something that is public.

How is this different from Google's approach? They are just watching and learning. Why is it treated so differently when it essentially does nothing new, just uses the data in a different way?

> They are just watching and learning. Why is it treated so differently?

Because it isn't human. It isn't watching and learning; it is being fed my creative content as data that I have not allowed and have not been compensated for, which is then turned around and sold as a service. My work is being consumed for commercial use by something inhuman that has no fair use education rights, with the sole intent of creating a profitable product, and I'm getting nothing. No matter where I post my work, I have the legal right to retain my copyright, and I have the right to refuse consent to uses of my works that don't align with the licenses I have chosen to grant. Websites ask for a license in their ToS just to display and share my artwork when I upload it. When I create an image, I own its copyright, which lets me control its use and distribution and the right to create derivatives. This isn't a fuzzy area; it's very clear. If an artist did not consent to their artwork being used as training data for a non-fair-use purpose, it is stealing their works.

And no, it's not fair use under the education exception. Copyright exists to protect humans and human uses. It isn't being used for 'learning'; it's used as data to be repackaged and sold. When Google shows my work in search, it links back to the posts that contain it, my copyright is retained, and no derivatives are made. If you mean captchas, yeah, captchas are pretty bullshit.

The thing I worry about whenever someone mentions this angle: what about Lemmy content? As the community moves away from the commercial platforms in favor of Lemmy, Bluesky, Mastodon, etc., does that lower the legal barrier for AI companies to train on all this content for free? Is that shift in the legal vulnerability of public content something users consider? Is it desirable to most users? Are people thinking about that?

Open source and federation mean open source and federation; I don't see why Lemmy and Mastodon shouldn't be free and legal to scrape. However, maybe the servers could apply rate limits and block lists of suspicious clients so they don't go down under scraping load.
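
Rate limiting can take many forms; one common technique is a token bucket per client. A minimal sketch in Python, assuming limits are keyed by client IP address (the class and function names, the capacity of 10, and the refill rate of 1 request/second are illustrative assumptions, not anything Lemmy or Mastodon actually ships):

```python
import time


class TokenBucket:
    """Per-client limiter: bursts of up to `capacity` requests, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float, now=time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.now = now  # injectable clock, so the logic can be tested without sleeping
        self.tokens = capacity
        self.last = now()

    def allow(self) -> bool:
        """Spend one token if available; False means the client should be throttled."""
        current = self.now()
        # Refill in proportion to elapsed time, capped at the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + (current - self.last) * self.rate)
        self.last = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical server-side table of buckets, keyed by client IP address.
buckets = {}


def allow_request(client_ip: str) -> bool:
    if client_ip not in buckets:
        buckets[client_ip] = TokenBucket(capacity=10, rate=1.0)
    return buckets[client_ip].allow()
```

The bucket lets a client burst up to `capacity` requests and then throttles it to roughly `rate` requests per second; a server would typically answer throttled requests with HTTP 429 (Too Many Requests).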

What I don't understand is why Reddit didn't institute the following: all API requests are free up to 100,000 per month per user token, and the terms of service say you cannot use the data to train AI models without paying a fee.
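
A rough sketch of that per-token free tier, assuming usage is counted per token per calendar month (`MonthlyQuota` and its methods are hypothetical names; the AI-training clause would live in the terms of service and is not modeled here):

```python
from collections import defaultdict

# Free allowance proposed in the comment above.
FREE_REQUESTS_PER_MONTH = 100_000


class MonthlyQuota:
    """Hypothetical per-token monthly free tier; requests past the allowance are billable."""

    def __init__(self, free_limit: int = FREE_REQUESTS_PER_MONTH) -> None:
        self.free_limit = free_limit
        # (token, "YYYY-MM") -> number of requests seen in that month
        self.usage = defaultdict(int)

    def record(self, token: str, month: str) -> bool:
        """Count one request; return True while it is still within the free allowance."""
        key = (token, month)
        self.usage[key] += 1
        return self.usage[key] <= self.free_limit

    def billable(self, token: str, month: str) -> int:
        """Number of requests this token made beyond the free allowance in `month`."""
        return max(0, self.usage.get((token, month), 0) - self.free_limit)


quota = MonthlyQuota()
print(quota.record("app-token", "2023-06"))  # True: the first request of the month is free
```

Because the key includes the month, each token's allowance resets automatically when a new month starts.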

I'm with you on that. AI is the future. Just because some big corp is training AI for its closed-source product doesn't mean open-source models won't also benefit. If you post in a public space, you should expect it to be read.

That's an interesting and quite plausible take on the whole thing. $12,000 for 50 million requests is B2B pricing. For a company like OpenAI/Microsoft, that's not even worth thinking about if you get all of that precious training data for it…
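
To put the quoted figure in per-request terms, a quick back-of-the-envelope calculation, assuming a flat linear rate with no volume discounts (the 1-billion-request volume is purely hypothetical):

```python
# Figures quoted in the comment above: $12,000 for 50 million API requests.
PRICE_USD = 12_000
REQUESTS = 50_000_000


def cost_per_thousand(price_usd: float, requests: int) -> float:
    """Implied price in USD per 1,000 requests."""
    return price_usd / requests * 1_000


def cost_for(requests: int, per_thousand: float) -> float:
    """Linear cost of a request volume at a flat per-1,000 rate."""
    return requests / 1_000 * per_thousand


rate = cost_per_thousand(PRICE_USD, REQUESTS)
print(f"${rate:.2f} per 1,000 requests")  # $0.24 per 1,000 requests
# Hypothetical volume: 1 billion requests at the same flat rate
print(f"${cost_for(1_000_000_000, rate):,.0f}")  # $240,000
```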

Interesting. I wonder if they already received an offer that matches their new API pricing, and decided to raise everything to match that price to avoid being sued later.

Like, there seems to be some urgency between them announcing the change and raising the price. What was it? Is this the reason? A confirmed, extremely wealthy, and extremely naive buyer?

The API pricing likely has something to do with revenue-per-user calculations. Reddit is aiming for an IPO, but its valuation has tanked over the past two years. That might be why the admins decided to strong-arm this on such short notice.

If he thinks locking down the API is going to stop them, he’s bumped his head. These companies have more than enough manpower to write and maintain an HTML scraper for Reddit.

Man, even I can do that and I’m hardly a programmer.

That opens them up to massive legal problems if they do. AI companies are going to need to prove their training data was legitimately obtained.

Creating a web scraper and actually maintaining an effective, working one are two different things. It's very easy to fight web scraping if you know what you are doing.

Right, but these are big companies with lots of talented programmers on hand. If anyone can overcome such an obstacle, it’s them.

Also, Google and Microsoft already have a search index full of Reddit content to scrape.

If big tech trains AI using Reddit interactions, our species is doomed 🤣
