I feel like this move has nothing to do with investors and everything to do with setting the terms under which big corps like Microsoft and Google can scrape their massive amount of data to train next-gen AIs. They know they have a HUGE amount of data, going back years and years. Content created by others, then sold for enormous profit.
I mean, AI is already stealing all the art and images on the web without paying anything. They could literally just scrape and pay nothing. Web scraping isn't illegal and they already do it, so why would they pay anyone? Unless the law catches up on the right to produce AI content from ill-gotten data, why would they pay what they don't have to?
Could you please point me to legal definitions, in court or otherwise, saying that directly using my artwork in any shape or form for a non-fair-use product does not violate my copyright license? As in: a service you pay money for, which creates things based on training data it took from me, is not fair use. Or point me to the legal definitions saying I lose my copyright by posting things online. Allowing scraping is not the same thing as granting permission to make derivatives. You aren't disagreeing with me, you're disagreeing with my legal rights.
What do you mean by stealing? The data is still there; all they do is learn from something that is public.
How is that different from Google's approach? They are just watching and learning. Why is it treated so differently when it essentially does nothing new, just uses the data in a different way?
"They are just watching and learning. Why is it treated so differently?"
Because it isn't human. It isn't watching and learning; it is being fed my creative content as data, which I have not permitted and have not been compensated for, and that is then turned around and sold as a service. My work is being consumed for commercial use by a non-human system that has no fair-use education rights, with the sole intent of creating a profitable product, and I'm getting nothing. No matter where I post my work, I have the legal right to retain my copyright and to refuse consent to uses of my work that don't align with the licenses I have chosen to grant. Websites ask for a license in their ToS just to be able to display and share my artwork when I upload it. When I create an image, I own its copyright, which lets me control its use, its distribution, and the right to create derivatives. This isn't a fuzzy area; it's very clear. If an artist did not consent to their artwork being used as training data for a non-fair-use purpose, using it is stealing their work.
And no, it's not fair use under education. Copyright exists for human protection and uses. It isn't being used for 'learning'; it's used as data to be repackaged and sold. When Google shows my work in search, it links back to the posts that contain it, my copyright is retained, and no derivatives are made. If you mean captchas, yeah, captchas are pretty bullshit.
Interesting. I wonder if they already got an offer that matches their new API pricing, and decided to raise everything to match that cost and avoid being sued later.
Like, there seems to be some urgency in how quickly they went from announcing to raising the price. What was it? Is this the reason? A confirmed, extremely wealthy and extremely naive buyer?
The API pricing likely has something to do with revenue-per-user calculations. Reddit is aiming for an IPO, but its valuation tanked over the past two years. That might be why the admins have decided to strong-arm this on such short notice.
If he thinks locking down the API is going to stop them, he’s bumped his head. These companies have more than enough manpower to write and maintain an HTML scraper for Reddit.
Creating a web scraper and actually maintaining one that works effectively are two different things. It's very easy to fight web scraping if you know what you're doing.
Right, but these are big companies with lots of talented programmers on hand. If anyone can overcome such an obstacle, it’s them.
Also, Google and Microsoft already have a search index full of Reddit content to scrape.
The thing I worry about whenever someone mentions this angle: what about Lemmy content? As the community moves away from commercial platforms in favor of Lemmy, Bluesky, Mastodon, etc., does that lower the legal barrier for AI companies to train on all this content for free? Is that shift in the legal vulnerability of public content something users consider? Is it desirable to most users? Are people thinking about that?
Open source and federation mean open source and federation; I don't see why scraping Lemmy and Mastodon shouldn't be free and legal. However, maybe the servers could enforce rate limits and keep block lists of suspicious clients so they don't go down under scraping load.
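For what it's worth, per-client rate limiting plus a block list is not much code. Here's a minimal token-bucket sketch in Python, assuming clients are identified by some string key (an IP, say) and that blocked clients are simply refused. `RateLimiter`, its parameters, and the example addresses are all hypothetical; this is not how Lemmy or Mastodon actually implement their limits.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    tokens: float  # requests the client can still make right now
    last: float    # timestamp (seconds) of the previous refill


class RateLimiter:
    """Hypothetical per-client token-bucket limiter with a simple block list."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.buckets: dict[str, TokenBucket] = {}
        self.blocklist: set[str] = set()

    def allow(self, client: str, now: float) -> bool:
        """Return True if `client` may make a request at time `now` (seconds)."""
        if client in self.blocklist:
            return False
        bucket = self.buckets.get(client)
        if bucket is None:
            # New clients start with a full bucket, so normal users aren't throttled.
            bucket = self.buckets[client] = TokenBucket(self.capacity, now)
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - bucket.last)
        bucket.tokens = min(self.capacity, bucket.tokens + elapsed * self.rate)
        bucket.last = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    import time

    limiter = RateLimiter(rate=1.0, capacity=5)
    limiter.blocklist.add("203.0.113.7")  # example "suspicious" client
    print(limiter.allow("198.51.100.2", time.monotonic()))
    print(limiter.allow("203.0.113.7", time.monotonic()))
```

A token bucket lets a normal user burst a few requests while capping a scraper at a steady `rate`, which is the "don't go down due to scrapes" part; the block list handles clients that are already known to be abusive.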
What I don't understand is why Reddit didn't institute the following: "All API requests are free up to 100,000 per month per user token. Also, per our terms of service, you cannot use us to train AI models without paying a fee."
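The quota half of that proposal is easy to enforce. Here's a minimal Python sketch, assuming requests are counted per token per calendar month (UTC) and that anything past the free tier is flagged as billable rather than rejected. `MonthlyQuota` and the token strings are hypothetical, not anything Reddit actually runs; the AI-training clause would be a ToS matter, not something a counter like this enforces.

```python
from collections import defaultdict
from datetime import datetime, timezone

FREE_REQUESTS_PER_MONTH = 100_000  # free tier from the proposal above


class MonthlyQuota:
    """Hypothetical per-token counter: free up to a monthly limit, billable after that."""

    def __init__(self, free_limit: int = FREE_REQUESTS_PER_MONTH) -> None:
        self.free_limit = free_limit
        # (token, "YYYY-MM") -> number of requests seen in that calendar month
        self.counts: defaultdict[tuple[str, str], int] = defaultdict(int)

    def record(self, token: str, when: datetime) -> bool:
        """Record one request; return True if it falls within the free tier."""
        month = when.astimezone(timezone.utc).strftime("%Y-%m")
        self.counts[(token, month)] += 1
        return self.counts[(token, month)] <= self.free_limit

    def billable(self, token: str, month: str) -> int:
        """Number of requests beyond the free tier for `token` in `month` ("YYYY-MM")."""
        return max(0, self.counts.get((token, month), 0) - self.free_limit)


if __name__ == "__main__":
    quota = MonthlyQuota()
    print(quota.record("user-token-abc", datetime.now(timezone.utc)))
```

Counting by calendar month keeps the bookkeeping trivial: the counter for a token effectively resets when the "YYYY-MM" key changes, and a third-party app with normal per-user traffic would rarely leave the free tier.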