4 points

If he thinks locking down the API is going to stop them, he’s bumped his head. These companies have more than enough manpower to write and maintain an HTML scraper for Reddit.

2 points

Man, even I can do that and I’m hardly a programmer.

2 points

That opens them up to massive legal problems if they do. AI companies are going to need to prove their training data was legitimately obtained.

1 point

Creating a web scraper and actually maintaining one that keeps working are two different things. It’s very easy to fight web scraping if you know what you are doing.

1 point

Right, but these are big companies with lots of talented programmers on hand. If anyone can overcome such an obstacle, it’s them.

Also, Google and Microsoft already have a search index full of Reddit content to scrape.

1 point

You are right. You would need a team of skilled scraper developers and network engineers, though, who would know how to get around rate limiters with some kind of external load balancer or something along those lines.


Memes

!memes@lemmy.ml
