20 points

I still find it astonishing that TechCrunch buys the ML model training argument.

No one in their right mind would use the API (which has always been rate limited) to fetch data for text generation. People would use plain HTTP or, even better, archives of Reddit.

Why? Because the rate limit is looser or nonexistent, there is no need to write anything (only read), and it will stay free 🙂 Also, very fresh data is not especially useful (except in very specific corner cases, when something in the news changes the way we talk).

9 points

Web crawling has always worked through raw HTTP/HTML parsing. Why build site-specific API calls that require authentication and are throttled?

This excuse is pure bullshit.

2 points

More proof of Reddit’s incompetence.

5 points

Considering the Reddit API has a hilariously low limit, I fully understand why the AI bros would use a scraping approach instead. I’ve built small Discord bots that had a difficult time staying within the API limits because you get so few requests! I was in the process of building an event-driven system that used multiple API tokens in order to keep up with multiple feeds. It’s just terrible.
