How will lemmy scale?

posted 1 year ago

I already get rate-limited like crazy on lemmy and there are only like 60,000 users on my instance. Is each instance really just one server or are there multiple containers running across several hosts? I’m concerned that federation will mean an inconsistent user experience. Some instances many be beefy, others will be under resourced… so the average person might think Lemmy overall is slow or error-prone.

Reddit has millions of users. How the hell is this going to scale? Does anyone have any information about Lemmy’s DB and architecture?

I found this post about Reddit’s DB from 2012. Not sure if Lemmy has a similar approach to ensure speed and reliability as the user base and traffic grows.

https://kevin.burke.dev/kevin/reddits-database-has-two-tables/

Sort:

Hot Top Controversial New Old

You are viewing a single thread.

View all comments View context

[ - ]

weeezes@sopuli.xyz

0 points

1 year ago

The part that worries me about scalability in the long term is the push nature of ActivityPub. My server is already getting several POST requests to /inbox per second already, which makes me wonder how that’s gonna work if big instances have to push content updates to thousands of lemmy instances where most of the data probably isn’t even seen. I was surprised it was a push system and not a pull system, as pull is much easier to scale and cache at the CDN level, and can be fetched on demand for people that only checks lemmy once in a while.

I think there’s a benefit to the push model, as the instances can prioritize who to push to first if there’s scaling issues, instead of having to throttle GETs, effectively the end result is anyway the same that nothing ends up to other instances in real time (which is fine). I don’t know how lemmy works exactly, but could the push model just be a detail of activitypub https://flak.tedunangst.com/post/what-happens-when-you-honk ?

permalink

report

parent

[ - ]

Max-P@lemmy.max-p.me

1 point

1 year ago

It definitely is part of the ActivityPub protocol, but I only glanced at the spec so far. I should probably follow that link and implement a toy ActivityPub app to get more familiar with how it works.

I think there’s a benefit to the push model, as the instances can prioritize who to push to first if there’s scaling issues, instead of having to throttle GETs,

The downside to this is smaller instances are penalized in that scenario, which would in turn could cause users to flock to megainstances until it becomes centralized again.

As I said, GETs are cacheable, so if one slaps Cloudflare in front you can handle millions of GETs for relatively cheap.

Maybe it’s batched however? I really need to read the spec. Pushing to thousands of servers every 1/5/10 minutes certainly would give a fair amount of headroom to make it work I guess.

permalink

report

parent

Asklemmy

!asklemmy@lemmy.ml

Create post

A loosely moderated place to ask open-ended questions

Search asklemmy 🔍

If your post meets the following criteria, it’s welcome here!

Open-ended question
Not offensive: at this point, we do not have the bandwidth to moderate overtly political discussions. Assume best intent and be excellent to each other.
Not regarding using or support for Lemmy: context, see the list of support communities and tools for finding communities below
Not ad nauseam inducing: please make sure it is a question that would be new to most members
An actual topic of discussion

Looking for support?

Looking for a community?

Lemmyverse: community search
sub.rehab: maps old subreddits to fediverse options, marks official as such
!lemmy411@lemmy.ca: a community for finding communities

_Icon _by _{@Double_A@discuss.tchncs.de}

Community stats

10K
Monthly active users
5.9K
Posts
319K
Comments

Community stats

Community moderators