lemm.ee

52

How will lemmy scale?

posted 1 year ago

by

phoneymouse@lemmy.world

in

asklemmy@lemmy.ml

I already get rate-limited like crazy on lemmy and there are only like 60,000 users on my instance. Is each instance really just one server or are there multiple containers running across several hosts? I’m concerned that federation will mean an inconsistent user experience. Some instances may be beefy, others will be under-resourced… so the average person might think Lemmy overall is slow or error-prone.

Reddit has millions of users. How the hell is this going to scale? Does anyone have any information about Lemmy’s DB and architecture?

I found this post about Reddit’s DB from 2012. Not sure if Lemmy has a similar approach to ensure speed and reliability as the user base and traffic grow.

https://kevin.burke.dev/kevin/reddits-database-has-two-tables/

Max-P@lemmy.max-p.me

38 points

1 year ago

Bigger instances will indeed run multiple copies of the various components, it’s pretty standard software in that regard.

Usually at first that will start by moving the PostgreSQL database to its own dedicated box, and then start adding additional backend boxes, possibly adding more caching in front so that the backend doesn’t have to do as much work. Once the database is pegged, the next step is usually a write primary and one or more read secondaries. When that gets too much, you get into sharding so that you can spread the database load across multiple servers. I don’t know much about PostgreSQL but I have to assume it’s better than MySQL in that regard and I’ve seen a 1 TB MySQL database in the wild running just fine.
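
To make the primary/replica and sharding steps concrete, here is a minimal Python sketch. It is not how Lemmy is implemented; the server names, the `Router` class and the `shard_for` helper are all hypothetical, and a real deployment would lean on PostgreSQL replication and a connection pooler rather than hand-rolled routing.

```python
import hashlib


class Router:
    """Sends writes to one primary and spreads reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next = 0

    def for_write(self):
        return self.primary

    def for_read(self):
        # With no replicas configured, reads fall back to the primary.
        if not self.replicas:
            return self.primary
        target = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return target


def shard_for(key: str, shard_count: int) -> int:
    """Stable hash, so the same key always maps to the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical server names, for illustration only.
router = Router("db-primary", ["db-replica-1", "db-replica-2"])
reads = [router.for_read() for _ in range(4)]
```

Reads alternate between the two replicas while every write goes to the primary; `shard_for` only becomes relevant once a single primary can no longer absorb the write load.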

I think lemmy.world in general is hitting some scalability issues that they’re working on. Keep in mind the software is fairly new and is only now being truly tested at large scale, so there’s probably a ton of room for optimization. Also lemmy.world is still on 0.17 and apparently 0.18 changed the protocol a lot in a way that makes it scale much better, so when they complete that upgrade it’ll probably run a lot better already.

The part that worries me about scalability in the long term is the push nature of ActivityPub. My server is already getting several POST requests to /inbox per second, which makes me wonder how that’s gonna work if big instances have to push content updates to thousands of lemmy instances where most of the data probably isn’t even seen. I was surprised it was a push system and not a pull system, as pull is much easier to scale and cache at the CDN level, and can be fetched on demand for people who only check lemmy once in a while.
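
A rough back-of-the-envelope comparison of the two models, as a Python sketch. Every number here is made up for illustration (activity rate, instance count, polling interval, cache hit rate); none of it is measured from a real instance, and the function names are hypothetical.

```python
def push_deliveries(activities_per_hour: int, subscribed_instances: int) -> int:
    """Push: the origin sends every activity to every subscribed instance."""
    return activities_per_hour * subscribed_instances


def pull_origin_requests(subscribed_instances: int, polls_per_hour: int,
                         cache_hit_percent: int) -> int:
    """Pull: instances poll, and only cache misses reach the origin."""
    total_polls = subscribed_instances * polls_per_hour
    return total_polls * (100 - cache_hit_percent) // 100


# Assumed figures: 1,000 activities/hour, 2,000 subscribed instances,
# each polling every 5 minutes, with a CDN absorbing 95% of polls.
pushed = push_deliveries(1000, 2000)
pulled = pull_origin_requests(2000, 12, 95)
```

Under these assumptions the origin makes 2,000,000 outgoing POSTs per hour with push, versus serving 1,200 uncached GETs per hour with pull. The trade-off is latency: with pull, content only arrives as often as the subscriber polls.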

I need to start digging into Lemmy’s code and get familiar with the internals, still only a couple days in with my private instance.

Artemis@sh.itjust.works

3 points

1 year ago

I don’t have a coding background but this was very informative. Thanks for sharing

2 points

1 year ago

The implementation as far as I understand it is plain stupid. It prevents small instances from participating at any significant scale and seems happy to just drop data over the wire without reconciling. Seriously amateurish.

Max-P@lemmy.max-p.me

7 points

1 year ago

I’m not sure about the data being dropped, my instance was misconfigured for a day or two, and as soon as I fixed it, the data came right in. Instances repeatedly trying to push data to my instance is what clued me in that something was missing from my NGINX config. It backfilled pretty fast.

Although I wouldn’t mind if there was a fallback pull mechanism to remediate failed pushes.
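
Roughly what I have in mind, as a Python sketch: the sender retries with exponential backoff, and the receiver can reconcile by diffing IDs and pulling whatever it never got. This is an assumption about how such a mechanism could look, not Lemmy’s actual code; `backoff_schedule`, `deliver_with_retry` and `missing_ids` are hypothetical helpers.

```python
def backoff_schedule(base_seconds: int, max_attempts: int, cap_seconds: int) -> list:
    """Delay before each retry: doubles every attempt, never above the cap."""
    return [min(base_seconds * 2 ** n, cap_seconds) for n in range(max_attempts)]


def deliver_with_retry(send, activity, max_attempts: int = 5):
    """Call send(activity) until it returns True.

    Returns the number of attempts used, or None if all of them failed.
    A real sender would wait backoff_schedule(...)[n] seconds between tries.
    """
    for attempt in range(1, max_attempts + 1):
        if send(activity):
            return attempt
    return None


def missing_ids(remote_ids, local_ids) -> list:
    """Pull fallback: IDs the remote has published that we never received."""
    seen = set(local_ids)
    return [i for i in remote_ids if i not in seen]


# A receiver that is misconfigured for the first two attempts, then fixed.
calls = {"count": 0}

def flaky_send(activity):
    calls["count"] += 1
    return calls["count"] > 2

attempts_used = deliver_with_retry(flaky_send, {"id": "post-1"})
delays = backoff_schedule(base_seconds=60, max_attempts=5, cap_seconds=600)
to_fetch = missing_ids(["a", "b", "c", "d"], ["a", "c"])
```

Here the delivery succeeds on the third attempt, and the diff tells the receiver to go fetch `b` and `d` on its own.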

2 points

1 year ago

Interesting. Curious if you have a better understanding of ActivityPub - do you happen to know if the protocol guarantees synchronicity and what mechanism guarantees it?

Schooner@lemmy.ml

1 point

1 year ago

Would it be feasible to change it to a pull system at this point? I don’t think the Lemmy part of that is a problem, but ActivityPub may need to make big changes and I’m not sure how practical that is.

weeezes@sopuli.xyz

0 points

1 year ago

The part that worries me about scalability in the long term is the push nature of ActivityPub. My server is already getting several POST requests to /inbox per second, which makes me wonder how that’s gonna work if big instances have to push content updates to thousands of lemmy instances where most of the data probably isn’t even seen. I was surprised it was a push system and not a pull system, as pull is much easier to scale and cache at the CDN level, and can be fetched on demand for people who only check lemmy once in a while.

I think there’s a benefit to the push model, as the instances can prioritize who to push to first if there’s scaling issues, instead of having to throttle GETs. Effectively the end result is the same anyway: nothing reaches other instances in real time (which is fine). I don’t know how lemmy works exactly, but could the push model just be a detail of ActivityPub? https://flak.tedunangst.com/post/what-happens-when-you-honk
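
A minimal Python sketch of what "prioritize who to push to first" could mean. The policy (largest subscriber count first) and the instance names are assumptions made up for the example; I don’t know what ordering, if any, Lemmy actually uses.

```python
import heapq


def delivery_order(subscribers_by_instance: dict) -> list:
    """Return instance names, most subscribers first (ties broken by name)."""
    # heapq is a min-heap, so the count is negated to pop the largest first.
    heap = [(-count, name) for name, count in subscribers_by_instance.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order


# Hypothetical instances and subscriber counts.
order = delivery_order({
    "small.example": 3,
    "big.example": 5000,
    "mid.example": 200,
})
```

With this particular policy the smallest instance is always served last, so the choice of priority key matters a lot.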

Max-P@lemmy.max-p.me

1 point

1 year ago

It definitely is part of the ActivityPub protocol, but I only glanced at the spec so far. I should probably follow that link and implement a toy ActivityPub app to get more familiar with how it works.

I think there’s a benefit to the push model, as the instances can prioritize who to push to first if there’s scaling issues, instead of having to throttle GETs,

The downside to this is that smaller instances are penalized in that scenario, which in turn could cause users to flock to megainstances until it becomes centralized again.

As I said, GETs are cacheable, so if one slaps Cloudflare in front you can handle millions of GETs for relatively cheap.

Maybe it’s batched however? I really need to read the spec. Pushing to thousands of servers every 1/5/10 minutes certainly would give a fair amount of headroom to make it work I guess.
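
If it were batched, it could look something like this Python sketch. To be clear, I have not confirmed that ActivityPub or Lemmy delivers in batches; `BatchingOutbox` is a hypothetical class that only shows how queuing per instance reduces the request count.

```python
from collections import defaultdict


class BatchingOutbox:
    """Queues activities per target instance and sends them in one go."""

    def __init__(self):
        self._pending = defaultdict(list)

    def enqueue(self, instance, activity):
        self._pending[instance].append(activity)

    def flush(self):
        """Return one batch per instance and empty the queue.

        A scheduler would call this every few minutes and issue one
        request per returned batch.
        """
        batches = {inst: list(acts) for inst, acts in self._pending.items()}
        self._pending.clear()
        return batches


outbox = BatchingOutbox()
for activity in ["act-1", "act-2", "act-3"]:
    for instance in ["a.example", "b.example"]:
        outbox.enqueue(instance, activity)

batches = outbox.flush()
```

Three activities for two instances become two requests instead of six, at the cost of a delay up to the flush interval.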

HobbitFoot@thelemmy.club

17 points

1 year ago

Poorly. Lemmy will scale poorly.

I won’t be surprised if the larger instances start locking down more as a way to sustain themselves, like restricting communities or only allowing text posts.

nyakojiru@lemmy.world

3 points

1 year ago

Sometimes you just have to accommodate to the situation and keep going until it settles down. The error here, I think, is expecting something to have no flaws or issues, even more so when it’s not backed by a corporation. And no one wants corporations.

HobbitFoot@thelemmy.club

2 points

1 year ago

It isn’t about accommodating to the situation, but planning for long term growth.

Right now, instances of Lemmy don’t have any way to fund server costs other than asking for donations. Outside of Wikipedia, that isn’t a sustainable business model. How is Lemmy supposed to survive if, every time a sub gains critical mass, it shuts down?

ritswd@lemmy.world

2 points

1 year ago

planning for long-term growth

Which is part of any scaling effort, and you can’t really guess your way through predicting and resolving bottlenecks; it takes some serious expertise. As far as I know, the Lemmy devs have never built a high-scale service before, and I think that is possibly the single biggest risk to the growth and success of the Lemmy project in general.

Source: that’s my job, I’ve been doing that for some of the most high-scale services in the world for about a decade. I absolutely could help, actually I’d love to, but I definitely won’t under current Lemmy leadership, for reasons: https://lemmy.world/comment/596235

notavote@sh.itjust.works

1 point

1 year ago

It is not like any other social network has become a sustainable business. Reddit, Twitter, YouTube, FB are all net losers, despite all their trials and selling user data.

We can safely say that after almost 20 years we still don’t have a sustainable business model for social networks.

Let’s try with donations.

Mane25@feddit.uk

3 points

1 year ago

Wouldn’t that create a natural balance though? A large instance starts struggling so people are incentivised to move to smaller instances or start new instances and so spread the load more evenly. That’s how it would scale. I’m surprised how many of the larger instances haven’t closed signups yet but that wouldn’t be a bad thing if they did.

HobbitFoot@thelemmy.club

4 points

1 year ago

The issue isn’t on the user end, but the sub end since that is where all the data is stored.

So, according to your proposal, the best thing a sub should do when it is getting popular is to go private with its existing subscribers and any new people who want to participate should go create their own sub in a different instance.

Mane25@feddit.uk

1 point

1 year ago

I wasn’t talking about subs, I’m talking about when an instance gets too popular. Ideally you’d want lots of small instances, ideally communities should be spread evenly as well and if your users are spread out that should happen more or less naturally.

1 point

1 year ago

When the protocol favors monoliths, we’re right back to the Reddit problem

GoodEye8@lemm.ee

4 points

1 year ago

Scalability doesn’t mean “favoring monoliths”. It’s just scalability and honestly, 60k users shouldn’t bring a service down. 60k users is not even close to being a monolithic instance.

1 point

1 year ago

Scalability does mean favoring monoliths because it costs money to scale and scaling here isn’t proportional to your instance’s users, it’s proportional to the size of the entire network.

60k users is today, not tomorrow. I’m thinking forward to 6000k users.

SlovenianSocket@lemmy.ca

16 points

1 year ago

As far as I’m aware lemmy does not support load balancing or high availability as it currently stands. But development is still in its infancy and I’m sure that’s a top priority

Freeman@lemmy.pub

6 points

1 year ago

I mean it can… it’s just very DB-heavy. It would be on an admin to scale up and scale out a single instance with multiple DBs, replication, etc.

It would be nice to be able to assign DBs to a task (i.e. one for federation updates, one for local community posts, one to service web requests). There may be a way to do that already but I’m not aware of one; it may need to be in code.

Also syncing/federation across instances seems to be a mixed bag. My instance will sometimes waste threads trying to sync with instances that have come and gone. As a result some communities I’d love to see updates on don’t come through.

Ideally they figure a way to continue to optimize federation and allow smaller instances to just pick up the load.

Mine is open, but I’m not getting any registration requests. I’m not upset about it but their main join page still seems to optimize for larger instances. It would make more sense to optimize for smaller ones to better distribute load, and to focus dev work on better/smoother syncing between federated instances.

Some locking down is a concern. I would love to see a lemmy of trust group if that came to pass. Where you can join the group and federate. My biggest concern with open federation is the legal risk of things like CSAM or CP getting synced onto your instances even if you have the nsfw box unchecked.

Max-P@lemmy.max-p.me

2 points

1 year ago

I don’t see any reasons why you couldn’t run more copies of the backend and frontend, as long as it uses the database properly. It should scale horizontally decently for a while.

At work I have clusters that run 40-50 application servers all going to one database and handle millions of requests daily, on a pretty inefficient PHP application. Lemmy being in Rust, it can handle a lot of traffic.

Given the frontend is in nodejs, I suspect we’ll need to scale up the frontend first, which should be no problem at all, just many copies of the frontend to fewer copies of the backend to fewer copies of the database. Maybe slap Cloudflare in front at some point.
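
As a toy Python sketch of that "many frontends, fewer backends, fewer databases" shape, using plain round-robin assignment. The host names are made up, and in practice a load balancer such as NGINX would do this job rather than application code.

```python
from itertools import cycle


def assign_round_robin(clients, servers):
    """Map each client to a server, cycling through the servers in order."""
    pool = cycle(servers)
    return {client: next(pool) for client in clients}


# Hypothetical tiers: 6 frontends, 3 backends, 1 database.
frontends = [f"ui-{n}" for n in range(1, 7)]
backends = ["api-1", "api-2", "api-3"]
databases = ["db-1"]

ui_to_api = assign_round_robin(frontends, backends)
api_to_db = assign_round_robin(backends, databases)
```

Each backend ends up serving two frontends, and all three backends share the single database, which is why the database tends to become the bottleneck last but hardest.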

It will probably get costly to run before it becomes hard technically to scale up.

notavote@sh.itjust.works

1 point

1 year ago

Additionally, federation messages can probably be separated out onto a different server easily, and federation is being made much more efficient right now.

AggressivelyPassive@feddit.de

1 point

1 year ago

It’s not only about scaling a single instance, but scaling the fediverse.

Currently, each instance sends all events to all federated instances. That means each instance essentially needs to store and process a significant part of the entire fediverse. That’s insane and has to be addressed.

ollie@lemmy.world

1 point

1 year ago

I don’t think it officially supports it but it does work! Lemmy.world is currently running on multiple containers load balanced by nginx. Look at u/ruud’s latest post about it.

roadrunner_ex@lemmy.ca

13 points

1 year ago

It’s a challenge, for sure. It is known that there are some inefficiencies in the codebase, which are actively being worked on. But besides that, it’s tricky to know where bottlenecks are until the user influx happens, particularly with the novel federation architecture. Maybe it’s impossible to scale, maybe not, but we are only now seeing a testable use case. I would expect optimization work to start bearing fruit, but these things take time.

1 point

1 year ago

It’s not tricky to think of these problems ahead of time. These are solved design problems. Synchronization is not a ‘novel’ problem.

Iron Lynx@lemmy.world

12 points

1 year ago

My expectation, or at least hope, is that Lemmy will grow horizontally, i.e. more instances for more specialised content, instead of vertically, i.e. more communities in singular, larger instances. Since it’s all federated, you can get to stuff in other instances.

I just had an idea. Let’s compare reddit and lemmy as land use metaphors.

Reddit is like one monolithic megacity. It’s full of communities, some big, encompassing entire neighbourhoods, and others smaller, having one street, one block, maybe even just one building.

Lemmy is like a country, with every instance a city. Some cities are big and varied, others are smaller and specialised, like ones dedicated entirely to fishing or aviation or being German. And you can choose a city to settle in and move between cities for your content. Some cities will be more open to sharing content with residents of other cities, and others will put up bigger restrictions. There are jokes about parts of the userbase on 4chan or Tumblr forming their own subcommunities, and the fediverse allows this in a very material way.

My expectation is that more cities may emerge as people develop more specialised communities. And since there are many cities, there is some resilience in the system. If an instance goes down, you’ve lost one instance. Out of christ knows how many. Chances are some of its content is duplicated across other instances, so nothing of value is lost. Meanwhile, if (/when) Reddit goes down, all of Reddit is gone.

In short, I hope lemmy develops more, smaller, specialised instances over time. Reddit allowed very niche interests to have a corner, and despite that, I think the fediverse is more suited to allow for that than a centralised service.
