My first experience with Lemmy was thinking that the UI was beautiful, and lemmy.ml (the first instance I looked at) was asking people not to join because they already had 1500 users and were struggling to scale.
1500 users just doesn’t seem like much, it seems like the type of load you could handle with a Raspberry Pi in a dusty corner.
Are the Lemmy servers struggling to scale because of the federation process / protocols?
Maybe I underestimate how much compute goes into hosting user generated content? Users generate very little text, but uploading pictures takes more space. Users are generating millions of bytes of content and it’s overloading computers that can handle billions of bytes with ease, what happened? Am I missing something here?
Or maybe the code is just inefficient?
Which brings me to the title’s question: Does Lemmy benefit from using Rust? None of the problems I can imagine are related to code execution speed.
If the federation process and protocols are inefficient, then everything is being built on sand. Popular protocols are hard to change. How often does the HTTP protocol change? Never. The language used for the code doesn’t matter in this case.
If the code is just inefficient, well, inefficient Rust is probably slower than efficient Python or JavaScript. Could the complexity of Rust have pushed the devs towards a simpler but less efficient solution that ends up being slower than garbage collected languages? I’m sure this has happened before, but I don’t know anything about the Lemmy code.
Or, again, maybe I’m just underestimating the amount of compute required to support 1500 users sharing a little bit of text and a few images?
The numbers are a little higher than you mention (currently ~3.2k active users). The server isn’t very powerful either, it’s now running on a dedicated server with 6 cores/12 threads and 32 gb ram. Other public instances are using larger servers, such as lemmy.world running on a AMD EPYC 7502P 32 Cores “Rome” CPU and 128GB RAM or sh.itjust.works running on 24 cores and 64GB of RAM. Without running one of these larger instances, I cannot tell what the bottleneck is.
The issues I’ve heard with federation are currently how ActivityPub is implemented, and possibly the fact all upvotes are federated individually. This means every upvote causes a federation queue to be built, and with a ton of users this would pile up fast. Multiply this by all the instances an instance is connected to and you have an exponential increase in requests. ActivityPub is the same protocol used by other federated servers, including Mastodon which had growing pains but appears to be running large instances smoothly now.
Other than that, websockets seem to be a big issue, but is being resolved in 0.18. It also appears every connected user gets all the information being federated, which is the cause for the spam of posts being prepended to the top of the feed. I wouldn’t be surprised if people are already botting content scrapers/posters as well, which might cause a flood of new content which has to get federated which causes queues to back up; this is mostly speculation though.
As it goes with development, generally you focus on feature sets first. Optimization comes once you reach a point a code-freeze makes sense, then you can work on speeding things up without new features breaking stuff. This might be an issue for new users temporarily, but this project wasn’t expecting a sudden increase in demand. This is a great way to show where inefficiencies are and improve performance is though. I have no doubt these will be resolved in a timely manner.
My personal node seems to use minimal resources, not having even registered compared to my other services. Looking at the process manager the postgres/lemmy backend/frontend use ~250MB of RAM.
For now, staying off lemmy.ml and moving communities to other instances is probably best. The use case of large instances anywhere near the scale of reddit wasn’t the goal of the project until reddit users sought alternatives. We can’t expect to show up here and demand it work how we want without a little patience and contributing.
Yup was just typing a comment to basically this effect. Federation adds a ton of overhead – you can still do things fairly efficiently, but every interaction having to fan out to (and fan in from!) many servers instead of like a single RDBMS is gonna cost you.
In all likelihood the code is not as efficient as it could be, but usually you get time to work those out gradually. A giant influx of users quickly turns “TODO: fix in the next six months” into “Oh god the servers are melting fuck fuck.”
That said, assuming the devs can get over this hump, I suspect using a compiled language will pay off long-term. Sure things will still be primarily IO-bound, but making things less CPU-bound is usually a good thing.
For some illustrative examples: Mastodon is in Ruby and hits dumb scaling limitations far more often than other fedi microblogs. Pleroma/Akkoma are Elixir (and BEAM is super well optimized for fast message passing/scaling/IO), Calckey (primarily Typescript) is moving some code to Rust, GoToSocial (Golang) is able to run in a fraction of the resources of Mastodon. The admins of one of the bigger tech instances recently announced they’re basically giving up on administrating Mastodon and are instead going to write a new server from scratch in a compiled language because it’s easier for them than scaling a Rails monolith.
TL;DR everything is IO-bound til it’s not.
I’m pretty sure the fediverse needs a new kind of node at some point. If we assume, that almost every larger instance is connected to almost every other larger instance directly, then there’s a ton of duplicated and very small messages.
There needs to be some kind of hub in-between to aggregate and route this avalanche. Especially if, like you wrote, every upvote is a message, the overhead (I/O, unmarshalling, etc) is huge.
You mean like centralizing the fediverse? Who hosts the hub? Who maintains it? In which country? Who pays for it?
Not a single hub, multiple ones.
Anyone can host a hub, federated instances can negotiate the intersection of hubs they both trust and then send traffic that way. That could mean, a single comment might be sent to, say, five hubs and each hub then forwards to 50 instances or so.
Since the hubs are rather simple, they can scale very easily and via cryptographic ratchets, all instances can make sure, they received the correct messages.
An old Chromebox G1 (i7-4600U and 8GB of ram) with a 128GB internal NGFF and external 1TB NVMe. It’s by no means powerful or expensive hardware. I’m also only receiving federated posts for my subscribed communities, and sending out these comments I write so it’s a lot lower workload than the larger instances.
Hosting a personal instance is the direction I want to go. That’ll be a winter project, though. It’s fishing/swimming/rowing/woodworking season now :)
Are there any general tips or hidden dangers you can share? Or is it as straightforward as just putting my nose into documentation and playing around a bit?
I can’t imagine what possessed them to use websockets other than “gee whiz websockets.”
Probably the same appeal as rust though: gee whiz.
It could be the devs just like programming in Rust. It’s a nice language lol
Hi, programming.dev owner here. From what I’ve been seeing it’s a lot of memory issues. We were hitting swap which was causing massive disk io. You can see what happened with the disk io immediately after the upgrade to more memory. I know at least one reason is being resolved in this PR
We were also having issues with the nginx config. There were some really weird settings that I don’t think were necessary. Finally, the federation is quite busy. So if someone subscribes to events from 10 different servers, we pull in every single event, even upvotes. There’s currently a lot of work being done around this stuff.
I don’t think Rust is the problem. I think it’s just a growth thing. Every platform has growth challenges, things grow in ways that you never expect. You might have thought that it was going to be IO constrained due to the federation, but in reality it’s memory constrained because memory is actually the most expensive thing to have on a server. etc.
So if someone subscribes to events from 10 different servers, we pull in every single event, even upvotes. There’s currently a lot of work being done around this stuff.
You mean like coalescing multiple events into a single message, or…? (I don’t know anything about ActivityPub, so apologies if this is a stupid question!)
correct. I’ve been looking for the thread to try and find it for you, but haven’t been having any luck. People have been discussing exactly that though, but it seems like it could cause some problems with vote faking. Anyway, it is being worked on!
Any reason to use nginx versus something like Envoy? Like, I really like nginx, but Envoy’s xDS API is really great for on-the-fly changes. I also think it might scale better and have more relevant default values. I’m just not sure if Lemmy ties into nginx in some way, or if you’re purely using it as a reverse proxy.
I’ll note that most of my Envoy experience is from using it with k8s and a custom ingress controller, where my org handles millions of requests per second (across many Envoy pods). Deploying it standalone might make it less fun.
Nginx is part of the lemmy-ansible install. I’ve never heard of Envoy though. if you’re interested in helping out you can always join the discord. We also set up a matrix room, but it doesn’t have as much discussion in it yet. https://app.element.io/#/room/!hmRRJzTsXkNAGIDXNu:matrix.org
I would say that it’s extremely unlikely.
Websites in general are never limited by raw code execution, they are mostly limited by IO. Be that disk IO as files are read and written, database IO as you need to execute complex queries to gather all the data to build the user timeline, and network IO to transfer data to and from the user. For decentralized social media like Kbin or Lemmy its even more IO limited as each instance needs to go back and forth to other instances to keep up-to-date data.
Websites usually benefit much more from caching and in-memory databases to keep frequently used data in fast storage.
This is why simple, high level, object oriented, garbage collected languages have become so common. All the CPU performance penalties they incur don’t actually affect the website performance.
Not relevant to lemmy (yet), but this does break down a bit at very large scales. (Source: am infra eng at YouTube.)
System architecture (particularly storage) is certainly by far the largest contributor to web performance, but the language of choice and its execution environment can matter. It’s not so important when it’s the difference between using 51% and 50% of some server’s CPU or serving requests in 101 vs 100 ms, but when it’s the difference between running 5100 and 5000 servers or blocking threads for 101 vs 100 CPU-hours per second, you’ll feel it.
Languages also build up cultures and ecosystems surrounding them that can lend themselves to certain architectural decisions that might not be beneficial. I think one of the major reasons they migrated the YouTube backend from Python to C++ isn’t really anything to do with the core languages themselves, but the fact that existing C++ libraries tend to be way more optimized than their Python equivalents, so we wouldn’t have to invest as much in developing more efficient libraries.
It is fairly relevant to lemmy as is. Quite a few instances have ram constraints and are hitting swap. Consider how much worse it would be in python.
Currently most of the issues are architectural and can be fixed with tweaking how certain things are done (i.e., image hosting on an object store instead of locally).
In lemmy’s case, my perusal of the DB didn’t really suggest that the queries would be that complex and I suspect that moving it to a higher performance NoSQL DB might be possible, but I’d have to take a look at a few more queries to be sure.
I wonder if this could be made to work with Aerospike Community Edition…
Obviously it could be more effort than it’s worth though.
The issues I’ve seen more are around images. Hosting the images on an object store (cloudflare r2, s3) and linking there would reduce a lot of federation bandwidth, as that’s probably cause higher ram/swap usage too.
pict-rs supports storing in object stores, but when getting/serving images, it still serves through the instance as the bottleneck IIRC. That would do quite a bit to free up some resources and lower overall IO needed by the server.
There’s no need to migrate the database, that shouldn’t be an issue at this size. Caching should be implemented as another comment suggested.
Would you be so kind as to recommend some resources about caching? I’ve read the basics, but have yet to dive deep on it
Oh shit does lemmy not have response caching? Yeah, that’s gonna be an issue pretty soon.
Ehhhhhhh. Using a relational database for Lemmy was certainly a choice, but I don’t think it’s necessarily a bad one.
Within Lemmy, by far the most expensive part of the database is going to be comment trees, and within the industry the consensus on the best database structure to represent these is… well, there isn’t one. The efficiency of this depends way more on how you implement it within a given database model than on the database model itself. Comment trees are actually a pretty difficult problem; you’ll notice a lot of platforms have limits on comment depth, and there’s a reason for that. Getting just one level of replies to work efficiently can be tricky, regardless of the choice of DBMS.
Looking at the schema Lemmy uses, I see a couple opportunities to optimize it down the road. One of the first things I noticed is that comment replies don’t seem to be directly related back to the top-level post, meaning you’re restricted to a breadth-first search of the comment tree at serving time. Most comments will be at pretty shallow depths, so it sometimes makes sense to flatten the first few levels of this structure so you can get most relevant comments in a single query and rebuild the tree post-fetching. But this makes nomination (i.e. getting the “top 100” or whatever comments to show on your page) a lot more difficult, so it makes sense that it’s currently written the way it is.
If it’s true (as another commenter said) that there’s no response caching for comment queries, that’s a much bigger opportunity for optimization than anything else in the database.
That’s correct. I wonder if YouTube still uses Python to this day (seems like they migrated to C++?)
Not saying there isn’t a difference in language performance, but for most world problems the architecture and algorithms matter more than the language for performance. Unless you’re in a very constrained environment such as lower end smartphones or embedded systems.
Also this makes you think (assuming it’s true lol): https://www.reddit.com/r/Lemmy/comments/14h965f/comment/jpdemet
I think the devs openly stated they aren’t backend bods and asked for help optimising the database as a priority. There’s a bit of work going on on github to sort that out I think. Anyone reading this who can optimise postgresql or contribute to a database agnostic retool should probably speak to the devs as I imagine you’d be welcome.
I wish I could help so much but I doubt they’re going to retool into .net haha.
Which is fine. If they wanted to learn Rust and wrote inefficient code, good for them. I appreciate their efforts. Rust can certainly be beaten into shape and perform well enough in the end.
Rust itself or the way the Rust logic is implemented is not the bottleneck. Like most decent web applications the bottleneck is the database and how the decentralized protocols themselves are reconciled there.
Scaling massive amounts of records like Lemmy has been forced to is almost always IO bound at the database level even when a web service is centralized; this is much more difficult in federated architectures. This is why “NoSQL” databases have increased in popularity, but they are also not a magic bullet as there are major ACID trade offs one needs to consider.
NoSQL databases are no silver bullet and the costs of ACID are usually exaggerated (plus most NoSQL databases actually implement ACID anyway). NoSQL databases and SQL databases often have similar performance characteristics since most of the technology is typically the same under the hood.
Plus from my experience as a database consultant: databases are rarely IO bound, NoSQL or SQL unless you have a strange workload. Most time for query execution is usually spent waiting on loads or executing CPU instructions, not waiting on disk IO.