Lemmy Federation Architecture Change Proposal (github.com)

posted 1 year ago

HTTP_404_NotFound@lemmyonline.com

technology@beehaw.org

62 comments

https://github.com/LemmyNet/lemmy/issues/3245

I posted far more details on the issue than I am putting here.

But, just to bring some math in: with the current full-mesh federation model, assuming 10,000 instances, that will require nearly 50 million connections.
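
For reference, the "nearly 50 million" figure is the number of distinct instance pairs in a full mesh, n(n-1)/2. A quick Python check using the post's assumed 10,000 instances (this counts possible links between instances, not how many times a single activity is delivered):

```python
def full_mesh_links(instances: int) -> int:
    """Number of distinct instance-to-instance pairs in a full mesh: n(n-1)/2."""
    return instances * (instances - 1) // 2

links = full_mesh_links(10_000)
print(f"{links:,}")  # 49,995,000
```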

Each comment. Each vote. Each post will have to be sent 50 million separate times.

In the proposed hub-and-spoke model, we can reduce that by over 99%, so that each post/vote/comment/etc. only has to be sent 10,000 times (plus n*(n-1)/2 times, where n = number of hub servers).
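
The same arithmetic for the hub-and-spoke figure, following the post's formula: one send per spoke, plus n*(n-1)/2 for the n hub servers. The post does not give a hub count, so `hubs=10` below is purely a hypothetical value, and the comparison is against the post's full-mesh figure:

```python
def hub_spoke_sends(spokes: int, hubs: int) -> int:
    """Sends per activity as stated in the proposal: one per spoke,
    plus hubs*(hubs-1)/2 for the links among the hub servers."""
    return spokes + hubs * (hubs - 1) // 2

mesh = 10_000 * (10_000 - 1) // 2               # the post's full-mesh figure (49,995,000)
hub = hub_spoke_sends(spokes=10_000, hubs=10)   # hypothetical: 10 hub servers
reduction = 1 - hub / mesh
print(hub, f"{reduction:.2%}")  # 10045 99.98%
```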

The current full-mesh architecture will not scale, and I predict exponential growth will continue.

Let’s work on a solution to this problem together.

Da_Boom@iusearchlinux.fyi

4 points

1 year ago

Nice, but it kinda breaks the point of federation: who’s running the hubs? If it’s a company, then nice, we’re back to appeasing our corporate overlord.

If it’s a secondary federated system, nice, now you’re just needlessly complicating things, as anyone could create a hub; you could end up with a lot of single-spoke hubs.

Hotzilla@sopuli.xyz

1 point

1 year ago

Load-wise, separating the hub onto its own server would make scaling easier, so even a one-hub, one-instance setup could work for large instances. For personal instances this solution would be nice, because they could share one hub and federate through it.

No one is suggesting here to have any company host the hub.

andscape@feddit.it

5 points

1 year ago

Other people in the thread have already made this point: even with a full mesh network, the number of remote calls made for a single activity is equal to the number of instances subscribing to that activity (plus one if the activity originates from an instance that’s not the host of the activity).
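
A minimal sketch of that count, assuming the community's host delivers once to each subscribing remote instance (including the origin, if it subscribes) and the origin instance makes one extra call to the host when it isn't the host itself. The function and instance names are hypothetical:

```python
def direct_delivery_calls(subscribers: set[str], host: str, origin: str) -> int:
    """Remote calls for one activity under direct delivery: one call to each
    subscribing remote instance, plus one if the activity originates on an
    instance other than the host (assumption: origin -> host is a single call)."""
    remote_subscribers = subscribers - {host}  # the host doesn't call itself
    calls = len(remote_subscribers)
    if origin != host:
        calls += 1  # origin instance -> host instance
    return calls

subs = {"a.example", "b.example", "c.example"}
print(direct_delivery_calls(subs, host="host.example", origin="a.example"))     # 4
print(direct_delivery_calls(subs, host="host.example", origin="host.example"))  # 3
```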

A hub/spoke model doesn’t change this; it just moves the load from the host instance to the hub. The number of connections is still the same: if N instances need to receive the activity, N calls will have to be made. If anything, this adds one more call, from the host instance to the hub.
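
To make the comparison concrete, a small sketch of who makes the calls for one activity with and without a hub, assuming (as the comment does) that the host hands each activity to the hub exactly once. `calls_per_activity` is a hypothetical helper:

```python
def calls_per_activity(n_receivers: int, via_hub: bool) -> dict[str, int]:
    """Split of remote calls for one activity, with or without a hub."""
    if via_hub:
        # host -> hub once, then the hub fans out to every receiver
        return {"host": 1, "hub": n_receivers, "total": n_receivers + 1}
    # host fans out to every receiver itself
    return {"host": n_receivers, "hub": 0, "total": n_receivers}

print(calls_per_activity(10_000, via_hub=False))  # {'host': 10000, 'hub': 0, 'total': 10000}
print(calls_per_activity(10_000, via_hub=True))   # {'host': 1, 'hub': 10000, 'total': 10001}
```

The host's own load drops, but the total goes up by one, which is the comment's point.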

Even peer-to-peer distribution of activities, mentioned by @hazelnoot@beehaw.org, wouldn’t actually change the number of calls being made. You still have N servers that have to receive the activity, so you need at least N calls overall. What it would do is spread the load better across instances, so the host doesn’t have to make all N calls. It would definitely be an improvement, but it would not be easy to implement successfully, and it would almost surely break ActivityPub compatibility.
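
A rough sketch of the peer-to-peer idea as described: each instance that already has an activity forwards it to at most `fanout` instances that don't. The `relay_tree` function, the instance names, and the fan-out of 10 are all hypothetical; the point is that total calls stay at N while the busiest sender's load is capped:

```python
from collections import deque

def relay_tree(host: str, receivers: list[str], fanout: int) -> dict[str, list[str]]:
    """Hypothetical P2P spread: every holder of the activity forwards it
    to at most `fanout` instances that haven't received it yet."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    sends: dict[str, list[str]] = {}
    holders = deque([host])     # instances that have the activity and can forward it
    pending = deque(receivers)  # instances still waiting for it
    while pending:
        sender = holders.popleft()
        batch = [pending.popleft() for _ in range(min(fanout, len(pending)))]
        sends[sender] = batch
        holders.extend(batch)   # new holders can forward on the next rounds
    return sends

receivers = [f"instance{i}.example" for i in range(10_000)]
tree = relay_tree("host.example", receivers, fanout=10)
total_calls = sum(len(v) for v in tree.values())
busiest = max(len(v) for v in tree.values())
print(total_calls, busiest)  # 10000 10
```

The trade-off is extra hops: an activity may pass through several instances before it arrives.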

The only thing I can think of that would actually reduce the overall network load is batching: sending multiple activities/updates together in a single message. AFAIK ActivityPub doesn’t support this, so implementing it would mean breaking compatibility and implementing an entirely updated version of the protocol (which is a massive undertaking).
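
The effect of batching is simple arithmetic. This sketch assumes a hypothetical batched delivery where up to `batch_size` activities share one request per receiving instance; since, per the comment, ActivityPub doesn’t appear to offer this, `batch_size=1` stands in for today's one-activity-per-delivery behavior:

```python
import math

def messages_needed(receivers: int, activities: int, batch_size: int) -> int:
    """Requests needed to deliver `activities` updates to `receivers` instances
    when up to `batch_size` activities can share one request (hypothetical)."""
    return receivers * math.ceil(activities / batch_size)

print(messages_needed(10_000, 100, batch_size=1))   # 1000000
print(messages_needed(10_000, 100, batch_size=50))  # 20000
```

Every receiver still gets every activity; only the number of requests shrinks.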

HTTP_404_NotFound@lemmyonline.comOP

0 points

1 year ago

My logic was to move the load away from the primary instance server onto a service/server that focuses only on handling federation duties.

My reasoning is to split the two workloads apart and, hopefully, build a more scalable federation tier that can scale independently of the primary instance server.
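
One common way to split the two workloads is a queue between the instance and a separate federation worker. This sketch runs both tiers in one Python process with a thread and `queue.Queue`, purely for illustration; in the setup described above the worker would run on its own server, and the "delivery" here is only simulated. The inbox URLs are hypothetical:

```python
import queue
import threading

def run_demo(activities: list[str], inboxes: list[str]) -> list[tuple[str, str]]:
    outbox: queue.Queue = queue.Queue()   # hand-off point between the two tiers
    delivered: list[tuple[str, str]] = []

    def federation_worker() -> None:
        # Federation tier: drains the queue and does the (simulated) sending.
        while True:
            item = outbox.get()
            if item is None:              # sentinel: no more work
                break
            delivered.append(item)        # stand-in for an HTTP POST to a remote inbox

    worker = threading.Thread(target=federation_worker)
    worker.start()

    # Instance tier: enqueues deliveries and moves on without waiting on the network.
    for activity in activities:
        for inbox in inboxes:
            outbox.put((inbox, activity))
    outbox.put(None)
    worker.join()
    return delivered

result = run_demo(["comment-1", "vote-1"], ["https://a.example/inbox", "https://b.example/inbox"])
print(len(result))  # 4
```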

andscape@feddit.it

2 points

1 year ago

I understand the logic, and you’re right to think about how to improve Lemmy’s scalability. But I’m not sure this is the way to go.

If you build a dedicated federation proxy for an instance, you’ve really just slightly moved the problem. The federation proxy is going to have the same scalability issues, and if anything the total load goes up.

If you build multi-instance hubs, you suddenly introduce a lot of new issues.

Security: I think Lemmy checks the source of an update to verify that it comes from the legitimate host. You would have to introduce some kind of signatures to verify that the activity originated from the legitimate host.
Privacy: now your users have to trust the hub owners with their data, not just the instance.
Motive: who would be running the hubs, and why? They would have to be even bigger than the instances, and there would be much less incentive to do it.

HTTP_404_NotFound@lemmyonline.comOP

0 points

1 year ago

I agree with all of your points above.

That said, the most recent idea on https://github.com/LemmyNet/lemmy/issues/3245, which has popped up a few times on both Lemmy and now GitHub, actually sounds like a potential solution as well.

Just using signed messages between instances, which can be transmitted P2P instead of only directly.
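
A minimal sketch of why signing helps once messages can be relayed: the receiver checks the signature against the origin instance, not the connection it arrived on, so any number of hops is fine and tampering is detected. ActivityPub servers already sign their deliveries with per-actor key pairs (HTTP Signatures), but a relay scheme would need a public-key signature over the activity itself that stays valid after forwarding. Python's standard-library `hmac` stands in here only to keep the sketch dependency-free; it needs a shared secret, which a real scheme would not. The key store, domain, and functions are hypothetical:

```python
import hashlib
import hmac
import json

# Hypothetical key store. HMAC is a stand-in: real relaying would need
# public-key signatures so receivers never hold the origin's secret.
ORIGIN_KEYS = {"origin.example": b"hypothetical-secret"}

def _digest(key: bytes, activity: dict) -> str:
    body = json.dumps(activity, sort_keys=True).encode()  # canonical form to sign
    return hmac.new(key, body, hashlib.sha256).hexdigest()

def sign(origin: str, activity: dict) -> dict:
    return {"origin": origin, "activity": activity,
            "signature": _digest(ORIGIN_KEYS[origin], activity)}

def relay(envelope: dict) -> dict:
    """A peer or hub forwards the envelope unchanged."""
    return dict(envelope)

def verify(envelope: dict) -> bool:
    key = ORIGIN_KEYS.get(envelope["origin"])
    if key is None:
        return False  # unknown origin
    return hmac.compare_digest(_digest(key, envelope["activity"]), envelope["signature"])

env = sign("origin.example", {"type": "Like", "object": "https://origin.example/post/1"})
print(verify(relay(relay(env))))  # True: two hops, still verifiable

tampered = relay(env)
tampered["activity"] = {**env["activity"], "type": "Dislike"}
print(verify(tampered))  # False
```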

binwiederhier@discuss.ntfy.sh

41 points

1 year ago

You got a lot of heat in this discussion, but let me be one of the few to applaud you for actually making a proposal. Saying No is easy, but suggesting something and writing it down and putting it out there is hard.

I am a Principal Engineer by trade, and I do what you did here all the time. I put out suggestions to my team and let them absolutely wreck them. This is how you advance and enhance your idea. Listen and learn from the feedback and suggest another thing based on what you have learned. Rinse and repeat.

That’s how you get to a great proposal. Keep at it. Well done.

useful_idiot@lemmy.eatsleepcode.ca

4 points

1 year ago

And at the very least, there’s a record of the discussions and thought processes behind why this was or wasn’t chosen.

HTTP_404_NotFound@lemmyonline.comOP

11 points

1 year ago

"I put out suggestions to my team and let them absolutely wreck it."

I know the feeling; I am used to it. My day job is a combination of consultant and project manager (with some software dev every now and then). I get to sit down, help design and architect things, and solve problems. There generally isn’t a solution everyone likes or agrees with, but if you solve more issues than you cause, it’s generally a step in the right direction.

People are absolutely stomping on the idea for the most part, but I do think a few good points have come out of the discussion.

And a few good points are better than no points at all!

Fauxreigner@beehaw.org

2 points

1 year ago

Yeah, as one of the people giving you “heat”, I think it’s great that you put forward this proposal and took feedback on it, and my goal was a productive discussion, not shaming. Not every idea is good, and very, very few of them are good right from the jump, but you never find good ideas without putting them out there.

th3raid0r@tucson.social

3 points

1 year ago

I’ll be the odd one out and say I support this model but for other reasons than the technical limitations and scaling problems involved. For me it’s more about trying to establish a tighter ring of trust and enable easier user onboarding as the hub could serve as the primary identity store for users on multiple instances.

I mentioned it in some chat earlier, but I think that the Beehaw.org moderation model, goals, and philosophy serve as an excellent starting point for like-minded communities to build out the hub-and-spoke. It would also give them greater flexibility in maintaining the health of their corner of the fediverse by centralizing identity with them.

This model would, of course, not stop others from creating their own hub and spoke, and it would break apart the fediverse a bit, so I suppose there should be a way for “hubs” to talk to each other in a way that resembles what we have now.

From a blocking bad actors standpoint (I’m still upset about Captcha getting removed even if it’s a technically inferior solution), it would be far easier to have fewer hubs to need to blacklist/whitelist than having to do it for each individual instance.

I guess, to go a bit further, if Lemmy could support both “modes” (i.e., it could be configured to act as either the hub or a spoke, while retaining the existing functionality for those who don’t want a hub), that would be ideal.

HTTP_404_NotFound@lemmyonline.comOP

2 points

1 year ago

A bit of centralized spam management wouldn’t be a bad idea at all.

key@lemmy.keychat.org

13 points

1 year ago

Activities aren’t sent on every “connection” in the network in the current model. There is no indirect transmission or polling, so even though there are theoretically 50 million connections in the scenario you gave, any one activity will already be sent at most about 10k times. That’s why instances require TLS and must be internet-accessible: so they can receive direct communication. I agree with you that there are some difficult scaling issues with federation, but your representation of them is inaccurate.
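
The distinction in numbers for the post's 10,000-instance scenario: the full mesh counts every possible pair, while direct delivery bounds the calls for one activity by the instance count. A sketch, assuming the worst case in which every other instance subscribes and the activity originated on a remote instance (hence the one extra origin-to-host call); the helper names are hypothetical:

```python
def mesh_links(n: int) -> int:
    # every possible instance pair; only relevant if activities crossed all of them
    return n * (n - 1) // 2

def max_calls_per_activity(n: int, remote_origin: bool = True) -> int:
    # host delivers once to each of the other n - 1 instances,
    # plus one origin -> host call when the activity starts elsewhere
    return (n - 1) + (1 if remote_origin else 0)

n = 10_000
print(f"{mesh_links(n):,} possible links, at most {max_calls_per_activity(n):,} calls per activity")
# 49,995,000 possible links, at most 10,000 calls per activity
```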

Technology

!technology@beehaw.org

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

This community’s icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

Community stats

2.8K monthly active users
3.4K posts
82K comments