Lemmy Federation Architecture Change Proposal (github.com)

posted 1 year ago

HTTP_404_NotFound@lemmyonline.com

technology@beehaw.org

62 comments

https://github.com/LemmyNet/lemmy/issues/3245

I posted far more detail on the issue than I am putting here.

But, just to bring some math in: with the current full-mesh federation model, assuming 10,000 instances, that will require nearly 50 million connections.

Each comment, each vote, each post will have to be sent 50 million separate times.

In the proposed hub-spoke model, we can reduce that by over 99%, so that each post, vote, or comment only has to be sent 10,000 times (plus n*(n-1)/2 times, where n is the number of hub servers).
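For anyone who wants to check the arithmetic, here is a minimal sketch of the two formulas above. The function names are made up for illustration, and the hub count of 10 is an assumption, since the post does not give one.

```python
def full_mesh_links(instances: int) -> int:
    """Number of distinct instance pairs in a full mesh: n * (n - 1) / 2."""
    return instances * (instances - 1) // 2


def hub_spoke_sends(instances: int, hubs: int) -> int:
    """One send per instance plus a full mesh between the hubs, per the post's formula."""
    return instances + full_mesh_links(hubs)


instances = 10_000
hubs = 10  # assumed value for illustration only; not stated in the post

mesh = full_mesh_links(instances)
hub_spoke = hub_spoke_sends(instances, hubs)
reduction = 1 - hub_spoke / mesh

print(mesh)                # 49995000
print(hub_spoke)           # 10045
print(f"{reduction:.2%}")  # 99.98%
```

Under these assumptions the "nearly 50 million" figure is the number of instance pairs, 49,995,000, and the hub count barely moves the hub-spoke total unless it gets large.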

The current full-mesh architecture will not scale, and I predict exponential growth will continue.

Let’s work on a solution to this problem together.


binwiederhier@discuss.ntfy.sh

41 points

1 year ago

You got a lot of heat in this discussion, but let me be one of the few to applaud you for actually making a proposal. Saying No is easy, but suggesting something and writing it down and putting it out there is hard.

I am a Principal Engineer by trade, and I do what you did here all the time. I put out suggestions to my team and let them absolutely wreck them. This is how you advance and enhance your idea. Listen and learn from the feedback and suggest another thing based on what you have learned. Rinse and repeat.

That’s how you get to a great proposal. Keep at it. Well done.


HTTP_404_NotFound@lemmyonline.com (OP)

11 points

1 year ago

“I put out suggestions to my team and let them absolutely wreck it.”

I know the feeling; I am used to it. My day job is a combination of consultant and project manager (with some software dev every now and then). I get to sit down and help design and architect things, and solve problems. There generally isn’t a solution everyone likes or agrees with, but if you can check off more issues than you cause, it’s generally a step in the right direction.

People are absolutely stomping on the idea for the most part, but I do think a few good points have come out of the discussion.

And a few good points are better than no points at all!


Fauxreigner@beehaw.org

2 points

1 year ago

Yeah, as one of the people giving you “heat”, I think it’s great that you put forward this proposal and took feedback on it, and my goal was a productive discussion, not shaming. Not every idea is good, and very, very few of them are good right from the jump, but you never find good ideas without putting them out there.


useful_idiot@lemmy.eatsleepcode.ca

4 points

1 year ago

And at the very least, there’s a record of the discussions and thought processes behind why this was or wasn’t chosen.


bdonvr@thelemmy.club

21 points

1 year ago

“But, just to bring some math in: with the current full-mesh federation model, assuming 10,000 instances, that will require nearly 50 million connections. Each comment, each vote, each post will have to be sent 50 million separate times.”

Well, your whole premise is just utterly wrong.

The way federation actually works:

A user on lemmy.ml subscribes to a community on lemmy.world. Say, !funny@lemmy.world

Assume that this user is the first lemmy.ml user to do so. Basically, what happens is that the lemmy.world community sees that a member of a never-before-seen instance just subscribed. !funny@lemmy.world then adds lemmy.ml to its list of instances it needs to tell whenever something happens in the community.

No matter how many users of lemmy.ml subscribe, this only happens once.

Now when a user of sh.itjust.works upvotes a post on !funny@lemmy.world, the sh.itjust.works instance then tells !funny@lemmy.world of this change. It accepts the change, then tells everyone on its list of instances that have subscribers on them.

So essentially, sh.itjust.works talks to lemmy.world, lemmy.world tells everyone else. There is no “full mesh”. The instance hosting the community is the “hub”, everything else is a spoke.

So if there are 10,000 instances, and every one of them happens to have at least one subscriber to some community, each change to that community will be sent out 9,999 times. Your “50 million” premise is just completely wrong, and I’m not sure where it’s coming from.
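The flow described above can be sketched roughly like this. It is a toy model, not Lemmy's actual code: the `Community` class and its method names are hypothetical, and it assumes the host announces to every instance on its list, including the one the activity came from.

```python
class Community:
    """Hypothetical model of one community and the instances it notifies."""

    def __init__(self, name, host):
        self.name = name
        self.host = host
        # A set, so a second subscriber from the same instance changes nothing.
        self.subscriber_instances = set()

    def subscribe(self, user_instance):
        # The host already has the data, so it never needs to notify itself.
        if user_instance != self.host:
            self.subscriber_instances.add(user_instance)

    def sends_for_activity(self, origin_instance):
        """Return the (sender, receiver) pairs produced by one vote, comment, or post."""
        sends = []
        if origin_instance != self.host:
            sends.append((origin_instance, self.host))  # spoke tells the hub
        for instance in sorted(self.subscriber_instances):
            sends.append((self.host, instance))  # hub tells every listed instance
        return sends


funny = Community("funny", "lemmy.world")
funny.subscribe("lemmy.ml")
funny.subscribe("lemmy.ml")  # second lemmy.ml user: no new entry
funny.subscribe("sh.itjust.works")
sends = funny.sends_for_activity("sh.itjust.works")

# 10,000 instances in total (placeholder names), each with at least one subscriber.
big = Community("funny", "instance-0")
for i in range(10_000):
    big.subscribe(f"instance-{i}")
fan_out = [s for s in big.sends_for_activity("instance-1") if s[0] == big.host]
print(len(fan_out))  # 9999
```

In this model the cost of one activity grows with the number of subscribed instances, not with the number of instance pairs.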


HTTP_404_NotFound@lemmyonline.com (OP)

4 points

1 year ago

It’s not wrong; we just have opposite ideas here.

The 50 million is based on the formula for a full-mesh network, where all instances talk to each other. In the case of Lemmy, this would be an absolute worst-case scenario, where every instance is subscribed to a community on every other instance.

In your example of only 10,000 messages, you are assuming that the 10,000 instances in existence are ONLY looking at a single community on a single server.

Let’s say those 10,000 instances all decide to look at a community on another server. Now you have 20,000 connections.

Let’s add another community, hosted on yet another instance. That is 30,000 connections.

TL;DR:

My example is based on the worst-case scenario (a pretty unachievable one at that!).

Your example is based on the best-case scenario.

Realistically, the actual outcome would be somewhere much closer to the best-case scenario (as communities seem to lump up on the big servers). However, when planning architecture, you always assume the worst-case scenario.
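To put numbers on the two scenarios, here is a rough sketch under a simplified model: each hosting instance hosts exactly one community, and every other instance subscribes to it. The function names are hypothetical, and the per-activity figure assumes the host sends one copy to each other instance.

```python
def subscription_links(instances: int, host_count: int) -> int:
    """Host-to-subscriber links when `host_count` instances each host one
    community and every other instance subscribes to it."""
    return host_count * (instances - 1)


def connected_pairs(instances: int, host_count: int) -> int:
    """Distinct instance pairs that exchange any traffic at all."""
    host_to_host = host_count * (host_count - 1) // 2
    host_to_other = host_count * (instances - host_count)
    return host_to_host + host_to_other


def max_sends_per_activity(instances: int) -> int:
    """One activity in one community goes from its host to every other instance."""
    return instances - 1


n = 10_000
for hosts in (1, 2, 3, n):
    print(hosts, subscription_links(n, hosts), connected_pairs(n, hosts),
          max_sends_per_activity(n))
```

With one, two, and three hosting instances the link count is just under 10,000, 20,000, and 30,000. With all 10,000 instances hosting, the pair count reaches the 49,995,000 of the full-mesh formula, while the sends for any single activity stay at 9,999 in this model.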


bdonvr@thelemmy.club

21 points

1 year ago

No - you said:

“Each comment, each vote, each post will have to be sent 50 million separate times.”

That won’t ever happen, unless there are 50 million instances. That’s not worst case; it’s just not a case.

There is no case in the current implementation where any one action is replicated more times than there are total instances.

And it doesn’t matter what “model” you assume: each action will have to federate to each instance eventually. That count is, at minimum, the total number of instances.

“Let’s say those 10,000 instances all decide to look at a community on another server. Now you have 20,000 connections.”

Looking does nothing; each instance essentially hosts a copy of the “host instance” for each community. Only interactions (comments, likes, posts, etc.) are federated.


HTTP_404_NotFound@lemmyonline.com (OP)

4 points

1 year ago

For fuck’s sake, dude, be collaborative, not defensive. This isn’t Reddit; I am not out to attack your karma.

If every instance hosts a community, and every other instance subscribes to every one of those communities, that would lead to a full mesh between all instances, resulting in the worst-case scenario, i.e., following the formula I provided for a full-mesh topology.

That is indeed the worst-case scenario I have provided, explained, and documented in my examples.

If my example is too hard to understand, let’s use an easier one:

Count the number of instances on https://lemmy.ml/instances

Assume every one of those instances subscribes to !asklemmy.

Now, count the number of instances on https://lemmy.world/instances

Assume every one of those instances subscribes to !lemmyworld.

Now, count the number of instances on https://beehaw.org/instances

Assume every one of those instances subscribes to !technology.

It does. not. scale.


key@lemmy.keychat.org

13 points

1 year ago

Activities aren’t sent on every “connection” in the network in the current model. There is no indirect transmission or polling, so even though there are a theoretical 50 million connections in the scenario you gave, any one activity will already only be sent up to 10,000 times. That’s why instances are required to use TLS and be internet accessible, so they can receive direct communication. I agree with you that there are some difficult scaling issues with federation, but your representation of it is inaccurate.


Hazelnoot [she/her]@beehaw.org

8 points

1 year ago

The same problem can also be solved with signed messages, like the HTTP Signatures used by Mastodon and most of the other microblogging fedi servers. Signatures allow a message to flow peer-to-peer instead of requiring a direct connection. You would only need a connection when actively interacting with a post on another instance, and it’s very unlikely that all 10K instances would be interacting with each other. Most likely, the network will consist of smallish groups of loosely related instances plus a few giant servers that can handle the load of being popular.


HTTP_404_NotFound@lemmyonline.com (OP)

4 points

1 year ago

That, honestly, wouldn’t be a bad idea either. That should in theory help break up a lot of the load which is currently overly centralized.

The implementation should be a lot easier than my proposed idea as well, and it also has the side effect of potentially improving security.


raws@lemmy.sh

5 points

1 year ago

Is this an accurate description of how it works? My assumption was that a user would have to be subscribed to a remote community on their local instance for that local instance to pull posts/votes/comments from the remote instance. It’s not like everything is replicated everywhere.


HTTP_404_NotFound@lemmyonline.com (OP)

2 points

1 year ago

Your assumption is correct.

I gave the worst-case scenario for modeling purposes.

Realistically, the number of connections will be far lower. However, do also note that this platform will soon be hosting over one million users. Everything is going to scale upwards.

