If I were to create my own instance federated with all the other instances, as of today, how much data would I be storing, since I would make a copy of all the content?

I know this will vary a lot, but I’m looking for a ballpark figure to have an idea. I don’t think it would be a lot, but I can’t find an estimate anywhere.

Reposted from https://lemmy.world/post/55030 as I think this community is probably a better fit

15 points

Update: I’ve just seen this:

I run lemmy.world on a VPS at Hetzner. They are cheap and good. Storage: I now (after 11 days) have 2GB of images and 2GB of database. (https://lemmy.world/comment/65982)
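
As a rough ballpark, those figures can be extrapolated linearly. This is only a sketch: it assumes growth stays constant, which it almost certainly won't as the userbase changes, and the function name is made up for illustration.

```python
def project_storage_gb(observed_gb: float, observed_days: float, horizon_days: float) -> float:
    """Linearly extrapolate storage use from a single observation."""
    if observed_days <= 0:
        raise ValueError("observed_days must be positive")
    daily_rate = observed_gb / observed_days  # GB added per day so far
    return daily_rate * horizon_days


# Figures quoted above: 2 GB of images + 2 GB of database after 11 days.
one_year = project_storage_gb(observed_gb=4.0, observed_days=11, horizon_days=365)
print(f"~{one_year:.0f} GB after one year at a constant rate")
```

At a constant rate that comes to roughly 133 GB after a year, so treat it as an order-of-magnitude guess rather than a forecast.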

8 points

Our smaller instance, which has been federating for a bit more than a year now (started in March 2022), is at 2.4 GB for the database and 7 GB for the image storage (which probably needs some clean-up from previous image spam waves).

3 points

@poVoq @ndr This will surely grow a lot. If you look at other ActivityPub-compatible systems you’ll see huge growth. It depends on the retention of “old” posts and media: if you only store everything for a year you might keep it smaller, but if you want to replace #Reddit it would be better to keep stuff a bit longer. On the other hand, the #Fediverse is probably not meant to store stuff long term.

On my Friendica node I keep foreign posts and media for a rather short period, and my storage is only about 47 GB. Most of the media is stored in the database as well (easier and faster to back up, much slower to retrieve), and it is a single-user instance with just a handful of bots besides the account I write from.
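
The effect of a retention window on storage can be sketched with a small simulation. The daily ingest of 0.25 GB and the 30-day window are made-up numbers for illustration, not measurements from any instance; the point is only that storage plateaus at roughly daily ingest × retention period once old content starts being purged.

```python
from collections import deque
from typing import List


def simulate_retention(daily_ingest_gb: float, retention_days: int, total_days: int) -> List[float]:
    """Return stored GB at the end of each day when content older than
    `retention_days` is purged."""
    window = deque()  # one entry per day still being retained
    stored = 0.0
    history = []
    for _day in range(total_days):
        window.append(daily_ingest_gb)
        stored += daily_ingest_gb
        if len(window) > retention_days:
            stored -= window.popleft()  # drop the oldest day's content
        history.append(stored)
    return history


# Hypothetical numbers: 0.25 GB/day of federated content, kept for 30 days.
history = simulate_retention(daily_ingest_gb=0.25, retention_days=30, total_days=90)
print(f"day 1: {history[0]} GB, day 30: {history[29]} GB, day 90: {history[-1]} GB")
```

Without a purge (a retention period longer than the simulated time span) the same function just keeps growing, which is the trade-off being discussed here.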

0 points

Keeping more of the history is probably a good thing if we want to replace Reddit. Think of all the homelab/server posts you’ve used that are over a year old. Good info can last a while.

1 point

@sedawk probably… maybe threads can be closed and then some data can be removed.

This is a weak point of the Fediverse players: they’re unlikely to keep all data forever (as some services do, at least until they’re gone for good, like Google Wave). Storage costs money and the need will keep growing. Maybe some more cost-effective storage can be used for old data/posts/threads/media, just like the Internet Archive does; they don’t use fast storage, so it takes seconds to load old website versions. But that also seems like a big leap for amateur technology enthusiasts (as most admins of Fediverse systems are).

7 points

EDIT:

  • A whopping 29 MB of database
  • Container volumes: 695 MB

I looked at my instance last night; I’m only using it to subscribe to other communities. It had been running 2.5 days at that point. My VPS with Ubuntu 22.04 is at 5 GB total. Next time I SSH in I’ll look at the database size. I think I can confirm pictures are still loading from the source instance (lemmy.ml went down for a brief second last night and all my images from there stopped loading).
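
For anyone wanting to check their own numbers, here is a minimal sketch that sums file sizes under a directory, such as a container volume folder. The `./volumes` path is an assumption about where the data lives, and the result is the apparent file size, which can differ from what `du` reports. Since Lemmy's database is PostgreSQL, the database size itself can be read with `SELECT pg_size_pretty(pg_database_size(current_database()));`.

```python
import os


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files below `root` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def format_size(n_bytes: int) -> str:
    """Render a byte count using binary multiples (1 KB = 1024 B)."""
    size = float(n_bytes)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"


if __name__ == "__main__":
    # "./volumes" is a hypothetical location; point it at your own data directory.
    print(format_size(directory_size_bytes("./volumes")))
```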

7 points

It’s my understanding that you won’t make a full copy of everything. You’ll only copy the communities that are added, and you won’t be copying the full history.

I’m unsure how far back it goes, but it might be just the one day. I saw talk of changing it to go back as far as someone actually scrolled and pressed next, but that was just an idea at this point.

7 points

Also it doesn’t copy the images as Mastodon does, only the text objects.

5 points

For this reason, I assume it should be feasible to self-host an instance that syncs with most communities.

For reference, Wikipedia (en) takes up just over 20 GB compressed, text only (https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia), so I assume much less than that for Lemmy (with the current userbase).

3 points

Oh right, that’s a big one: images stay on the origin instance and are linked from there.

1 point

I actually kind of like how Akkoma does this. It creates a constant-size proxy with nginx, and all the images come from a predetermined host instead of from all over the net. It’s a good compromise between not using tens of gigabytes of space and still spreading the load a bit between the instances.
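
The idea behind a constant-size media proxy is a cache with a byte budget that evicts the least recently used items once the budget is exceeded (in nginx the cap is the `max_size` parameter of `proxy_cache_path`). The class below is only a toy Python sketch of that eviction idea, with made-up names; it is not how Akkoma or nginx actually implement it.

```python
from collections import OrderedDict
from typing import Optional


class BoundedMediaCache:
    """Toy in-memory cache that never holds more than `max_bytes` of data."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._items = OrderedDict()  # url -> bytes, oldest access first

    def get(self, url: str) -> Optional[bytes]:
        data = self._items.get(url)
        if data is not None:
            self._items.move_to_end(url)  # mark as most recently used
        return data

    def put(self, url: str, data: bytes) -> None:
        if len(data) > self.max_bytes:
            return  # larger than the whole budget; do not cache it
        if url in self._items:
            self.used_bytes -= len(self._items.pop(url))
        self._items[url] = data
        self.used_bytes += len(data)
        # Evict least recently used entries until we are back under budget.
        while self.used_bytes > self.max_bytes:
            _old_url, old_data = self._items.popitem(last=False)
            self.used_bytes -= len(old_data)


cache = BoundedMediaCache(max_bytes=10)
cache.put("https://example.org/a.png", b"aaaa")
cache.put("https://example.org/b.png", b"bbbb")
cache.get("https://example.org/a.png")           # touch a.png so b.png is now the oldest
cache.put("https://example.org/c.png", b"cccc")  # exceeds 10 bytes, so b.png is evicted
```

A miss simply means fetching the image from the origin instance again, so disk use stays flat at the cost of some repeated downloads.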

2 points

Okay, I see, thanks.

But for example, do we know how much data lemmy.ml stores?

6 points
Deleted by creator
5 points

Oh yeah, I’m still learning and planning when/if I do it.

OTOH, we shouldn’t scare people off doing it either LOL

