As I was browsing Lemmy and the fediverse at large, this question kept popping into my head.

Multimedia files have a much bigger footprint than raw text, and that worries me: as time goes on, massive resources will be needed to keep up with all the data coming in.

I do wonder whether the instances have gone the cloud route and just put all of it in something like AWS S3. Or maybe they use self-hosted storage, with something like MinIO for object storage?

14 points

Edit: I am partially wrong. (See below)

They’re stored on their host Instance. Only text is copied across instances.

20 points

That is not true. As long as a user on your instance is subscribed to a community, the media content of that community's posts [Edit: only posts linking to outside sources, e.g. Imgur] is stored locally on your instance as well.

This, of course, only applies to media that is uploaded to Lemmy; links to externally hosted media are not downloaded.

See this issue for more context.

Edit: I want to clarify that I was partially wrong. Lemmy only locally caches content that is hosted on outside sites. It does not (or at least should not) cache content that was uploaded directly to a Lemmy instance; it just embeds the source media.
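
To make the rule in that edit concrete, here is a rough Python sketch of the decision as described in this thread. It is not Lemmy's actual code; the function name and the `known_lemmy_hosts` set are made up for illustration.

```python
from urllib.parse import urlparse


def should_cache_locally(media_url: str, known_lemmy_hosts: set[str]) -> bool:
    """Hypothetical rule: cache media only when it lives on a non-Lemmy host.

    Media uploaded to a Lemmy instance would be embedded from its source
    instead of being copied.
    """
    # urlparse lowercases the hostname; it is None for malformed URLs.
    host = urlparse(media_url).hostname or ""
    return host not in known_lemmy_hosts
```

With `known_lemmy_hosts = {"lemmy.ml"}`, an Imgur link would be cached, while an image uploaded to lemmy.ml would only be embedded.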

26 points

I think this could be a ticking DOS time bomb.

Someone who manages to spam-upload massive files to the largest Lemmy instances could wipe out a ton of smaller ones.

Not to mention that, scalability-wise, this seems like a nightmare… eventually the largest Lemmy instances will have petabytes of media data, with hundreds of GB coming in per day, giving other instances no chance to sync with them.

I think the system architecture needs a significant review. This won’t scale.

5 points

I agree. It’s also a tremendous waste of resources. I’m all for redundancy (like CDNs), but this seems incredibly poorly thought out. If Lemmy (as a whole) ever scales to the size of other social media, the space requirements will start to become unreasonable.

Why wouldn’t something like symlinks be implemented? Not saying specifically use symlinks, but there has to be a similar, better way.

1 point

I mentioned something akin to this possibility a couple days ago, but was told this likely wasn’t the case. I’ll have to see if I can dig up the argument for that.

3 points

Agree. If I’m not mistaken, you can only disable the caching of sensitive (NSFW) content on your instance by disabling NSFW in general. This doesn’t go for SFW content though.

It shouldn’t be very hard to do this for all content, though. If I find the time, I might look into implementing it.

3 points

I feel like the developers should spend some time adding features to reduce malicious activity. They could provide settings to the admins to limit the number of things one user can do in a day, like number of images, total size of images, number of communities created, etc. Sure, someone could create multiple accounts, but it would still make it harder to attack Lemmy.
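
A per-user daily limit like this could look roughly like the sketch below. This is only an illustration in Python, not an existing Lemmy setting; the class name, the default limits, and the in-memory counter are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class UploadQuota:
    """Hypothetical admin-configurable limits on uploads per user per day."""

    max_images_per_day: int = 20
    max_bytes_per_day: int = 50 * 1024 * 1024
    # Maps (user, day) to [image_count, total_bytes]; kept in memory for brevity.
    _usage: dict = field(
        default_factory=lambda: defaultdict(lambda: [0, 0]), init=False, repr=False
    )

    def try_upload(self, user: str, size_bytes: int, today: date) -> bool:
        """Record the upload and return True, or return False if a limit is hit."""
        usage = self._usage[(user, today)]
        if usage[0] + 1 > self.max_images_per_day:
            return False
        if usage[1] + size_bytes > self.max_bytes_per_day:
            return False
        usage[0] += 1
        usage[1] += size_bytes
        return True
```

A real implementation would need to persist the counters and probably key them by IP as well, since accounts are cheap to create.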

3 points

This actually brings up another question for me! Say your account is on an instance that doesn’t allow something, like nudity. If you subscribe to a community on another instance that DOES allow it, you’re saying that everything you see there does end up (redundantly) hosted by your home instance. Has the Lemmy moderation/admin community in general decided on whether or not that’s breaking the home instance’s rules?

2 points

Right now you can only disable caching of NSFW content by disabling NSFW for the instance, but of course this has nothing to do with “soft” rules that are only written out in text.

Imo the best solution would be to allow admins to have more granular control over caching, e.g. disabling caching for specific instances / communities or whitelisting. And we need an option to disable caching altogether.
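
As a sketch of what that more granular control might look like, here is a small Python policy object. None of these options exist in Lemmy as far as this thread establishes; every field name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class CachePolicy:
    """Hypothetical per-instance media caching settings for admins."""

    enabled: bool = True  # global switch to disable caching altogether
    blocked_instances: Set[str] = field(default_factory=set)
    blocked_communities: Set[str] = field(default_factory=set)  # "name@instance"
    allowed_instances: Optional[Set[str]] = None  # None means no whitelist

    def allows(self, instance: str, community: str) -> bool:
        """Return True if media from this community may be cached locally."""
        if not self.enabled:
            return False
        if self.allowed_instances is not None and instance not in self.allowed_instances:
            return False
        if instance in self.blocked_instances:
            return False
        return f"{community}@{instance}" not in self.blocked_communities
```

The whitelist is checked before the block lists, so an admin could run in "cache nothing unless listed" mode and still exclude single communities on a listed instance.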


What??? Why? I’ll have to shut my instance down if that’s true.

1 point

I updated my comment as I was partially wrong about what gets cached.

2 points

That sounds good for reliability, since an instance can still look up posts even if another one fails.

For videos and images, do they store them as blobs in the database, or do they use something better suited to files, like object storage, or maybe a regular filesystem with metadata in a database?
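
To illustrate the last option (files on a regular filesystem, metadata in a database), here is a minimal Python sketch. It is not how Lemmy or any particular media server does it; the table layout and function names are assumptions, and it names files by their SHA-256 hash so identical uploads are stored once.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_metadata_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the metadata database with a single media table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS media ("
        "sha256 TEXT PRIMARY KEY, mime TEXT NOT NULL, size INTEGER NOT NULL)"
    )
    return db


def store_media(db: sqlite3.Connection, root: Path, data: bytes, mime: str) -> str:
    """Write the bytes to disk under their hash and record metadata in the DB."""
    digest = hashlib.sha256(data).hexdigest()
    target = root / digest[:2] / digest  # shard by prefix to keep directories small
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    db.execute(
        "INSERT OR IGNORE INTO media (sha256, mime, size) VALUES (?, ?, ?)",
        (digest, mime, len(data)),
    )
    db.commit()
    return digest


def load_media(db: sqlite3.Connection, root: Path, digest: str) -> tuple[bytes, str]:
    """Return (bytes, mime type) for a stored file, or raise KeyError."""
    row = db.execute("SELECT mime FROM media WHERE sha256 = ?", (digest,)).fetchone()
    if row is None:
        raise KeyError(digest)
    return (root / digest[:2] / digest).read_bytes(), row[0]
```

Swapping the filesystem calls for an S3-compatible client would turn the same layout into object storage, with the database still holding only metadata.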

4 points

They use pict-rs, which looks like it used to store files on the filesystem but now uses object storage.

2 points

That’s an awesome name. The Rust community never fails to deliver lol


Asklemmy

!asklemmy@lemmy.ml

A loosely moderated place to ask open-ended questions

If your post meets the following criteria, it’s welcome here!

  1. Open-ended question
  2. Not offensive: at this point, we do not have the bandwidth to moderate overtly political discussions. Assume best intent and be excellent to each other.
  3. Not regarding using or support for Lemmy: context, see the list of support communities and tools for finding communities below
  4. Not ad nauseam inducing: please make sure it is a question that would be new to most members
  5. An actual topic of discussion
