Unsurprisingly, some folks on raddle and reddit seem to have a big problem with lemmy. A lot of it is pure FUD.
However, this appears to be a valid security concern:
https://raddle.me/f/fediverse/166674/lemmy-is-so-much-like-email-it-even-brought-back-spy-tracker
Any thoughts on how fixable this is?
Of course the general consensus on reddit is “lemmy devs are clueless and dangerous”. I’m pretty sure a lot of it is one guy with multiple alt accounts, tho. He has a Joe McCarthy attitude about lemmy because of one of the primary devs.
Any thoughts on how fixable this is?
This shouldn’t be hard to fix. Lemmy needs to proxy images; there’s an open issue for this. Right now, I don’t use Lemmy outside of Tor Browser specifically because of issues like this, and the recent XSS vulnerability is making me even more concerned. Lemmy is a great project, but it needs work and probably a security audit.
Why are people pretending this isn’t an issue??? Of course it is lol.
Luckily the fix is also easy: an image proxy server. Mail clients do this already.
It exposes the bigger problem with Lemmy: lack of auditing.
Raddle user learns how the Internet works 🤯
In all seriousness, although this is a concern, with email in particular the solution most people choose is to just disable images, so it isn’t really a sincere comparison IMO.
We could maybe mitigate this with…
- Proxying & caching - Instance would cache a copy of the commented image and serve it from there, blocking the IP of the user from being exposed. This could introduce some additional latency and fill up server storage faster
- CSP Header & Local caching - Client could block the name of the instance from being transferred, and also cache a copy of the image locally. This doesn’t protect the user’s IP address in any way, but would hinder the ability to count how many times a particular IP has viewed/accessed a post
- Shared Lemmy image proxies - Image requests are proxied through a randomly selected Lemmy image proxy. This would ‘hide’ the origin IP to all but the volunteer proxy provider. I’d personally be willing to host a few of these if this ever became a thing
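For the proxying option, here is a rough sketch of what the rewrite step could look like, assuming the instance rewrites image links in a comment before serving it. The host `lemmy.example` and the `/image_proxy` endpoint are made up for illustration and are not Lemmy's actual API. The server-side fetching and caching is left out.

```python
import re
from urllib.parse import quote, urlsplit

# Hypothetical values: neither the host nor the endpoint path comes from Lemmy.
INSTANCE_HOST = "lemmy.example"
PROXY_ENDPOINT = "https://lemmy.example/image_proxy"

# Matches Markdown images of the form ![alt](url), without titles.
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\((\S+?)\)")


def proxied_url(url: str) -> str:
    """Return a URL that fetches the image through the instance."""
    host = urlsplit(url).hostname
    if host is None or host == INSTANCE_HOST:
        # Relative or already-local URLs only ever reach the instance itself.
        return url
    # Percent-encode the whole remote URL so it survives as one query value.
    return f"{PROXY_ENDPOINT}?url={quote(url, safe='')}"


def rewrite_images(markdown: str) -> str:
    """Point every embedded image in a comment at the proxy."""
    return IMAGE_PATTERN.sub(
        lambda m: f"![{m.group(1)}]({proxied_url(m.group(2))})", markdown
    )
```

With this in place the reader's browser only ever talks to the instance, so the remote host sees the instance's IP rather than the reader's. The storage and latency trade-offs mentioned above still apply to whatever sits behind the endpoint.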
Maybe Privacy Badger can get on this. I believe they block trackers like facebook by replacing widgets and other stuff that are embedded on pages. Not sure how they can do that for individual unknown trackers though.
Can someone with more knowledge of the lemmy protocol/api shed some light on this? The way the linked post is written, it seems like some random angry guy just hates lemmy for whatever reason.
To me it seems like a complete bs argument. As far as I can tell this tactic is possible with every service where users can provide content. Of course I can link to a site that reads users’ data. There’s basically no preventing this unless the (lemmy) clients provide their own modified browser that masks the user’s IP and other metadata.
This is a valid privacy issue, and other fediverse projects like Mastodon already solve this. The problem is that by embedding an image, you can tell the client to make a network request to your server, revealing information such as your IP address and browser. The solution is to proxy media through your instance, which is presumably trusted. This hides your IP address and browser information. And as someone else mentioned here, a Content-Security-Policy can be used to ensure this attack isn’t possible in a browser.
You actually can prevent this easily with CSP (content security policy). That header tells your browser which addresses it is allowed to load additional data from when visiting your site. It is an important tool to prevent cross-site scripting attacks: your browser should not load data from random sources when it is on your site.
Of course you would have to funnel all inline images through a site-local proxy that the browser is allowed to load data from.
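To make the CSP part concrete, here is a minimal sketch of a header value that only allows images and media from the site itself (plus any explicitly trusted hosts). The function name and the `proxy.example` host are hypothetical, and a real deployment would need further directives (scripts, styles, and so on) tuned to the frontend.

```python
def build_image_csp(extra_sources: tuple[str, ...] = ()) -> str:
    """Build a Content-Security-Policy value that confines image and media loads."""
    # 'self' limits loads to the page's own origin, i.e. the instance
    # (and therefore its site-local image proxy).
    sources = ["'self'", *extra_sources]
    directives = {
        "img-src": sources,
        "media-src": sources,
    }
    return "; ".join(
        f"{name} {' '.join(values)}" for name, values in directives.items()
    )


# The value would be sent as the `Content-Security-Policy` response header.
header_value = build_image_csp(("https://proxy.example",))
```

A browser that honors this policy refuses to fetch an `<img>` pointing at any other host, so an embedded tracker image never generates a request unless it goes through the allowed proxy.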
This has implications not only for security but also under the GDPR. Some jurisdictions consider IP addresses personal data. Sending them to e.g. the US without user consent would be a violation. I know it is stupid to consider IP addresses as personal data and it is stupid to consider a browser loading data as sending that personal data somewhere on the site’s behalf. But there is a reason why a lot of websites for example only embed tweets after you explicitly allow it.
I think when you link images off-site on Reddit, Reddit still caches a preview for it and serves that to the user; the user actually has to click a link to go off the platform into the unknown. If we do embeds and such here, they’re loaded from off-site directly without user interaction.
Ergo your browser makes a request to a random potentially dangerous server, and there isn’t much the average user can do to prevent that.