I’m looking into setting up some monitoring combined with simple automation for my self-hosted setup. I’m currently thinking about using Zabbix.
I want to:
Track bandwidth usage on a router/firewall and on a managed switch, and track CPU/RAM/disk usage on my VMs.
Simple monitoring (up/down/maintenance) on the router, the switch and my VMs, as well as on Linux services (Jellyfin/Forgejo/etc.) and Windows services (a lab for studying work-related tools).
I’m also interested in doing simple HTTPS checks on my web UIs (I’ve had a service running while the website returned 403 and 404 errors before) and testing nslookup against my internal DNS (if the service is up but the lookups time out, I still want to try restarting the service).
Are there any FOSS/FLOSS alternatives that I should look into before diving into Zabbix?
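Whichever tool ends up doing it, the HTTPS and DNS checks described above boil down to fairly little logic. Here is a rough sketch in plain Python, not tied to Zabbix or any other product. The function names are made up for illustration, and it assumes `nslookup` is on the PATH and exits non-zero when a lookup fails, which is worth verifying on your platform.

```python
import subprocess
import urllib.error
import urllib.request
from typing import Optional


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 403, 404, 500 and friends still carry a status code
    except OSError:
        return None  # connection refused, TLS failure, timeout, ...


def web_ui_healthy(status: Optional[int]) -> bool:
    """Treat only 2xx/3xx as healthy, so a running service that serves 403/404 is flagged."""
    return status is not None and 200 <= status < 400


def dns_lookup_ok(name: str, server: str, timeout: float = 3.0) -> bool:
    """Run nslookup against a specific DNS server; a timeout counts as a failure."""
    try:
        result = subprocess.run(
            ["nslookup", name, server], capture_output=True, timeout=timeout
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0


def needs_restart(service_active: bool, lookup_ok: bool) -> bool:
    """Restart when the service is down OR when it is up but lookups fail."""
    return (not service_active) or (not lookup_ok)
```

A scheduler (cron, a systemd timer, or the monitoring tool's own action hooks) could call `dns_lookup_ok(...)` and, when `needs_restart(...)` is true, run something like `systemctl restart <your-dns-unit>`. The unit name is left open here on purpose.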
Prometheus/VictoriaMetrics/Grafana are pretty good. I’ve had no issues with them and there’s an exporter for damn near anything. Exporters are pretty easy to write yourself too.
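To give an idea of how small a custom exporter can be: it is essentially an HTTP endpoint that returns metrics in the Prometheus text format. The sketch below uses only the Python standard library. The metric name `demo_disk_free_bytes`, the port 9101 and the helper names are arbitrary choices for illustration; in practice most people would probably reach for the official `prometheus_client` package instead.

```python
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_gauge(name, help_text, samples):
    """Render one gauge in the Prometheus text format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        if label_str:
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Collect the value at scrape time so every scrape sees fresh data.
        usage = shutil.disk_usage("/")
        body = render_gauge(
            "demo_disk_free_bytes",
            "Free bytes on a filesystem.",
            [({"mountpoint": "/"}, usage.free)],
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9101):
    """Block forever, serving /metrics on the given (arbitrary) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and pointing a scrape job at `http://<host>:9101/metrics` should be enough to see the gauge show up.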
But these 3 are all about metrics, right? While they’re great to monitor and analyse numbers (ping times, disk space, memory, etc.), they aren’t that great with e.g. plaintext error messages in log files. That’s how I remember it from a few years ago, at least.
Cheers! I’ve heard of Prometheus/Grafana but VictoriaMetrics was a new one. Gonna look into it!
Yeah VictoriaMetrics is the new favorite since Influx keeps reinventing their wheels and trying to move everyone to the cloud.
I’ve been using Zabbix for ages now. It has issues but I got used to it.
Uptime Kuma is great for simple up/down and web checks. LibreNMS is worth looking at too for other metrics.
I’ve used Zabbix for that before. I hope you like SNMP, though!
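For context on the SNMP part: bandwidth tracking over SNMP usually means polling an interface octet counter (for example `ifInOctets`, 32-bit, or `ifHCInOctets`, 64-bit, from IF-MIB) and turning the difference between two polls into a rate. A minimal sketch of that arithmetic, with a made-up function name and assuming at most one counter wrap between polls:

```python
def counter_rate_bps(prev, curr, interval_s, counter_bits=64):
    """Convert two octet-counter readings into bits per second.

    prev, curr   -- counter values from two consecutive polls
    interval_s   -- seconds between the polls
    counter_bits -- 32 for ifInOctets, 64 for ifHCInOctets
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr - prev
    if delta < 0:
        # Counter wrapped past its maximum (assumes a single wrap; a device
        # reboot also resets the counter and would be misread here).
        delta += 2 ** counter_bits
    return delta * 8 / interval_s
```

Monitoring tools do this for you, but it helps explain why 32-bit counters on fast links need short polling intervals: they can wrap more than once between polls, and the rate is then silently wrong.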
I’m using Checkmk for pretty much all of that. Personally I found Zabbix to have too much overhead.
For me it’s the other way around. In Check_MK I was constantly writing new custom checks, it was all manual code, and overall it felt like Nagios on steroids (which is what it was back then), just not in a good way.
In Zabbix you can do everything in the UI without messing around in the file system. And things like translating SNMP results to readable text work throughout the system without having to include a Python file and then call it from within your various other checks. All the alerting logic can be clicked together and easily amended in the UI. It’s so much more comfortable once you’ve figured it out.
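To illustrate what "translating SNMP results to readable text" means in code form: SNMP often returns small integers whose meaning is defined in a MIB, and something has to map them to labels. The sketch below does that by hand for `ifOperStatus` from IF-MIB. The function name is hypothetical, and this is not how Zabbix implements it internally, just the kind of lookup its UI-side value mapping spares you from writing.

```python
# ifOperStatus values as defined in IF-MIB (RFC 2863).
IF_OPER_STATUS = {
    1: "up",
    2: "down",
    3: "testing",
    4: "unknown",
    5: "dormant",
    6: "notPresent",
    7: "lowerLayerDown",
}


def describe_oper_status(raw):
    """Map a raw ifOperStatus integer to its label, keeping unknown codes visible."""
    return IF_OPER_STATUS.get(raw, f"unrecognized ({raw})")
```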