Collectd/InfluxDB/Grafana for pretty graphs and for alerting based on metrics (e.g. a value exceeding a threshold).
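For anyone wondering what "value exceeded threshold" alerting boils down to, here is a rough sketch in Python. It is not how Grafana implements its alert rules; the function name, the `min_consecutive` parameter, and the sample numbers are all made up for illustration. The idea is just to fire only when a metric stays above the limit for a few samples in a row, so a single spike doesn't page you.

```python
from typing import Iterable


def exceeded_threshold(samples: Iterable[float],
                       threshold: float,
                       min_consecutive: int = 3) -> bool:
    """Return True if at least `min_consecutive` samples in a row are above `threshold`."""
    streak = 0
    for value in samples:
        if value > threshold:
            streak += 1
            if streak >= min_consecutive:
                return True
        else:
            # Any sample at or below the threshold resets the streak.
            streak = 0
    return False


# Hypothetical load readings; three in a row are above 0.8.
cpu_load = [0.4, 0.9, 0.95, 0.97, 0.5]
print(exceeded_threshold(cpu_load, threshold=0.8))
```

Requiring a short streak instead of a single sample is a common way to cut down on noisy alerts, at the cost of reacting a little later.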
Graylog for that premium log aggregation. Nginx can pipe data directly into it through syslog. Why stop there? Redirect the system logs from every machine you have to it as well. You can then make pretty graphs, and possibly even alerts, based on pretty much anything you want.
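If you want to push your own events down the same syslog path, a minimal sketch might look like the following. It builds a BSD-style (RFC 3164) syslog line and sends it over UDP. The helper names, the host `graylog.example.internal`, and port `5140` are placeholders I made up, and it assumes you have a syslog UDP input listening on the Graylog side.

```python
import socket
from datetime import datetime


def build_syslog_line(facility: int, severity: int, when: datetime,
                      host: str, tag: str, message: str) -> bytes:
    """Format a BSD-style syslog line: <PRI>Mmm dd hh:mm:ss host tag: message."""
    pri = facility * 8 + severity  # PRI value = facility * 8 + severity
    # The day of month is space-padded to two characters in this format.
    stamp = f"{when.strftime('%b')} {when.day:2d} {when.strftime('%H:%M:%S')}"
    return f"<{pri}>{stamp} {host} {tag}: {message}".encode("utf-8")


def send_syslog(line: bytes, server: str, port: int) -> None:
    """Send one syslog line as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line, (server, port))


# facility 16 = local0, severity 6 = informational
line = build_syslog_line(16, 6, datetime(2024, 1, 5, 12, 0, 0),
                         "web1", "nginx", "GET / 200")
print(line.decode("utf-8"))

# To actually ship it (placeholder host and port, adjust to your input):
# send_syslog(line, "graylog.example.internal", 5140)
```

UDP is fire-and-forget, so lines can get dropped if the server is down; that is usually fine for access logs but worth knowing before you build alerts on top of it.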
I have some other misc stuff like SmokePing as well.
I'm using my free HetrixTools allotment to monitor the sites and servers I really care about. I also have some legacy monitors on Uptime Robot that I need to delete.
All my servers get added to a self-hosted ServerStatus monitor page. Most stuff I don’t actively monitor and set up alerts for, just the high priority stuff on HetrixTools. My page is here (don’t judge): https://masonr.cf
I had LibreNMS set up at one point, probably about a year ago, and stopped using it for some reason. Maybe I just didn’t feel like maintaining it anymore, idk. I'm probably going to look into setting it up again and attaching all my servers.
I restart my stuff fairly often, so 90% of reboots are from me. I've never been trying to max out my uptime. But anyways, VirMach is one of the more stable ones on there, and DatabaseByDesign as well; they’ve always been solid.