The words "stupid simple" and "C++" together make me scratch my head though. C++ itself is not simple, and you have to recompile if you need to change something (and sometimes you inevitably do), which is slow. I'd likely go with a relatively simple C program that embeds the FFI for RRDtool and other stuff, and embeds Lua, or, better yet, Janet. Then most of the thing could be written in these languages, and would be easy to tweak when need be. That would still allow for a single executable + a single config file, on top of the logic already embedded. (But the author went and built the thing, and I did not, so their solution so far outperforms mine.)
Github: https://github.com/dobin/dmsr
Live: https://mon.yookiterm.ch
I ended up hacking together a shell script that sends data to Home Assistant (via MQTT) and runs on pretty much any system that has at least netcat: https://github.com/roger-/hass-sysmon
Some pushback:
- SNMP sucks. It's very limited, difficult to secure, etc. I've spent a lot of time with it, and it's more complex than Prometheus' simple HTTP metrics model (a sketch of that format follows after this list). I use it where I have to (non-server devices), but I prefer dealing with Prometheus.
- Grafana is not necessarily complex. It's powerful, and you can waste a lot of time overinvesting in dashboard design, but that's not required. It can be used quite elegantly.
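For reference, this is roughly what the Prometheus text exposition format looks like: plain text served over HTTP, shown here with metric names in the style of node_exporter (exact names and labels vary by exporter):

    # HELP node_load1 1m load average.
    # TYPE node_load1 gauge
    node_load1 0.42
    # HELP node_filesystem_avail_bytes Filesystem space available to non-root users in bytes.
    # TYPE node_filesystem_avail_bytes gauge
    node_filesystem_avail_bytes{device="/dev/sda1",mountpoint="/"} 1.234e+10

Anything that can serve text like that over HTTP is a valid scrape target; no ASN.1, no MIBs, no community strings.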
μMon does seem like "old school for the sake of old school". SNMP and RRDTool were designed when memory & bandwidth were much more limited. I will happily trade the overheads of HTTP and static Go binaries for the much superior UX they offer.
Yet I run some hobby projects that collect data, and this setup is absolutely perfect for that. I even challenged myself to use SSL for the InfluxDB server (running a small CA).
Also, I use Slack-based alerting through Grafana, for example when a disk is about to fill up or something is down.
So it’s really about what your needs are.
And often, basic system metrics like CPU usage, load, or network traffic don't tell you anything useful or actionable.
You still need an RRD viewer, but that's not a huge stack.
And it scales all the way to hundreds of hosts, since on top of sending/receiving stats over the network it supports a few other write formats besides plain RRD files.
I did something different but in a similar vein for one server network. We had Seq already deployed for log monitoring, so instead of setting up a separate network/node/app health monitoring interface, I configured everything to regularly ping Seq with a structured log message containing the needed data, which could then be extracted and graphed with Seq's limited out-of-the-box charting abilities in the dashboard. Not perfect, but simpler.
But I agree with OP that Prometheus feels more complex than it needs to be for simple use cases. Then again, so does sendmail ;)
I often think about the "reinventing the wheel" argument. Isn't open source about diversity? There are so many forks, clones, and "Yet another..."s (yacc, yaml, ...).
So many times I'm looking for suitable Go libraries that solve a certain problem. There might be a few out there, but every lib has its own pros and cons. Having the possibility to choose is great. Nothing sucks more than depending on an unmaintained C lib that nobody cares about, with no alternatives.
The only counter-example that comes to mind is crypto. You don't want to do your own crypto.
If I have to deploy this on each machine, then it makes no sense. I know SNMP can be used like this, but can μMon?
Anyway, I might still deploy this in a Proxmox homelab where I don't want to fight with the complexity of a Grafana dashboard.
Very simple logging, if not structured, isn't completely useless, but it's not very useful either, except maybe for showing some nice charts.
Any serious monitoring tool is useful when it can explain things, and only tracing gives you causal information.
> A full-blown time-series database (with gigabytes of rolling on-disk data).
Prometheus has a setting that allows you to limit the space used by the database. I'm not sure, however, how one can do monitoring without a time-series database.
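For what it's worth, the relevant knobs are startup flags on the Prometheus server; the values below are just examples:

    prometheus \
      --storage.tsdb.retention.time=15d \
      --storage.tsdb.retention.size=2GB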
> Several Go binaries dozens of megabytes each, also consuming runtime resources.
Compared to most monitoring tools I've tested, the Prometheus exporters are usually fairly lightweight in relation to the amount of metrics they generate. Also, a few dozen megabytes per binary doesn't seem like too much when we're usually talking about disk space in the gigabytes...
> Lengthy configuration files and lengthy argument lists to said binaries.
Configuration files, yes, if you want to change all the defaults. Argument lists, not really. In reality, a Docker deployment of Grafana + Prometheus is about 20 lines in a docker-compose.yml file (sketched below). Configuration files come with defaults if you install it to the system.
By the way, I'm not sure that configuring a FastCGI server will be easier than configuring a Docker compose file...
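Something along these lines; a rough sketch where the ports and volume names are illustrative, and you'd still need a minimal prometheus.yml listing the scrape targets:

    services:
      prometheus:
        image: prom/prometheus
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml
          - prom-data:/prometheus
        ports:
          - "9090:9090"
      grafana:
        image: grafana/grafana
        volumes:
          - grafana-data:/var/lib/grafana
        ports:
          - "3000:3000"
    volumes:
      prom-data:
      grafana-data: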
> Systems continuously talking to each other over the network (even when nobody is looking at any dashboard), periodically pulling metrics from nodes into Prometheus, which in turn runs all sorts of consolidation routines on that data. A constant source of noise in otherwise idling systems.
Not necessarily. Systems talk to each other over the network if you configure them to do so. You can always install Prometheus + Grafana on every node if you don't want to do central monitoring, and you'll have no network noise.
> A mind-bogglingly complex web front-end (Grafana) with its own database, tons of JavaScript running in my browser, and role-based access control over multiple users.
Grafana, complex? I think dragging and dropping panels, with query builders that don't even require you to know the query language, is far better than defining graphs in shell scripts.
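For comparison, defining a single graph by hand in RRDtool land looks roughly like this (the .rrd file and data source name here are hypothetical), and you'd script one such invocation per panel:

    rrdtool graph cpu.png --start -1d \
      --title "CPU load" --vertical-label load \
      DEF:load1=host.rrd:load1:AVERAGE \
      LINE1:load1#0000FF:"1m load average"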
> A bespoke query language to pull metrics into dashboards, and lots of specialized knowledge in how to build useful dashboards. It is all meant to be intuitive, but man, is it complicated!
Again, this is not a problem of the stack. Building useful dashboards is complicated no matter what tool you use.
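For concreteness, a typical per-host CPU usage query in PromQL looks something like this (metric names as exposed by node_exporter; just one way to write it):

    100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Whether that's harder than the equivalent RRDtool DEF/CDEF incantation is debatable.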
> maintenance: ongoing upgrades & migrations
Not really. Both Prometheus and Grafana are usually very stable and you don't need to upgrade if you don't want to. I have a monitoring stack built with it in my homelab and I haven't updated it in two years, and it still works. Of course I don't have the new shiny features, but it works.
To me, it seems that the author is conflating the complexity of the tool with the complexity of monitoring itself. Yes, monitoring is hard. Knowing which metrics to pull, which to show, and how to retain them is hard. Knowing how to present those metrics to users is also hard. But this tool doesn't solve that. In the end, I don't know how useful it is to make a custom tool that collects very limited metrics based on other ancient, limited, buggy tools (SNMP, RRD, FastCGI...) and that is missing even basic UX features like being able to zoom or pan on charts.
But looking at the installation instructions[1], I can't help but think that their reluctance to use Docker feels contrarian for no reason (and the quip about it being "out of fashion" is completely misguided). This whole procedure could be automated in a Dockerfile, and actually running μMon would be vastly simplified. Docker itself is not much more than a wrapper around Linux primitives, and if they dislike it specifically for e.g. having to run a server and run containers as root, there are plenty of other lighter-weight container alternatives.
There's an argument to be made that the "Simple" Network Management Protocol they're a fan of is far from being simple either[2]. Configuring the security features of v3 is not a simple task, and entire books have been written about SNMP as well. They conveniently ignore this by using v2c and making access public, which might not be acceptable in real-world deployments.
I'm all for choosing "simple" tools and stacks over "complex" ones, for whatever definition of those terms one chooses, and I strive to do that in my own projects whenever possible. But simplicity is not an inherent property of old and battle-tested technologies. We should be careful not to be biased toward technology we happen to be familiar with, and instead be pragmatic about picking the right tool for the job, the one that fits our requirements regardless of its age or familiarity.
[1]: https://tomscii.sig7.se/umon/#Installation%20and%20getting%2...
[2]: I have a pet peeve about tools or protocols with "simple" or "trivial" in their name. They almost always end up being the opposite of that as they mature, and the name becomes an alluring mirage tricking you into its abyss of hidden complexity. I'm looking at you, SMTP, TFTP...