ops

When someone asks me “is the API down?”, I know that I am doing something wrong.

Not because services should always be up (I do not run hospitals or anything similar), but because questions like that drastically lower confidence, both for the person asking and for me.

We all know “the right answer”: o11y, monitoring, logs, Prometheus, metrics, alerts.

I am here to tell you that those things are crucial, but they can be expensive in many dimensions: money, operational effort, complexity.

We should all get there eventually, but I want to be confident in what I run from day one, and for me Uptime Kuma is the way to go.

Do you remember Pingdom from the good old days? You configure a health check, which can be as simple as an HTTP endpoint. It checks the returned status code and the TLS certificate, and if something does not look right you get an alert.

Personally, I set it up with NixOS in my homelab for a few services I run on my own, and also for a few things at work, because you never know.

  services.uptime-kuma = {
    enable = true;
    settings = {
      UPTIME_KUMA_PORT = "4100";
      UPTIME_KUMA_HOST = "127.0.0.1";
    };
  };
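
The settings above bind Uptime Kuma to the loopback address only, so something has to sit in front of it to make it reachable. A minimal sketch of the nginx side on NixOS could look like the following. The hostname `status.example.com` is a placeholder I made up for illustration, and TLS is left out on purpose; this is not necessarily the exact configuration used here.

```nix
  services.nginx = {
    enable = true;
    # Assumption: placeholder hostname, replace it with your own.
    virtualHosts."status.example.com" = {
      locations."/" = {
        # Forward to the host and port set in services.uptime-kuma.settings.
        proxyPass = "http://127.0.0.1:4100";
        # The Uptime Kuma web UI talks to its backend over WebSockets,
        # so the proxy has to pass the upgrade headers through.
        proxyWebsockets = true;
      };
    };
  };
```

If the instance is exposed beyond a trusted network, the same virtual host would probably also want `forceSSL` together with a certificate, for example via `enableACME`.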

At home it sits behind nginx as a reverse proxy. For my company I deployed it via the Helm chart on Kubernetes, which was pretty straightforward. I applied it via Terraform because, as you know, that is my way to go for environments where I am not forced to use something different, like Flux or similar.

resource "helm_release" "uptime-kuma" {
  provider = helm.development
  name     = "uptime-kuma"

  repository = "https://dirsigler.github.io/uptime-kuma-helm"
  chart      = "uptime-kuma"
  version    = "2.19.3"

  create_namespace = true
  namespace        = "monitoring"
}

Since I run Home Assistant at home, I decided to use it as the gateway for my notifications: I have the companion mobile app installed, so I get a notification straight to my phone when something does not look right. Since Uptime Kuma runs in my home network, I also added many other devices that I want to be sure are up and running, like my Shelly devices and Home Assistant itself.

At work we use Slack, so it sends notifications there.

Conclusion

This is again to say that “perfect is the enemy of good”. Very often, perfection in this context means a complicated set of alerts, monitoring tools, and queries that could surely be a lot better, but that require time I don’t always want to spend. This tool proved to be easy enough to simply be good to have around. I do, I did, and I will get to something more sophisticated, but in the end I want those services to respond, and I want to know as early as possible when they don’t. And even when you have reached your dream monitoring setup, you will make that curl request anyway.