Recently I ran into an issue that cost me a couple of hours of troubleshooting, googling and chatting with ChatGPT. In the end I found the right GitHub issue and figured it out.
If you follow me on social media, you know that I often complain about cert-manager, the Kubernetes operator that helps you manage and distribute TLS certificates in your cluster.
It’s not its fault, it’s me, I am sure. But I find its workflows very hard to debug because there is a lot going on: it can be the network, DNS, the ingress, or any of the many YAML files that can easily break.
Very often those issues go away and it is hard to figure out why. But not this time! This time I know, and I want to remember.
The setup is the following:
- A Kubernetes cluster, obviously
- Nginx Ingress: the Kubernetes community one (ingress-nginx), not the Nginx Inc. one
- cert-manager
The Nginx Ingress fronts an HTTP/2 gRPC server, but I don’t think that matters here.
The certificate was ready: running kubectl get certificates showed it as Ready. The ingress was serving the right service, and the service was routing to the right pods. DNS pointed to the right ingress IP. Yet the certificate clients received was the ingress controller’s default self-signed one, not the one I got from Let’s Encrypt.
Usually the failures look quite different: either I can’t get the certificate at all because Let’s Encrypt can’t reach the ingress, or the ingress redirects traffic to port 443 before it even has a certificate. This time the certificate was valid!
I tried the usual things: rebuilding the ingress, watching the cert-manager controller logs, and inspecting the ingress controller (note: this part is pretty complicated because certificate selection isn’t plain Nginx configuration, it’s a Lua script, lovely). Anyway, everything looked all right.
TL;DR: check that your Ingress rule has a host set. If it doesn’t, add one.
“ingress-nginx intermittently serves the default certificate instead of a configured tls certificate for rules without a host” is the issue that helped me out.
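For reference, here is a minimal sketch of what the fixed Ingress can look like: the rule carries a host, and that host matches the one listed under tls. Everything specific in it is a hypothetical placeholder, not my actual config: the grpc.example.com host, the letsencrypt-prod ClusterIssuer, the grpc-server Service on port 50051 and the secret name. The cert-manager.io/cluster-issuer and nginx.ingress.kubernetes.io/backend-protocol annotations are real, but adjust them to your own setup.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grpc-server                 # hypothetical name
  annotations:
    # Assumes a ClusterIssuer named "letsencrypt-prod" exists (hypothetical name).
    cert-manager.io/cluster-issuer: letsencrypt-prod
    # Tells ingress-nginx that the backend speaks gRPC.
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - grpc.example.com          # hypothetical host
      secretName: grpc-example-com-tls
  rules:
    # The fix: set `host` on the rule, matching the host under `tls`.
    # Rules without a host are the case where the default certificate
    # can be served instead of the configured one.
    - host: grpc.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grpc-server   # hypothetical Service
                port:
                  number: 50051     # hypothetical port
```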
In the end it was a skill issue. James Strong (an ingress-nginx maintainer) was kind to me and said it looked like a documentation issue to him, so I’m gonna look at how to add this information to the official documentation soon.