
Redeploying OKD 3.11 certificates

Since the beginning of the 3.x line of OpenShift/OKD releases there have been various issues with internal certificates. TLS is used for communication inside the cluster in several places: the router, the registry, compute nodes, master nodes, etcd and so on. Unfortunately, having hundreds of developers across the globe results not exactly in chaos, but in uncertainty and a lack of confidence from the user's perspective.

CSRs should be approved automatically, but sometimes they are not. In that case approve them manually:

oc get csr -o name | xargs oc adm certificate approve
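
If you want to see what is actually sitting there before blindly approving everything, a quick listing is enough (plain oc get csr piped through grep, nothing fancier):

oc get csr                    # list all CSRs with their current condition
oc get csr | grep -i pending  # show only the ones still waiting for approval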

But in the worst case scenario you also need to check the validity of the certificates. You can do this with the Ansible playbooks that can be obtained at https://github.com/openshift/openshift-ansible. Remember that you should always check out the version you have deployed: use the tag or branch specific to your release. Avoid running the playbooks from master, as it contains the latest code, which may be incompatible with your cluster.
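
For a 3.11 cluster something along these lines should give you a matching checkout (release-3.11 is the branch name I assume here; double-check the branches and tags in the repository):

git clone https://github.com/openshift/openshift-ansible.git
cd openshift-ansible
git branch -r              # list remote branches to find the one matching your release
git checkout release-3.11  # assumed branch name for an OKD 3.11 cluster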

To check validity run the following:

ansible-playbook playbooks/openshift-checks/certificate_expiry/easy-mode.yaml
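
This assumes the default inventory at /etc/ansible/hosts; if you keep yours somewhere else, pass it explicitly (the path below is only a placeholder):

ansible-playbook -i /path/to/your/inventory playbooks/openshift-checks/certificate_expiry/easy-mode.yaml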

To redeploy certificates run this one:

ansible-playbook playbooks/redeploy-certificates.yml
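
If you only want to refresh one component, the repository also carries more targeted redeploy playbooks. The exact paths moved around between releases, so confirm them in your checkout first (the find command does exactly that):

find playbooks -name 'redeploy-certificates.yml'                        # see what your checkout actually ships
ansible-playbook playbooks/openshift-master/redeploy-certificates.yml   # masters only
ansible-playbook playbooks/openshift-etcd/redeploy-certificates.yml     # etcd only
ansible-playbook playbooks/openshift-node/redeploy-certificates.yml     # nodes only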

In case it fails on certificates that are already outdated or outdating soon (yes…), you need to set the following in /etc/ansible/hosts or whatever other file you use as the inventory:

openshift_certificate_expiry_warning_days=7

Then run the check or the redeploy once again. If your certificate expires today or tomorrow, use 0 as the value for this parameter. After the redeploy, use a value of 10000 to verify when the new certificates expire. There are a few bugs here preventing you from redeploying or even properly checking certificate validity, and no single real solution can be found. There might be one, but it requires a Red Hat subscription to access their closed forum.
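
For reference, a minimal sketch of where this variable usually lives in the inventory; [OSEv3:vars] is the standard group for cluster-wide openshift-ansible variables, and the rest of your inventory stays as it is:

# /etc/ansible/hosts (fragment)
[OSEv3:vars]
openshift_certificate_expiry_warning_days=7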

After redeploying and checking that everything is fine, or at least a little better, there is sometimes a problem with getting openshift-web-console up and running. Sometimes you get an HTTP 502 error. The web console itself works fine, but it is unable to register its route in the HAProxy router. You can check this with:

oc get service webconsole -n openshift-web-console
curl -vk https://172.x.y.z/console/ # replace x, y and z with your webconsole IP
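
If you do not feel like reading the IP off the service listing by hand, a one-liner like this should print it directly (jsonpath output is standard in oc; the field is the usual spec.clusterIP of a Service):

oc get service webconsole -n openshift-web-console -o jsonpath='{.spec.clusterIP}'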

If you get a valid response there, you need to delete and recreate the webconsole objects manually. But first, try the basic solution, as it may work for you:

oc scale --replicas=0 deployment.apps/webconsole -n openshift-web-console
# wait around a minute
oc scale --replicas=1 deployment.apps/webconsole -n openshift-web-console
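
Instead of guessing the timing, you can watch the pod go away and come back:

oc get pods -n openshift-web-console -w   # Ctrl+C once the new pod is Running again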

If you still have no webconsole:

oc delete secret webconsole-serving-cert -n openshift-web-console
oc delete svc/webconsole -n openshift-web-console
oc delete pod/webconsole-xxx -n openshift-web-console # xxx is the suffix of your webconsole pod name

OKD should automatically recreate the webconsole configuration you just deleted. But if it still fails, try running the complete playbook that recreates the webconsole from scratch:

ansible-playbook playbooks/openshift-web-console/config.yml
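
Whichever path got you there, it does not hurt to confirm that the objects are actually back before declaring victory, for example with a plain listing in the webconsole namespace:

oc get secret,svc,pods -n openshift-web-console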

At this point you should be able to get your webconsole back. I wonder if the same low quality applies to OKD 4.x, but for 3.x the number of problems and quirks is quite high, way higher than I would have expected.