DISCLAIMER: This post is based on my own, rather unique experience in my lab. In other words: your mileage may vary. Don’t treat it as an ultimate solution. If you have a production cluster, get in touch with Red Hat support before making any changes.
I have a three-node compact cluster (running masters only) virtualised on a single bare-metal server. This is my lab, so explosions are likely to happen, and my configuration is not supported by Red Hat in any way.
I was performing an OpenShift 4.12.19 to 4.13.1 upgrade, but the process got stuck because one of the nodes couldn’t drain: a PodDisruptionBudget combined with an anti-affinity rule didn’t let a pod go. Instead of finding out which pod it was, I decided to take a shortcut and rebooted the node. That was wrong 🙂
The node got rebooted, but the MachineConfigOperator reported the master pool as degraded:
$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-9c7420d8aa28803bc87c59122fc855b1 False True True 3 2 2 1 79d
worker rendered-worker-2c8c19c25eed12594bf4117d11319867 True False False 0 0 0 0 79d
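The reason for the degradation can also be read straight from the pool’s status conditions instead of digging through pod logs. A sketch, assuming the NodeDegraded condition type that MCO sets on the pool:

```shell
# Print the message of the NodeDegraded condition on the master pool.
# The condition type "NodeDegraded" is an assumption based on how MCO
# reports node-level failures in the pool status.
oc get mcp master \
  -o jsonpath='{.status.conditions[?(@.type=="NodeDegraded")].message}'
```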
The list of nodes indicated one of them hadn’t been updated and was still running the old version of Kubernetes, as below:
$ oc get nodes
NAME STATUS ROLES AGE VERSION
master-1 Ready control-plane,master,worker 6m27s v1.25.8+37a9a08
master-2 Ready control-plane,master,worker 8d v1.26.3+b404935
master-3 Ready control-plane,master,worker 22h v1.26.3+b404935
To troubleshoot the issue I switched to the openshift-machine-config-operator
project and found the pod running machine-config-daemon on the affected node:
$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
machine-config-daemon-2nf25 2/2 Running 0 8d 192.168.232.124 master-2 <none> <none>
machine-config-daemon-6lc6x 2/2 Running 0 8m 192.168.232.123 master-1 <none> <none>
machine-config-daemon-stsnj 2/2 Running 0 22h 192.168.232.122 master-3 <none> <none>
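Instead of scanning the NODE column by eye, the daemon pod for a particular node can be picked out with a field selector. A sketch, assuming the k8s-app=machine-config-daemon label that the daemonset’s pods carry:

```shell
# List only the machine-config-daemon pod scheduled on master-1.
oc -n openshift-machine-config-operator get pods \
  -l k8s-app=machine-config-daemon \
  --field-selector spec.nodeName=master-1
```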
Checking its log showed precisely what went wrong. Rebooting the node caused a desync between what MCO expects on the node (the already updated image, quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d2aa8899d6ec5cd40bbe7b843027148b768f0a5b8ab091aa46958c4893814306) and what it actually finds there (the image was not updated and the node still runs the old one, quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:df4c3b1ad3c665bc4d7a73d78014645a63ee4518cbd515efa8bee68a83444738).
E0712 14:36:56.237472 3378 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-master-280af3b80aac4ca3a83b3107bdefe409: expected target osImageURL "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d2aa8899d6ec5cd40bbe7b843027148b768f0a5b8ab091aa46958c4893814306", have "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:df4c3b1ad3c665bc4d7a73d78014645a63ee4518cbd515efa8bee68a83444738" ("85a1a0c0a7be436c69f743cd2d9538f5fde69ce63eb810ffe3bd9abe122aa5ff")
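The two digests in that message are the key piece of information: the first is the image MCO expects, the second is what is actually on disk. A small sketch that pulls them out of such a log line (the sample line below is the one from this post, trimmed):

```shell
# Extract the expected and actual image digests from the MCD
# "Marking Degraded" log line. The sample line is abbreviated.
line='Marking Degraded due to: unexpected on-disk state: expected target osImageURL "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d2aa8899d6ec5cd40bbe7b843027148b768f0a5b8ab091aa46958c4893814306", have "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:df4c3b1ad3c665bc4d7a73d78014645a63ee4518cbd515efa8bee68a83444738"'
expected=$(printf '%s\n' "$line" | grep -o 'sha256:[0-9a-f]\{64\}' | sed -n 1p)
actual=$(printf '%s\n' "$line" | grep -o 'sha256:[0-9a-f]\{64\}' | sed -n 2p)
echo "expected: $expected"
echo "actual:   $actual"
```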
Now I somehow needed to encourage MCO to perform the upgrade once again. I found a few examples of how to do it, but none of them worked for me – I always ended up with a degraded node because of this unexpected on-disk state.
Here is what I found working:
Find the rendered master MachineConfig which refers to the osImageURL currently used on the affected node, for instance:
$ oc project openshift-machine-config-operator
Using project "openshift-machine-config-operator" on server "https://api.ocp4.example.com:6443".
$ oc get mc | awk '$0 ~ /rendered-master/ {print $1}' | while read MC; do oc get mc ${MC} -o yaml > ${MC}.yaml; done
$ ls rendered-master-*
rendered-master-280af3b80aac4ca3a83b3107bdefe409.yaml rendered-master-9c7420d8aa28803bc87c59122fc855b1.yaml
rendered-master-34cb6b8b7309d8a36043c198f3349034.yaml rendered-master-d02ab2bac47f31a7d32b64ab43af8c8b.yaml
rendered-master-38a19ea84a27cc9a437da101a8e61fd2.yaml rendered-master-d0a726600ac86d0e933e5d41ec1d1ace.yaml
rendered-master-4f43c4fd6281684dbf2920305f5df0a4.yaml
$ grep df4c3b1ad3c665bc4d7a73d78014645a63ee4518cbd515efa8bee68a83444738 rendered-master-*
rendered-master-38a19ea84a27cc9a437da101a8e61fd2.yaml: osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:df4c3b1ad3c665bc4d7a73d78014645a63ee4518cbd515efa8bee68a83444738
So I know the last (and the only) rendered-master MachineConfig referring to that image is rendered-master-38a19ea84a27cc9a437da101a8e61fd2.
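The dump-and-grep approach works, but the same check can be done without writing files, by reading spec.osImageURL straight from each rendered-master MachineConfig. A sketch, assuming only standard oc jsonpath support:

```shell
# Print each rendered-master MachineConfig together with the image it pins.
for mc in $(oc get mc -o name | grep rendered-master); do
  printf '%s %s\n' "$mc" "$(oc get "$mc" -o jsonpath='{.spec.osImageURL}')"
done
```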
Go to the affected node and delete the /etc/machine-config-daemon/currentconfig
file:
$ oc debug node/master-1
Starting pod/master-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.1.10
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# rm /etc/machine-config-daemon/currentconfig
Edit the node’s annotations and set the following metadata.annotations:
machineconfiguration.openshift.io/currentConfig: rendered-master-38a19ea84a27cc9a437da101a8e61fd2
machineconfiguration.openshift.io/desiredConfig: rendered-master-9c7420d8aa28803bc87c59122fc855b1
machineconfiguration.openshift.io/reason: ""
machineconfiguration.openshift.io/ssh: accessed
machineconfiguration.openshift.io/state: Done
machineconfiguration.openshift.io/currentConfig – has to be set to the MachineConfig found in the previous step (the last one whose osImageURL matches the image currently used on the affected node).
machineconfiguration.openshift.io/desiredConfig – most likely doesn’t have to be changed, as it points to the MachineConfig which contains the new image version to be installed on the node.
machineconfiguration.openshift.io/reason – make it an empty string.
machineconfiguration.openshift.io/ssh – set it to accessed if it isn’t already.
machineconfiguration.openshift.io/state – set it to Done.
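Editing the node object by hand works, but the same five annotations can also be applied in one shot with oc patch, which narrows the window in which machine-config-daemon can interfere. A sketch using the rendered-* names from this cluster (replace them with yours):

```shell
# Apply all five annotations with a single merge patch instead of an
# interactive edit. The rendered-* names are specific to this cluster.
patch='{"metadata":{"annotations":{
  "machineconfiguration.openshift.io/currentConfig":"rendered-master-38a19ea84a27cc9a437da101a8e61fd2",
  "machineconfiguration.openshift.io/desiredConfig":"rendered-master-9c7420d8aa28803bc87c59122fc855b1",
  "machineconfiguration.openshift.io/reason":"",
  "machineconfiguration.openshift.io/ssh":"accessed",
  "machineconfiguration.openshift.io/state":"Done"}}}'
oc patch node master-1 --type merge -p "$patch"
```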
Get back to the node and touch the /run/machine-config-daemon-force
file so the MachineConfigDaemon will re-attempt the node upgrade:
sh-5.1# touch /run/machine-config-daemon-force
At this stage the MachineConfigDaemon should restart the node upgrade, deploy the new image and reboot the node. You can observe it in the logs of the relevant machine-config-daemon pod or directly on the node:
sh-5.1# journalctl -fl
Jul 13 08:16:43 master-1 root[28206]: machine-config-daemon[6691]: Skipping on-disk validation; /run/machine-config-daemon-force present
Jul 13 08:16:43 master-1 root[28207]: machine-config-daemon[6691]: Starting update from rendered-master-38a19ea84a27cc9a437da101a8e61fd2 to rendered-master-9c7420d8aa28803bc87c59122fc855b1: &{osUpdate:true kargs:true fips:false passwd:false files:true units:true kernelType:false extensions:false}
Jul 13 08:16:43 master-1 root[28208]: machine-config-daemon[6691]: drain is already completed on this node
(...)
Jul 13 08:17:23 master-1 root[29671]: machine-config-daemon[6691]: Rebooting node
Jul 13 08:17:23 master-1 root[29672]: machine-config-daemon[6691]: initiating reboot: Node will reboot into config rendered-master-9c7420d8aa28803bc87c59122fc855b1
Jul 13 08:17:23 master-1 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-9c7420d8aa28803bc87c59122fc855b1.
Jul 13 08:17:23 master-1 root[29675]: machine-config-daemon[6691]: reboot successful
Jul 13 08:17:23 master-1 systemd-logind[1197]: System is rebooting.
If you’re lucky, you should shortly see the updated node back in the cluster:
NAME STATUS ROLES AGE VERSION
master-1 Ready control-plane,master,worker 10m v1.26.3+b404935
master-2 Ready control-plane,master,worker 8d v1.26.3+b404935
master-3 Ready control-plane,master,worker 22h v1.26.3+b404935
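Instead of polling oc get nodes, the pool itself can be watched until it reports success. A sketch, assuming the Updated condition on the MachineConfigPool:

```shell
# Block until the master pool reports Updated=True,
# or give up after 30 minutes.
oc wait mcp/master --for=condition=Updated --timeout=30m
```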
If you’re unlucky and the node still reports disk inconsistency, you may be a victim of a race condition between you and machine-config-daemon. This isn’t fully confirmed or proven, but I am aware of a case where machine-config-daemon reverted the changes to the node’s annotations after they were edited and before the node was rebooted. For that reason I recommend running two sessions: one with the editor, the other with a shell on the affected node. That way, once the node’s annotations are updated and saved, the reboot can be triggered quickly enough not to give machine-config-daemon a chance to revert them. I will document it further once I face a similar case again.