OpenShift upgrade got stuck – unexpected on-disk state – target osImageURL – recovery


DISCLAIMER: This post is based on my own, very specific experience in my lab. In other words – your mileage may vary. Don’t treat it as an ultimate solution. If you have a production cluster, get in touch with Red Hat support before making any changes.

I have a three-node compact cluster (running masters only) virtualised on a single bare-metal server. This is my lab, so explosions are likely to happen, and my configuration is not supported by Red Hat in any way.

I was performing an OpenShift 4.12.19 to 4.13.1 upgrade, but the process got stuck because one of the nodes couldn’t drain: a PodDisruptionBudget combined with an anti-affinity rule didn’t let one of the pods go. Instead of finding which pod it was, I decided to take a shortcut and rebooted the node. That was wrong 🙂
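
Had I wanted to find the blocking pod instead, checking the PodDisruptionBudgets and the machine-config-controller log would most likely have pointed at it. A hedged sketch (the deployment and container names below are the usual ones in openshift-machine-config-operator, not taken from my session – verify them on your cluster):

$ oc get pdb -A
$ oc -n openshift-machine-config-operator logs deployment/machine-config-controller -c machine-config-controller | grep -i drain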

The node got rebooted, but the MachineConfigOperator was reporting the master pool as degraded:

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-9c7420d8aa28803bc87c59122fc855b1   False     True       True      3              2                   2                     1                      79d
worker   rendered-worker-2c8c19c25eed12594bf4117d11319867   True      False      False      0              0                   0                     0                      79d
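
Before drilling down to a node, the pool’s own status usually says why it is degraded; a quick check (my suggestion, not part of the original session) could look like this:

$ oc describe mcp master | grep -i -A3 degraded
$ oc get mcp master -o jsonpath='{.status.conditions[?(@.type=="NodeDegraded")].message}{"\n"}'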

The list of nodes showed that one of them hadn’t been updated and was still running the old version of Kubernetes, as below:

$ oc get nodes
NAME       STATUS     ROLES                         AGE     VERSION
master-1   Ready      control-plane,master,worker   6m27s   v1.25.8+37a9a08
master-2   Ready      control-plane,master,worker   8d      v1.26.3+b404935
master-3   Ready      control-plane,master,worker   22h     v1.26.3+b404935

To troubleshoot the issue I switched to the openshift-machine-config-operator project and found the machine-config-daemon pod running on the affected node:

$ oc get pods -o wide
NAME                                        READY   STATUS    RESTARTS   AGE    IP                NODE       NOMINATED NODE   READINESS GATES
machine-config-daemon-2nf25                 2/2     Running   0          8d     192.168.232.124   master-2   <none>           <none>
machine-config-daemon-6lc6x                 2/2     Running   0          8m     192.168.232.123   master-1   <none>           <none>
machine-config-daemon-stsnj                 2/2     Running   0          22h    192.168.232.122   master-3   <none>           <none>

Checking its log showed precisely what went wrong. Rebooting the node caused a desync between what the MCO expects on the node (the already updated image, quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d2aa8899d6ec5cd40bbe7b843027148b768f0a5b8ab091aa46958c4893814306) and what it really finds there (the node was not actually updated and still runs the old image, quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:df4c3b1ad3c665bc4d7a73d78014645a63ee4518cbd515efa8bee68a83444738).
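
Pulling that log should be as simple as the following (pod name from the listing above; the second container in the pod is kube-rbac-proxy, hence the explicit container name, and the grep is just a convenience I’m adding here):

$ oc logs machine-config-daemon-6lc6x -c machine-config-daemon | grep -i degraded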

E0712 14:36:56.237472    3378 writer.go:200] Marking Degraded due to: unexpected on-disk state validating against rendered-master-280af3b80aac4ca3a83b3107bdefe409: expected target osImageURL "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d2aa8899d6ec5cd40bbe7b843027148b768f0a5b8ab091aa46958c4893814306", have "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:df4c3b1ad3c665bc4d7a73d78014645a63ee4518cbd515efa8bee68a83444738" ("85a1a0c0a7be436c69f743cd2d9538f5fde69ce63eb810ffe3bd9abe122aa5ff")

Now I somehow needed to encourage the MCO to perform the upgrade once again. I found a few examples of how to do it, but none of them worked for me – I always ended up with a degraded node because of this unexpected on-disk state.

Here is what I found to work:

Find the rendered master MachineConfig that refers to the osImageURL currently in use on the affected node, for instance:

$ oc project openshift-machine-config-operator
Using project "openshift-machine-config-operator" on server "https://api.ocp4.example.com:6443".
$ oc get mc | awk '$0 ~ /rendered-master/ {print $1}' | while read MC; do oc get mc ${MC} -o yaml > ${MC}.yaml; done
$ ls rendered-master-*
rendered-master-280af3b80aac4ca3a83b3107bdefe409.yaml	rendered-master-9c7420d8aa28803bc87c59122fc855b1.yaml
rendered-master-34cb6b8b7309d8a36043c198f3349034.yaml	rendered-master-d02ab2bac47f31a7d32b64ab43af8c8b.yaml
rendered-master-38a19ea84a27cc9a437da101a8e61fd2.yaml	rendered-master-d0a726600ac86d0e933e5d41ec1d1ace.yaml
rendered-master-4f43c4fd6281684dbf2920305f5df0a4.yaml
$ grep df4c3b1ad3c665bc4d7a73d78014645a63ee4518cbd515efa8bee68a83444738 rendered-master-*
rendered-master-38a19ea84a27cc9a437da101a8e61fd2.yaml:  osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:df4c3b1ad3c665bc4d7a73d78014645a63ee4518cbd515efa8bee68a83444738

So the matching (and in this case the only matching) rendered-master MachineConfig is rendered-master-38a19ea84a27cc9a437da101a8e61fd2.
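
If you have jq on your workstation, the same lookup can be done in one go instead of dumping the YAML files (an alternative sketch, assuming jq is available):

$ oc get mc -o json | jq -r '.items[] | select((.spec.osImageURL // "") | contains("df4c3b1ad3c665bc4d7a73d78014645a63ee4518cbd515efa8bee68a83444738")) | .metadata.name'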

Go to the affected node and delete the /etc/machine-config-daemon/currentconfig file:

$ oc debug node/master-1
Starting pod/master-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.1.10
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# rm /etc/machine-config-daemon/currentconfig

Edit the node’s annotations and set the following metadata.annotations (there is also an oc annotate sketch after the list below):

    machineconfiguration.openshift.io/currentConfig: rendered-master-38a19ea84a27cc9a437da101a8e61fd2
    machineconfiguration.openshift.io/desiredConfig: rendered-master-9c7420d8aa28803bc87c59122fc855b1
    machineconfiguration.openshift.io/reason: ""
    machineconfiguration.openshift.io/ssh: accessed
    machineconfiguration.openshift.io/state: Done
  • machineconfiguration.openshift.io/currentConfig – has to be set to the MachineConfig found in the previous step (the one whose osImageURL matches the image currently running on the affected node).
  • machineconfiguration.openshift.io/desiredConfig – most likely doesn’t have to be changed, as it already points to the MachineConfig containing the new image version to be installed on the node
  • machineconfiguration.openshift.io/reason – make it an empty string
  • machineconfiguration.openshift.io/ssh – set it to accessed if it isn’t already
  • machineconfiguration.openshift.io/state – set it to Done
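
If you would rather not use an interactive editor, the same annotations can most likely be set with oc annotate. This is a sketch rather than the exact commands from my session; the rendered-master names are the ones from my cluster, so substitute your own:

$ oc annotate node master-1 --overwrite \
    machineconfiguration.openshift.io/currentConfig=rendered-master-38a19ea84a27cc9a437da101a8e61fd2 \
    machineconfiguration.openshift.io/desiredConfig=rendered-master-9c7420d8aa28803bc87c59122fc855b1 \
    machineconfiguration.openshift.io/reason="" \
    machineconfiguration.openshift.io/ssh=accessed \
    machineconfiguration.openshift.io/state=Done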

Get back to the node and touch the /run/machine-config-daemon-force file so the MachineConfigDaemon re-attempts the node upgrade:

sh-5.1# touch /run/machine-config-daemon-force

At this stage the MachineConfigDaemon should restart the node upgrade, deploy the new image and reboot the node. You can observe this in the logs of the relevant machine-config-daemon pod or directly on the node.
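
From outside the node you can follow the daemon pod’s log (pod name taken from the listing earlier in this post; the exact invocation is my suggestion rather than a copy of my session):

$ oc logs -f machine-config-daemon-6lc6x -c machine-config-daemon

Directly on the node it looks like this: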

sh-5.1# journalctl -fl
Jul 13 08:16:43 master-1 root[28206]: machine-config-daemon[6691]: Skipping on-disk validation; /run/machine-config-daemon-force present
Jul 13 08:16:43 master-1 root[28207]: machine-config-daemon[6691]: Starting update from rendered-master-38a19ea84a27cc9a437da101a8e61fd2 to rendered-master-9c7420d8aa28803bc87c59122fc855b1: &{osUpdate:true kargs:true fips:false passwd:false files:true units:true kernelType:false extensions:false}
Jul 13 08:16:43 master-1 root[28208]: machine-config-daemon[6691]: drain is already completed on this node
(...)
Jul 13 08:17:23 master-1 root[29671]: machine-config-daemon[6691]: Rebooting node
Jul 13 08:17:23 master-1 root[29672]: machine-config-daemon[6691]: initiating reboot: Node will reboot into config rendered-master-9c7420d8aa28803bc87c59122fc855b1
Jul 13 08:17:23 master-1 systemd[1]: Started machine-config-daemon: Node will reboot into config rendered-master-9c7420d8aa28803bc87c59122fc855b1.
Jul 13 08:17:23 master-1 root[29675]: machine-config-daemon[6691]: reboot successful
Jul 13 08:17:23 master-1 systemd-logind[1197]: System is rebooting.
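
While the node reboots you can also watch the pool and the node list recover from another terminal, for example:

$ oc get mcp master -w
$ oc get nodes -w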

If you’re lucky, you should shortly see the updated node back in the cluster:

NAME       STATUS   ROLES                         AGE     VERSION
master-1   Ready    control-plane,master,worker   10m     v1.26.3+b404935
master-2   Ready    control-plane,master,worker   8d      v1.26.3+b404935
master-3   Ready    control-plane,master,worker   22h     v1.26.3+b404935
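
To make sure the MCO is happy again, it is worth confirming that the node’s currentConfig annotation has caught up with desiredConfig and that the pool is no longer degraded; something along these lines (a suggested check, using the annotation names shown earlier):

$ oc get node master-1 -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig}{"\n"}{.metadata.annotations.machineconfiguration\.openshift\.io/desiredConfig}{"\n"}'
$ oc get mcp master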

If you’re unlucky and the node still reports the on-disk inconsistency, you may be the victim of a race condition between you and the machine-config-daemon. This isn’t fully confirmed nor proven, but I am aware of a case where the machine-config-daemon reverted the changes to the node’s annotations after they were edited and before the node was rebooted. For that reason I recommend running two sessions: one with the editor, the other with a shell on the affected node, so that once the node’s annotations are updated and saved, the reboot is triggered quickly enough not to give the machine-config-daemon a chance to revert them. I will document this further once I face a similar case again.
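
If you want to see whether the daemon is racing you, a simple loop watching the currentConfig annotation while you edit it should make any reversion visible (purely illustrative, not something I ran at the time):

$ while true; do oc get node master-1 -o jsonpath='{.metadata.annotations.machineconfiguration\.openshift\.io/currentConfig}{"\n"}'; sleep 1; done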

