Recently, I migrated all of the VMs that run my k3s cluster from libvirt to Proxmox. I challenged myself to do it without any downtime, and I was successful, until this happened…
After removing the third master node from the cluster and attempting to add a new one, the installation process on the new node hung during startup.
Troubleshooting
I left it there for about 10 minutes and nothing happened, so I started digging into the logs with journalctl, where I found a bunch of errors about failed connections to 10.0.69.101:2380.
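If k3s was installed with the standard install script, it runs as the k3s systemd unit, so its logs can be followed with (the unit name is an assumption about your setup):

```shell
# Follow the k3s service logs live
journalctl -u k3s -f
```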
The IP 10.0.69.101 was the IP of the master node that I had removed from the cluster. Weird. I have removed nodes from Kubernetes clusters before, and I've never seen the remaining nodes try to connect to a removed node.
Since I could see no trace of the old node from the control plane, and port 2380 was not a familiar one to me, I suspected that something other than k3s itself was trying to connect to the removed node.
I looked it up and quickly found out that it is used for the “etcd server client API” (see: Kubernetes Ports and Protocols).
Solution
From the above information, it was pretty clear that, somehow, the old node was still part of the etcd cluster.
Here is how I removed it from the cluster:
First, SSH into one of the remaining master nodes and install etcdctl (k3s does not ship with it by default).
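A sketch of the install, assuming a Linux amd64 node; the version below is a placeholder, which you would ideally match to the embedded etcd version of your k3s release:

```shell
# Hypothetical version; check the "Embedded Component Versions" of your k3s release
ETCD_VER=v3.5.9
# Download the etcd release tarball and extract only the etcdctl binary
curl -sL "https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz" \
  | sudo tar xz -C /usr/local/bin --strip-components=1 "etcd-${ETCD_VER}-linux-amd64/etcdctl"
```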
As for the version, I got lazy and just used the latest version listed here. Really, you should check the “Embedded Component Versions” section of the k3s release listed here: k3s releases.
Next, set the environment variables that point etcdctl at the embedded etcd instance.
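A sketch of those variables, assuming the default k3s data directory (these certificate paths are the standard k3s locations, but verify them on your node):

```shell
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
# Standard k3s etcd certificate locations under the default data dir
export ETCDCTL_CACERT=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
export ETCDCTL_CERT=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt
export ETCDCTL_KEY=/var/lib/rancher/k3s/server/tls/etcd/server-client.key
```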
Locate the lingering node's member ID with etcdctl member list; each line of the output starts with the member ID, followed by the member's status, name, and its peer and client URLs.
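The member list output is comma-separated, with the member ID in the first column, so the lingering member's ID can be pulled out by matching on its IP. A sketch against a saved copy of the output (the IDs and node names here are made up for illustration):

```shell
# Hypothetical 'etcdctl member list' output, saved for illustration
cat > members.txt <<'EOF'
5a1e3f7c9b2d4e61, started, master-1, https://10.0.69.100:2380, https://10.0.69.100:2379, false
7c2b9d4f1e8a3c55, started, master-2, https://10.0.69.102:2380, https://10.0.69.102:2379, false
9f4e2a7b5c1d8e33, started, master-3, https://10.0.69.101:2380, https://10.0.69.101:2379, false
EOF

# Print the member ID of the row whose peer URL contains the removed node's IP
awk -F', ' '/10\.0\.69\.101:2380/ {print $1}' members.txt
```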
Remove the node with etcdctl member remove, passing it the ID from the previous step.
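For example, with a made-up member ID:

```shell
# Remove the lingering member (ID is a hypothetical example)
etcdctl member remove 9f4e2a7b5c1d8e33
```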
Now go back to the new node that was hanging, and restart the installation process. It should now complete successfully.
Conclusion
Even after resolving the issue, it is still not clear to me why this happened. I had removed nodes before with the same procedure:
- Cordon
- Drain
- Delete the node
Every time, the node was automatically removed from the etcd cluster as well.
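For reference, that procedure sketched with kubectl (the node name is a placeholder):

```shell
# Stop new pods from being scheduled on the node
kubectl cordon my-old-master
# Evict the workloads running on it
kubectl drain my-old-master --ignore-daemonsets --delete-emptydir-data
# Remove the node object from the cluster
kubectl delete node my-old-master
```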
I thought this issue might occur again in the future, so I decided to write about it here.