Bottlerocket
Note
External etcd topology is supported for vSphere, CloudStack, Snow and Nutanix clusters, but not yet for Bare Metal clusters.
This guide requires some common shell tools such as:
grep
xargs
ssh
scp
cut
Make sure you have these installed on your admin machine before continuing.
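A quick way to confirm the tools are present is sketched below. The helper name check_tools is hypothetical and not part of EKS Anywhere; it relies only on the POSIX `command -v` builtin.

```shell
# Report every tool passed as an argument that is not found on PATH.
# Returns 0 when all tools are present and 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_tools grep xargs ssh scp cut && echo "all required tools found"` prints the confirmation only when none of the five tools is missing.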
Admin machine environment variables setup
On your admin machine, set the following environment variables, which are used throughout the rest of this guide
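As a rough sketch, the variables might look like the following. Every name and value here is an illustrative assumption rather than something this guide prescribes: the cluster names, key path and IP are placeholders, the kubeconfig paths assume the file layout EKS Anywhere typically generates, and ec2-user is assumed to be the SSH user of the Bottlerocket admin container.

```shell
# Hypothetical names and placeholder values; replace them with your own.
export MANAGEMENT_CLUSTER_NAME="eksa-management"   # placeholder cluster name
export CLUSTER_NAME="eksa-workload"                # placeholder cluster name
export SSH_KEY="$HOME/.ssh/eksa-key"               # placeholder private key path
export SSH_USERNAME="ec2-user"                     # assumed Bottlerocket admin user
export SNAPSHOT_PATH="/tmp/snapshot.db"            # placeholder snapshot location
export ETCD_VM_IP="192.0.2.10"                     # placeholder (documentation range)

# Assumed kubeconfig layout: <cluster>/<cluster>-eks-a-cluster.kubeconfig
export MANAGEMENT_KUBECONFIG="${MANAGEMENT_CLUSTER_NAME}/${MANAGEMENT_CLUSTER_NAME}-eks-a-cluster.kubeconfig"
export CLUSTER_KUBECONFIG="${CLUSTER_NAME}/${CLUSTER_NAME}-eks-a-cluster.kubeconfig"
```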
Prepare etcd nodes for backup and restore
Install SCP on the etcd nodes:
Create etcd Backup
Make sure to set up the admin environment variables and prepare your etcd nodes for backup before moving forward.
-
SSH into one of the etcd nodes
-
Drop into Bottlerocket’s root shell
-
Set these environment variables
-
Create the etcd snapshot
-
Move the snapshot to another directory and set proper permissions
-
Exit the etcd node. You will have to type exit twice to get back to the admin machine
-
Copy the snapshot from the etcd node to the admin machine
You should now have the etcd snapshot in your current working directory.
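The snapshot and copy steps above can be sketched with two small helpers. This is a minimal sketch under stated assumptions, not the exact commands for a Bottlerocket node: the function names and the ETCD_CACERT, ETCD_CERT and ETCD_KEY variables are hypothetical, and how etcdctl is reached on the node (for instance from inside the etcd container) is left to the ETCDCTL override. The `etcdctl snapshot save` flags and the `scp -i` form are standard.

```shell
# How etcdctl and scp are invoked can be overridden (assumption: on a
# Bottlerocket node etcdctl may have to be run inside the etcd container).
ETCDCTL="${ETCDCTL:-etcdctl}"
SCP="${SCP:-scp}"

# Save a snapshot of the etcd member at endpoint $1 into file $2.
# ETCD_CACERT, ETCD_CERT and ETCD_KEY are hypothetical variables holding
# the paths to the etcd TLS files on the node.
etcd_snapshot_save() {
  ETCDCTL_API=3 $ETCDCTL \
    --endpoints="$1" \
    --cacert="$ETCD_CACERT" \
    --cert="$ETCD_CERT" \
    --key="$ETCD_KEY" \
    snapshot save "$2"
}

# Copy remote file $2 from node $1 to local path $3 (run on the admin machine).
# SSH_KEY and SSH_USERNAME are assumed to be set in the environment.
fetch_snapshot() {
  $SCP -i "$SSH_KEY" "${SSH_USERNAME}@$1:$2" "$3"
}
```

Setting `ETCDCTL=echo` or `SCP=echo` before calling a helper prints the arguments instead of running the tool, which is a cheap way to review a command before it touches a node.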
Restore etcd from Backup
Make sure to set up the admin environment variables and prepare your etcd nodes for restore before moving forward.
-
Pause cluster reconciliation
Before starting the process of restoring etcd, you have to pause some cluster reconciliation objects so EKS Anywhere doesn’t try to perform any operations on the cluster while you restore the etcd snapshot.
-
Stop control plane core components
You also need to stop the control plane core components so the Kubernetes API server doesn’t try to communicate with etcd while you perform etcd operations.
- Get the control plane node IPs, which you will use to SSH into the nodes
- SSH into the node and stop the core components
- Exit out of the Bottlerocket node
Repeat these steps for each control plane node.
-
Copy the backed-up etcd snapshot to all the etcd nodes
-
Perform the etcd restore
For this step, you have to SSH into each etcd node and run the restore command.
- Get the etcd node IPs to SSH into the nodes
- Clean up temporary files and folders
- Exit out of the Bottlerocket node
Repeat this step for each etcd node.
-
Restart control plane core components
- Get the control plane node IPs, which you will use to SSH into the nodes
- SSH into the node and restart the core components
- Exit out of the Bottlerocket node
Repeat these steps for each control plane node.
-
Unpause the cluster reconcilers
Once the etcd restore is complete, you can resume the cluster reconcilers.
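The pause, restore and resume parts of the sequence above can be sketched as follows. Treat this as a sketch under assumptions rather than the definitive procedure: the function names are hypothetical, and it assumes that pausing means setting the anywhere.eks.amazonaws.com/paused annotation on the EKS Anywhere Cluster object and spec.paused on the Cluster API Cluster in the eksa-system namespace. The member name, peer URL, initial cluster string and data directory passed to the restore are values you must take from your own etcd nodes.

```shell
KUBECTL="${KUBECTL:-kubectl}"
ETCDCTL="${ETCDCTL:-etcdctl}"

# Pause ($1 = true) or resume ($1 = false) reconciliation of $CLUSTER_NAME.
# Assumption: both the EKS Anywhere Cluster and the Cluster API Cluster
# objects live in the management cluster behind $MANAGEMENT_KUBECONFIG.
set_reconcile_paused() {
  if [ "$1" = "true" ]; then
    annotation="anywhere.eks.amazonaws.com/paused=true"
  else
    annotation="anywhere.eks.amazonaws.com/paused-"   # trailing "-" removes it
  fi
  $KUBECTL --kubeconfig "$MANAGEMENT_KUBECONFIG" annotate \
    clusters.anywhere.eks.amazonaws.com "$CLUSTER_NAME" "$annotation"
  $KUBECTL --kubeconfig "$MANAGEMENT_KUBECONFIG" -n eksa-system patch \
    clusters.cluster.x-k8s.io "$CLUSTER_NAME" --type merge \
    -p "{\"spec\":{\"paused\":$1}}"
}

# Restore snapshot file $1 for member name $2 with peer URL $3,
# initial cluster string $4 and target data directory $5.
etcd_snapshot_restore() {
  ETCDCTL_API=3 $ETCDCTL snapshot restore "$1" \
    --name="$2" \
    --initial-advertise-peer-urls="$3" \
    --initial-cluster="$4" \
    --data-dir="$5"
}
```

A run would call `set_reconcile_paused true` first, perform the restore on every etcd node, and finish with `set_reconcile_paused false`.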
At this point, the etcd cluster should be restored to the snapshot. To verify, you can run the following commands:
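One possible check, assuming CLUSTER_KUBECONFIG points at the kubeconfig of the restored cluster, is to list its nodes and pods. The verify_cluster name is hypothetical, and these two queries are only a reasonable starting point.

```shell
KUBECTL="${KUBECTL:-kubectl}"

# List the nodes and all pods of the restored cluster so that unhealthy
# nodes or workloads stand out. CLUSTER_KUBECONFIG is assumed to be set.
verify_cluster() {
  $KUBECTL --kubeconfig "$CLUSTER_KUBECONFIG" get nodes
  $KUBECTL --kubeconfig "$CLUSTER_KUBECONFIG" get pods --all-namespaces
}
```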
You may also need to restart some deployments/daemonsets manually if they are stuck in an unhealthy state.