Scale Bare Metal cluster
Scaling nodes on Bare Metal clusters
When you are horizontally scaling your Bare Metal EKS Anywhere cluster, consider the number of nodes you need for your control plane and for your data plane.
See the Kubernetes Components documentation to learn the differences between the control plane and the data plane (worker nodes).
Horizontally scaling the cluster is done by increasing the count for the control plane or worker node groups under the Cluster specification.
NOTE: If etcd is running on your control plane (the default configuration) you should scale your control plane in odd numbers (3, 5, 7…).
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: test-cluster
spec:
  controlPlaneConfiguration:
    count: 1 # increase this number to horizontally scale your control plane
  ...
  workerNodeGroupConfigurations:
  - count: 1 # increase this number to horizontally scale your data plane
Next, you must ensure you have enough available hardware for the scale-up operation to function. Available hardware could have been fed to the cluster as extra hardware during a prior create command, or could be fed to the cluster during the scale-up process by providing the hardware CSV file to the upgrade cluster command (explained in detail below). For a scale-down operation, you can skip directly to the upgrade cluster command.
To check if you have enough available hardware for scale up, you can use the kubectl command below to check if there are hardware objects with the selector labels corresponding to the control plane or worker node group and without the ownerName label.
kubectl get hardware -n eksa-system --show-labels
For example, if you want to scale a worker node group with selector label type=worker-group-1, then you must have an additional hardware object in your cluster with the label type=worker-group-1 that doesn’t have the ownerName label.
In the output of the command shown below, eksa-worker2 matches the selector label and doesn’t have the ownerName label. Thus, it can be used to scale up worker-group-1 by 1.
kubectl get hardware -n eksa-system --show-labels
NAME STATE LABELS
eksa-controlplane type=controlplane,v1alpha1.tinkerbell.org/ownerName=abhnvp-control-plane-template-1656427179688-9rm5f,v1alpha1.tinkerbell.org/ownerNamespace=eksa-system
eksa-worker1 type=worker-group-1,v1alpha1.tinkerbell.org/ownerName=abhnvp-md-0-1656427179689-9fqnx,v1alpha1.tinkerbell.org/ownerNamespace=eksa-system
eksa-worker2 type=worker-group-1
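If you prefer to filter the list directly, kubectl label selectors can express the same check. The following is a minimal sketch, assuming the type=worker-group-1 selector label from the example above; the !v1alpha1.tinkerbell.org/ownerName requirement excludes hardware that already has an owner:
kubectl get hardware -n eksa-system -l 'type=worker-group-1,!v1alpha1.tinkerbell.org/ownerName'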
If you don’t have any available hardware that matches this requirement in the cluster, you can set up a new hardware CSV. You can feed this hardware inventory file during the upgrade cluster command.
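For reference, a Bare Metal hardware CSV is typically a file along these lines (a sketch with hypothetical values and a labels column matching the worker-group-1 example above; confirm the exact columns against the hardware preparation documentation for your version):
hostname,bmc_ip,bmc_username,bmc_password,mac,ip_address,netmask,gateway,nameservers,labels,disk
eksa-worker3,10.10.44.200,admin,AdminPass1,CC:48:3A:11:F4:C1,10.10.50.23,255.255.254.0,10.10.50.1,8.8.8.8,type=worker-group-1,/dev/sda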
Upgrade Cluster Command for Scale Up/Down
- eksctl CLI: To upgrade a workload cluster with eksctl, run:
eksctl anywhere upgrade cluster -f cluster.yaml \
  # --hardware-csv <hardware.csv> \ # uncomment to add more hardware
  --kubeconfig mgmt/mgmt-eks-a-cluster.kubeconfig
As noted earlier, adding the --kubeconfig option tells eksctl to use the management cluster identified by that kubeconfig file to upgrade a different workload cluster.
- kubectl CLI: The cluster lifecycle feature lets you use kubectl to talk to the Kubernetes API to upgrade a workload cluster. To use kubectl, run:
kubectl apply -f eksa-w01-cluster.yaml --kubeconfig mgmt/mgmt-eks-a-cluster.kubeconfig
To check the state of a cluster managed with the cluster lifecycle feature, use kubectl to show the cluster object with its status. The status field on the cluster object holds information about the current state of the cluster.
kubectl get clusters w01 -o yaml
The cluster has been fully upgraded once the status of the Ready condition is marked True (a scripted variant of this check is sketched after this list). See the cluster status guide for more information.
- GitOps: See Manage separate workload clusters with GitOps
- Terraform: See Manage separate workload clusters with Terraform
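For the scripted variant of the readiness check mentioned in the kubectl item above, a jsonpath query along these lines can read the Ready condition directly (a sketch; the cluster name w01 and the kubeconfig path are taken from the examples above):
kubectl get clusters w01 --kubeconfig mgmt/mgmt-eks-a-cluster.kubeconfig -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'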
NOTE: For kubectl, GitOps and Terraform:
- The baremetal controller does not support scaling upgrades and Kubernetes version upgrades in the same request.
- While scaling a workload cluster, if you need to add additional machines, run:
eksctl anywhere generate hardware -z updated-hardware.csv > updated-hardware.yaml
kubectl apply -f updated-hardware.yaml
- For scaling multiple workload clusters, it is essential that the hardware that will be used for scaling up clusters has labels and selectors that are unique to the target workload cluster. For instance, for an EKS Anywhere cluster named eksa-workload1, the hardware that is assigned to this cluster should have labels that are only going to be used for this cluster, like type=eksa-workload1-cp and type=eksa-workload1-worker. Another workload cluster named eksa-workload2 can have labels like type=eksa-workload2-cp and type=eksa-workload2-worker. Please note that even though labels can be arbitrary, they need to be unique for each workload cluster. Not specifying unique cluster labels can cause cluster upgrades to behave in unexpected ways, which may lead to unsuccessful upgrades and unstable clusters.
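As a rough illustration of unique labels per cluster (a sketch abridged to the relevant field; the cluster and label names are hypothetical), each workload cluster's Tinkerbell machine config selects only the hardware labeled for that cluster via hardwareSelector:
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellMachineConfig
metadata:
  name: eksa-workload1-cp
spec:
  hardwareSelector:
    type: eksa-workload1-cp   # matches only hardware labeled for eksa-workload1's control plane
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: TinkerbellMachineConfig
metadata:
  name: eksa-workload2-cp
spec:
  hardwareSelector:
    type: eksa-workload2-cp   # eksa-workload2 uses its own unique label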
Selective Server Removal During Scale Down
When scaling down your Bare Metal EKS Anywhere cluster, you may want to remove a specific server rather than letting the system automatically choose which node to remove. By default, Cluster API will remove nodes in a non-deterministic order during scale down operations.
IMPORTANT: Do not attempt to remove specific servers by deleting entries from the hardware CSV file and running the upgrade command. This approach will not have deterministic behavior.
To target a specific server for removal during scale down, use the Cluster API delete annotation on the corresponding machine resource. This ensures that the specific machine is prioritized for removal during the scale-down operation.
Steps to Remove a Specific Server
- Identify the target node: First, determine which node you want to remove by listing all nodes in your cluster:
kubectl get nodes
- Find the corresponding CAPI machine: List all CAPI machines to find the machine resource that corresponds to your target node:
kubectl get machines.cluster.x-k8s.io -n eksa-system
You can also find the specific machine for a node using:
NODE_NAME="your-target-node-name"
kubectl get machines.cluster.x-k8s.io -n eksa-system --no-headers | awk -v node="$NODE_NAME" '$3==node {print $1}'
- Add the delete annotation: Mark the machine for deletion by adding the Cluster API delete annotation:
MACHINE_NAME="your-machine-name"
kubectl annotate machines.cluster.x-k8s.io $MACHINE_NAME cluster.x-k8s.io/delete-machine=yes -n eksa-system
- Update your cluster specification: Reduce the node count in your EKS Anywhere cluster specification file:
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: test-cluster
spec:
  workerNodeGroupConfigurations:
  - count: 2 # decrease this number for scale down
- Apply the scale down operation: Run the upgrade command to perform the scale down:
eksctl anywhere upgrade cluster -f cluster.yaml --kubeconfig mgmt/mgmt-eks-a-cluster.kubeconfig
The annotated machine will be prioritized for removal during the scale-down operation, ensuring that your specific target server is removed from the cluster.
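If you want to confirm the annotation before running the upgrade, a quick check such as the following can help (a sketch; MACHINE_NAME is the variable set in the annotation step above, and the expected output is yes):
kubectl get machines.cluster.x-k8s.io $MACHINE_NAME -n eksa-system -o jsonpath='{.metadata.annotations.cluster\.x-k8s\.io/delete-machine}'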
NOTE: The hardware CSV is designed to manage the hardware catalog by creating Hardware, Machine, and BMC credential resources. For operational tasks like scaling, always use the cluster specification count and CAPI machine annotations rather than modifying the hardware CSV.
Autoscaling
EKS Anywhere supports autoscaling of worker node groups using the Kubernetes Cluster Autoscaler, which is available as a curated package.
See the autoscaling configuration documentation and the Cluster Autoscaler curated package documentation for details on how to configure your cluster spec to autoscale worker node groups.
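As a minimal sketch of what such a configuration can look like (assuming the autoscalingConfiguration block with minCount and maxCount for worker node groups; the group and machine config names here are hypothetical), the scaling bounds are set per worker node group in the cluster spec:
workerNodeGroupConfigurations:
- name: md-0
  count: 1
  autoscalingConfiguration:
    minCount: 1
    maxCount: 5
  machineGroupRef:
    kind: TinkerbellMachineConfig
    name: eksa-worker-machines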