Node pools are a logical construct used with TMC-provisioned clusters. You can define node pools while provisioning a cluster or add them after the cluster has been created. A node pool allows you to segregate your worker nodes based on location (primarily), size, or some other deciding factor. This functionality is only available for clusters provisioned by TMC.
In this example, you can see that I am creating a cluster with two node pools. The first, named np-test-1, has three workers and runs out of us-east-1a. The second, named np-test-2, has two workers and runs out of us-east-1b.
When the cluster is created, it’s not immediately clear that there are multiple node pools in use:
But you can click on the Node Pools tab to see the configuration, which should match what was specified during cluster creation.
As you can see, there is an option to create a new node pool for an existing cluster. We’ll create a new pool in the third AZ, us-east-1c.
The new node pool is in a creating state while the new nodes are spun up in EC2.
And when it’s done it should look very similar to the other two node pools.
With the node pools set up, you can query nodes based on their node pool membership:
```
kubectl --kubeconfig=kubeconfig-np-test.yml get nodes --selector=tmc.cloud.vmware.com/nodepool=np-test-1
NAME                         STATUS   ROLES    AGE   VERSION
ip-10-0-1-143.ec2.internal   Ready    <none>   55m   v1.18.5+vmware.1
ip-10-0-1-254.ec2.internal   Ready    <none>   55m   v1.18.5+vmware.1
ip-10-0-1-66.ec2.internal    Ready    <none>   55m   v1.18.5+vmware.1
```

```
kubectl --kubeconfig=kubeconfig-np-test.yml get nodes --selector=tmc.cloud.vmware.com/nodepool=np-test-2
NAME                         STATUS   ROLES    AGE   VERSION
ip-10-0-3-240.ec2.internal   Ready    <none>   55m   v1.18.5+vmware.1
ip-10-0-3-70.ec2.internal    Ready    <none>   55m   v1.18.5+vmware.1
```

```
kubectl --kubeconfig=kubeconfig-np-test.yml get nodes --selector=tmc.cloud.vmware.com/nodepool=np-test-3
NAME                         STATUS   ROLES    AGE   VERSION
ip-10-0-5-128.ec2.internal   Ready    <none>   36m   v1.18.5+vmware.1
ip-10-0-5-47.ec2.internal    Ready    <none>   36m   v1.18.5+vmware.1
```
And if we examine the running EC2 instances at AWS, we can see that the nodes are divided up among the availability zones as expected.
One other thing that has happened is that we now have multiple topology zones (failure domains) that correspond to the availability zones in use by our node pools.
```
kubectl --kubeconfig=kubeconfig-np-test.yml get nodes -o 'custom-columns=Name:.metadata.name,Topology Zone:.metadata.labels.topology\.kubernetes\.io/zone,Node Pool:.metadata.labels.tmc\.cloud\.vmware\.com/nodepool' --sort-by='.metadata.labels.tmc\.cloud\.vmware\.com/nodepool'
Name                         Topology Zone   Node Pool
ip-10-0-1-149.ec2.internal   us-east-1a      control-plane-01eh2waz3fj988mdvxvcn8w3x2-0
ip-10-0-3-8.ec2.internal     us-east-1b      control-plane-01eh2waz3fj988mdvxvcn8w3x2-1
ip-10-0-5-188.ec2.internal   us-east-1c      control-plane-01eh2waz3fj988mdvxvcn8w3x2-2
ip-10-0-1-66.ec2.internal    us-east-1a      np-test-1
ip-10-0-1-143.ec2.internal   us-east-1a      np-test-1
ip-10-0-1-254.ec2.internal   us-east-1a      np-test-1
ip-10-0-3-240.ec2.internal   us-east-1b      np-test-2
ip-10-0-3-70.ec2.internal    us-east-1b      np-test-2
ip-10-0-5-128.ec2.internal   us-east-1c      np-test-3
ip-10-0-5-47.ec2.internal    us-east-1c      np-test-3
```
With this in mind, whenever we create a deployment with multiple replicas and a podAntiAffinity rule based on topology zones, the replicas should be spread out across as many topology zones as possible.
I have a simple redis app that should meet these requirements. It has three replicas and a podAntiAffinity rule based on the topology.kubernetes.io/zone topology key:
```
cat redis-multi-az.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - store
              topologyKey: topology.kubernetes.io/zone
      containers:
      - name: redis-server
        image: redis:3.2-alpine
```
Once this is up and running we can see that the three pods are indeed spread across three separate nodes.
```
kubectl --kubeconfig=kubeconfig-np-test.yml get po -o wide
NAME                           READY   STATUS    RESTARTS   AGE     IP               NODE                         NOMINATED NODE   READINESS GATES
redis-cache-6f4db8c6c8-2lhvt   1/1     Running   0          4m54s   192.168.110.47   ip-10-0-5-47.ec2.internal    <none>           <none>
redis-cache-6f4db8c6c8-72rx7   1/1     Running   0          4m54s   192.168.14.80    ip-10-0-3-240.ec2.internal   <none>           <none>
redis-cache-6f4db8c6c8-dq9lk   1/1     Running   0          4m54s   192.168.219.77   ip-10-0-1-66.ec2.internal    <none>           <none>
```
And if we dig into these three nodes, we can see that they are each running in a different zone.
```
kubectl --kubeconfig=kubeconfig-np-test.yml get nodes ip-10-0-5-47.ec2.internal ip-10-0-3-240.ec2.internal ip-10-0-1-66.ec2.internal -o 'custom-columns=Name:.metadata.name,Topology Zone:.metadata.labels.topology\.kubernetes\.io/zone,Node Pool:.metadata.labels.tmc\.cloud\.vmware\.com/nodepool' --sort-by='.metadata.labels.tmc\.cloud\.vmware\.com/nodepool'
Name                         Topology Zone   Node Pool
ip-10-0-1-66.ec2.internal    us-east-1a      np-test-1
ip-10-0-3-240.ec2.internal   us-east-1b      np-test-2
ip-10-0-5-47.ec2.internal    us-east-1c      np-test-3
```
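As an aside, Kubernetes also offers topologySpreadConstraints (beta, and enabled by default, as of the 1.18 clusters shown here) as a more direct way to express this kind of zone spreading. A sketch of a roughly equivalent constraint for the same pods, not tested against this cluster, might look like:

```yaml
# Hypothetical alternative to the podAntiAffinity rule above:
# a pod template spec fragment using a topology spread constraint.
spec:
  topologySpreadConstraints:
  - maxSkew: 1                            # allow at most 1 more pod in any one zone
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway     # soft constraint, like "preferred" anti-affinity
    labelSelector:
      matchLabels:
        app: store
```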
I originally chose to use topology.kubernetes.io/zone as my topologyKey simply because using this, or the now-deprecated failure-domain.beta.kubernetes.io/zone, is how I’ve always done it. I was very happy to learn that the same results can be achieved by setting the topologyKey to the tmc.cloud.vmware.com/nodepool label, since in this cluster each node pool maps to exactly one availability zone.
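To illustrate, the affinity stanza from the redis deployment above could be rewritten with the node pool label as the topology key (a sketch based on the labels shown earlier, not a TMC-documented requirement):

```yaml
# Pod template spec fragment: spread replicas across TMC node pools
# rather than availability zones.
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - store
          topologyKey: tmc.cloud.vmware.com/nodepool
```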
You could just as easily put different node pools in the same availability zone at AWS but specify different sizing between the pools, or a different labeling scheme, and then use these labels to separate or aggregate workloads.
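For example, a workload could be pinned to one pool with a simple nodeSelector on the TMC node pool label. This is a hypothetical deployment (the name pinned-app and the nginx image are mine), assuming the np-test-1 pool from earlier:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pinned-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pinned-app
  template:
    metadata:
      labels:
        app: pinned-app
    spec:
      # Only schedule onto nodes belonging to the np-test-1 node pool.
      nodeSelector:
        tmc.cloud.vmware.com/nodepool: np-test-1
      containers:
      - name: app
        image: nginx:1.19-alpine
```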
One thing to keep in mind is that once a node pool is created, the only thing you can change is the number of nodes in the pool (notice how almost everything is grayed out in the following screenshot).
If you want to change the sizing, name, description, labels, or AZ, you’ll need to create a new node pool with the desired parameters. You can delete a node pool, and any workloads that are not standalone pods should be recreated on other nodes in the cluster, barring any anti-affinity settings or taints/tolerations in play. Just be careful: you are allowed to delete the last node pool, and doing so will leave your cluster in a non-functional state.