Node pools are a logical construct used with TMC-provisioned clusters. You can define node pools while provisioning a cluster or add them after the cluster has been created. A node pool allows you to segregate your worker nodes based on location (primarily), size, or some other deciding factor. This functionality is only available for clusters provisioned by TMC.
In this example, you can see that I am creating a cluster with two node pools. The first, named np-test-1, has three workers and runs out of us-east-1a. The second, named np-test-2, has two workers and runs out of us-east-1b.
When the cluster is created, it’s not immediately clear that there are multiple node pools in use:
But you can click on the Node Pools tab to see the configuration, which should match what was specified during cluster creation.
As you can see, there is an option to create a new node pool for an existing cluster. We’ll create a new pool in the third AZ, us-east-1c.
The new node pool is in a creating state while the new nodes are spun up in EC2.
And when it’s done it should look very similar to the other two node pools.
With the node pools set up, you can query nodes based on their node pool membership:
```
kubectl --kubeconfig=kubeconfig-np-test.yml get nodes --selector=tmc.cloud.vmware.com/nodepool=np-test-1
NAME                         STATUS   ROLES    AGE   VERSION
ip-10-0-1-143.ec2.internal   Ready    <none>   55m   v1.18.5+vmware.1
ip-10-0-1-254.ec2.internal   Ready    <none>   55m   v1.18.5+vmware.1
ip-10-0-1-66.ec2.internal    Ready    <none>   55m   v1.18.5+vmware.1
```

```
kubectl --kubeconfig=kubeconfig-np-test.yml get nodes --selector=tmc.cloud.vmware.com/nodepool=np-test-2
NAME                         STATUS   ROLES    AGE   VERSION
ip-10-0-3-240.ec2.internal   Ready    <none>   55m   v1.18.5+vmware.1
ip-10-0-3-70.ec2.internal    Ready    <none>   55m   v1.18.5+vmware.1
```

```
kubectl --kubeconfig=kubeconfig-np-test.yml get nodes --selector=tmc.cloud.vmware.com/nodepool=np-test-3
NAME                         STATUS   ROLES    AGE   VERSION
ip-10-0-5-128.ec2.internal   Ready    <none>   36m   v1.18.5+vmware.1
ip-10-0-5-47.ec2.internal    Ready    <none>   36m   v1.18.5+vmware.1
```
And if we examine the running EC2 instances at AWS, we can see that the nodes are divided up among the availability zones as expected.
One other thing that has happened is that we now have multiple topology zones (failure domains) that correspond to the availability zones in use by our node pools.
```
kubectl --kubeconfig=kubeconfig-np-test.yml get nodes -o 'custom-columns=Name:.metadata.name,Topology Zone:.metadata.labels.topology\.kubernetes\.io/zone,Node Pool:.metadata.labels.tmc\.cloud\.vmware\.com/nodepool' --sort-by='.metadata.labels.tmc\.cloud\.vmware\.com/nodepool'
Name                         Topology Zone   Node Pool
ip-10-0-1-149.ec2.internal   us-east-1a      control-plane-01eh2waz3fj988mdvxvcn8w3x2-0
ip-10-0-3-8.ec2.internal     us-east-1b      control-plane-01eh2waz3fj988mdvxvcn8w3x2-1
ip-10-0-5-188.ec2.internal   us-east-1c      control-plane-01eh2waz3fj988mdvxvcn8w3x2-2
ip-10-0-1-66.ec2.internal    us-east-1a      np-test-1
ip-10-0-1-143.ec2.internal   us-east-1a      np-test-1
ip-10-0-1-254.ec2.internal   us-east-1a      np-test-1
ip-10-0-3-240.ec2.internal   us-east-1b      np-test-2
ip-10-0-3-70.ec2.internal    us-east-1b      np-test-2
ip-10-0-5-128.ec2.internal   us-east-1c      np-test-3
ip-10-0-5-47.ec2.internal    us-east-1c      np-test-3
```
With this in mind, whenever we create a deployment with multiple replicas and a podAntiAffinity rule based on topology zones, the replicas should be spread out across as many topology zones as possible.
I have a simple redis app that should meet these requirements. It has three replicas and a podAntiAffinity rule based on the topology.kubernetes.io/zone topology key:
```
cat redis-multi-az.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - store
              topologyKey: topology.kubernetes.io/zone
      containers:
      - name: redis-server
        image: redis:3.2-alpine
```
Once this is up and running we can see that the three pods are indeed spread across three separate nodes.
```
kubectl --kubeconfig=kubeconfig-np-test.yml get po -o wide
NAME                           READY   STATUS    RESTARTS   AGE     IP               NODE                         NOMINATED NODE   READINESS GATES
redis-cache-6f4db8c6c8-2lhvt   1/1     Running   0          4m54s   192.168.110.47   ip-10-0-5-47.ec2.internal    <none>           <none>
redis-cache-6f4db8c6c8-72rx7   1/1     Running   0          4m54s   192.168.14.80    ip-10-0-3-240.ec2.internal   <none>           <none>
redis-cache-6f4db8c6c8-dq9lk   1/1     Running   0          4m54s   192.168.219.77   ip-10-0-1-66.ec2.internal    <none>           <none>
```
And if we dig into these three nodes, we can see that they are each running in a different zone.
```
kubectl --kubeconfig=kubeconfig-np-test.yml get nodes ip-10-0-5-47.ec2.internal ip-10-0-3-240.ec2.internal ip-10-0-1-66.ec2.internal -o 'custom-columns=Name:.metadata.name,Topology Zone:.metadata.labels.topology\.kubernetes\.io/zone,Node Pool:.metadata.labels.tmc\.cloud\.vmware\.com/nodepool' --sort-by='.metadata.labels.tmc\.cloud\.vmware\.com/nodepool'
Name                         Topology Zone   Node Pool
ip-10-0-1-66.ec2.internal    us-east-1a      np-test-1
ip-10-0-3-240.ec2.internal   us-east-1b      np-test-2
ip-10-0-5-47.ec2.internal    us-east-1c      np-test-3
```
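As an aside, Kubernetes also offers topologySpreadConstraints (beta, and enabled by default, as of the 1.18 clusters shown here) as a more direct way to express this kind of zone spreading. A sketch of a roughly equivalent constraint for the same pods, not tested against this cluster, might look like:

```yaml
# Hypothetical alternative to the podAntiAffinity rule above:
# a pod template spec fragment using a topology spread constraint.
spec:
  topologySpreadConstraints:
  - maxSkew: 1                            # allow at most 1 more pod in any one zone
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway     # soft constraint, like "preferred" anti-affinity
    labelSelector:
      matchLabels:
        app: store
```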
I originally chose to use topology.kubernetes.io/zone as my topologyKey simply because using this, or the now-deprecated failure-domain.beta.kubernetes.io/zone, is how I’ve always done it. I was very happy to learn that the same results can be achieved by setting the topologyKey to the tmc.cloud.vmware.com/nodepool label, since in this cluster each node pool maps to exactly one availability zone.
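To illustrate, the affinity stanza from the redis deployment above could be rewritten with the node pool label as the topology key (a sketch based on the labels shown earlier, not a TMC-documented requirement):

```yaml
# Pod template spec fragment: spread replicas across TMC node pools
# rather than availability zones.
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - store
          topologyKey: tmc.cloud.vmware.com/nodepool
```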
You could just as easily put different node pools in the same availability zone at AWS but specify different sizing between the pools, or a different labeling scheme, and then use these labels to separate or aggregate workloads.
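For example, a workload could be pinned to one pool with a simple nodeSelector on the TMC node pool label. This is a hypothetical deployment (the name pinned-app and the nginx image are mine), assuming the np-test-1 pool from earlier:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pinned-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: pinned-app
  template:
    metadata:
      labels:
        app: pinned-app
    spec:
      # Only schedule onto nodes belonging to the np-test-1 node pool.
      nodeSelector:
        tmc.cloud.vmware.com/nodepool: np-test-1
      containers:
      - name: app
        image: nginx:1.19-alpine
```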
One thing to keep in mind is that once a node pool is created, the only thing you can change is the number of nodes in the pool (notice how almost everything is grayed out in the following screenshot).
If you want to change the sizing, name, description, labels, or AZ, you’ll need to create a new node pool with the desired parameters. You can delete a node pool, and any workloads that are not standalone pods should be recreated on other nodes in the cluster, barring any anti-affinity settings or taints/tolerations in play. Just be careful: you are allowed to delete the last node pool, and doing so will leave your cluster in a non-functional state.