The 1.2.1 release of TKG does not bring many new features over the 1.2.0 release, but one very important addition is cluster autoscaler functionality. Enabling and configuring it on your cluster is fairly straightforward and I'll walk through it in this post.
My TKG 1.2.1 deployment is running on top of vSphere 7.0 U1 and starts out with a management cluster composed of a single control plane node and a single worker node (both small sized). The Kubernetes version in play is 1.19.3, but you could also run 1.18.10 or 1.17.13.
tkg get mc
MANAGEMENT-CLUSTER-NAME CONTEXT-NAME STATUS
tkg-mgmt-as * tkg-mgmt-as-admin@tkg-mgmt-as Success
kubectl get nodes
NAME STATUS ROLES AGE VERSION
tkg-mgmt-as-control-plane-z6t9v Ready master 6m15s v1.19.3+vmware.1
tkg-mgmt-as-md-0-79b5ff4b89-vhfcc Ready <none> 5m1s v1.19.3+vmware.1
We can only enable autoscaler functionality on workload clusters, so I'll be creating one next, but first I need to set a few environment variables so that the autoscaler is configured properly. You can examine the .tkg/providers/config_default.yaml file to see what the available autoscaler options are.
grep AUTOSCALER .tkg/providers/config_default.yaml
ENABLE_AUTOSCALER: false
AUTOSCALER_MAX_NODES_TOTAL: "0"
AUTOSCALER_SCALE_DOWN_DELAY_AFTER_ADD: "10m"
AUTOSCALER_SCALE_DOWN_DELAY_AFTER_DELETE: "10s"
AUTOSCALER_SCALE_DOWN_DELAY_AFTER_FAILURE: "3m"
AUTOSCALER_SCALE_DOWN_UNNEEDED_TIME: "10m"
AUTOSCALER_MAX_NODE_PROVISION_TIME: "15m"
AUTOSCALER_MIN_SIZE_0:
AUTOSCALER_MAX_SIZE_0:
AUTOSCALER_MIN_SIZE_1:
AUTOSCALER_MAX_SIZE_1:
AUTOSCALER_MIN_SIZE_2:
AUTOSCALER_MAX_SIZE_2:
The ENABLE_AUTOSCALER option is self-explanatory; I won't be setting it since I'll be passing the --enable-cluster-options flag to the tkg create cluster command instead. The following should help to explain the other options:
AUTOSCALER_MAX_NODES_TOTAL – The maximum number of Kubernetes worker nodes for the entire cluster. The default of zero means no overall cap is enforced and scaling is bounded only by the per-AZ minimum and maximum settings below.
AUTOSCALER_SCALE_DOWN_DELAY_AFTER_ADD – The amount of time that the system will wait after a scale-up operation before resuming scale-down evaluations.
AUTOSCALER_SCALE_DOWN_DELAY_AFTER_DELETE – The amount of time that the system will wait after a node deletion before resuming scale-down evaluations.
AUTOSCALER_SCALE_DOWN_DELAY_AFTER_FAILURE – The amount of time that the system will wait after a scale-down failure before resuming scale-down evaluations.
AUTOSCALER_SCALE_DOWN_UNNEEDED_TIME – The amount of time a node must be deemed unnecessary before the autoscaler is allowed to scale down the cluster and delete the node.
AUTOSCALER_MAX_NODE_PROVISION_TIME – The amount of time that the system will wait for a new node to be created during a scale-up operation.
AUTOSCALER_MIN_SIZE_0 – The lowest number of Kubernetes worker nodes that the system will scale down to for a single availability zone (AZ). This value is for AZ 0; the AUTOSCALER_MIN_SIZE_1 and AUTOSCALER_MIN_SIZE_2 values correspond to AZ 1 and AZ 2 respectively. The 1 and 2 values are currently only applicable to TKG clusters running on AWS, as multi-AZ support is not yet available for vSphere or Azure. If you don't set this value, it will default to the WORKER_MACHINE_COUNT_# value.
AUTOSCALER_MAX_SIZE_0 – The highest number of Kubernetes worker nodes that the system will scale up to for a single availability zone (AZ). This value is for AZ 0; the AUTOSCALER_MAX_SIZE_1 and AUTOSCALER_MAX_SIZE_2 values correspond to AZ 1 and AZ 2 respectively. The 1 and 2 values are currently only applicable to TKG clusters running on AWS, as multi-AZ support is not yet available for vSphere or Azure. If you don't set this value, it will default to the WORKER_MACHINE_COUNT_# value.
The AUTOSCALER_MIN_SIZE_0 and AUTOSCALER_MAX_SIZE_0 values deserve special attention. If you don't set either of these, they will both default to the WORKER_MACHINE_COUNT_0 value and the autoscaler will effectively do nothing, as the lower and upper bounds within which it can work are the same. To allow for scale-up operations, you must at least set AUTOSCALER_MAX_SIZE_0 to something higher than the WORKER_MACHINE_COUNT_0 value. Not setting AUTOSCALER_MIN_SIZE_0 will not prohibit scale-down operations, as they can still happen once the cluster has been scaled up. You could also set it to something lower than the WORKER_MACHINE_COUNT_0 value to allow the AZ to be sized smaller than its initial configuration if it was oversized.
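As an aside, passing the --enable-cluster-options flag isn't the only way in; exporting ENABLE_AUTOSCALER=true should have the same effect, and these keys can also be placed in your TKG configuration file so they persist across shells. The snippet below is only a sketch of that approach (I stick with environment variables for the rest of this post, and the ~/.tkg/config.yaml path assumes a default installation):
# ~/.tkg/config.yaml (excerpt) - persists autoscaler settings instead of exporting them each time
ENABLE_AUTOSCALER: true
AUTOSCALER_MIN_SIZE_0: "1"
AUTOSCALER_MAX_SIZE_0: "5"
WORKER_MACHINE_COUNT: 2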
With all of this in mind, I’m going to set some environment variables and then deploy my cluster.
export WORKER_MACHINE_COUNT=2
export AUTOSCALER_MIN_SIZE_0=1
export AUTOSCALER_MAX_SIZE_0=5
tkg create cluster tkg-wld-as -p dev --enable-cluster-options autoscaler --vsphere-controlplane-endpoint 192.168.110.103 -v 6
One of the first things that happens is that a new deployment is created in the default namespace of the management cluster.
kubectl get deployment
NAME READY UP-TO-DATE AVAILABLE AGE
tkg-wld-as-cluster-autoscaler 1/1 1 1 77s
If we do a describe on this deployment, we can see that it will be running the autoscaler image against the new cluster (output has been truncated):
kubectl describe deployment tkg-wld-as-cluster-autoscaler
Containers:
tkg-wld-as-cluster-autoscaler:
Image: registry.tkg.vmware.run/cluster-autoscaler:v1.19.1_vmware.1
Port:
Host Port:
Command:
/cluster-autoscaler
Args:
--cloud-provider=clusterapi
--v=4
--clusterapi-cloud-config-authoritative
--kubeconfig=/mnt/tkg-wld-as-kubeconfig/value
--node-group-auto-discovery=clusterapi:clusterName=tkg-wld-as
--scale-down-delay-after-add=10m
--scale-down-delay-after-delete=10s
--scale-down-delay-after-failure=3m
--scale-down-unneeded-time=10m
--max-node-provision-time=15m
--max-nodes-total=0
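If you just want the container args without the rest of the describe output, a jsonpath query works as well (the deployment name follows my cluster name, so adjust it for yours):
kubectl get deployment tkg-wld-as-cluster-autoscaler -o jsonpath='{.spec.template.spec.containers[0].args}'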
You can do a describe on the machinedeployment and check the Annotations to validate that the autoscaler settings provided via environment variables were set.
kubectl describe machinedeployment tkg-wld-as-md-0
Name: tkg-wld-as-md-0
Namespace: default
Labels: cluster.x-k8s.io/cluster-name=tkg-wld-as
Annotations: cluster.k8s.io/cluster-api-autoscaler-node-group-max-size: 5
cluster.k8s.io/cluster-api-autoscaler-node-group-min-size: 1
machinedeployment.clusters.x-k8s.io/revision: 1
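As an aside (I haven't tested this against TKG specifically), the clusterapi provider reads these min/max annotations on each autoscaler loop, so in principle you can adjust the bounds on a running cluster by patching the annotations rather than recreating it, along these lines:
kubectl annotate machinedeployment tkg-wld-as-md-0 cluster.k8s.io/cluster-api-autoscaler-node-group-max-size=6 --overwrite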
If you set the verbosity high, you'll see that the cluster creation process waits for the autoscaler to be online (for a very short time):
Waiting for cluster autoscaler to be available…
Waiting for resource tkg-wld-as-cluster-autoscaler of type *v1.Deployment to be up and running
Once the cluster is up and running we can see that I’ve got one control plane node and two workers.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
tkg-wld-as-control-plane-kkz2p Ready master 5m31s v1.19.3+vmware.1
tkg-wld-as-md-0-58588b7b5f-g29gc Ready <none> 4m6s v1.19.3+vmware.1
tkg-wld-as-md-0-58588b7b5f-wsvqp Ready <none> 4m4s v1.19.3+vmware.1
By investigating the logs of the autoscaler pod in the management cluster, we can see that it has already decided that two worker nodes are more than needed.
kubectl logs tkg-wld-as-cluster-autoscaler-7678c4744b-rls67
I1210 20:23:21.252087 1 clusterapi_controller.go:555] node "tkg-wld-as-md-0-58588b7b5f-wsvqp" is in nodegroup "MachineDeployment/default/tkg-wld-as-md-0"
I1210 20:23:21.252266 1 clusterapi_controller.go:555] node "tkg-wld-as-md-0-58588b7b5f-g29gc" is in nodegroup "MachineDeployment/default/tkg-wld-as-md-0"
I1210 20:23:21.252337 1 static_autoscaler.go:492] tkg-wld-as-md-0-58588b7b5f-g29gc is unneeded since 2020-12-10 20:16:51.178779067 +0000 UTC m=+116.454260415 duration 6m28.860466285s
I1210 20:23:21.252383 1 static_autoscaler.go:492] tkg-wld-as-md-0-58588b7b5f-wsvqp is unneeded since 2020-12-10 20:16:02.161944324 +0000 UTC m=+67.437425653 duration 7m17.877301047s
I1210 20:23:21.252421 1 static_autoscaler.go:503] Scale down status: unneededOnly=true lastScaleUpTime=2020-12-10 20:14:55.363937358 +0000 UTC m=+0.639418683 lastScaleDownDeleteTime=2020-12-10 20:14:55.363937433 +0000 UTC m=+0.639418761 lastScaleDownFailTime=2020-12-10 20:14:55.363937511 +0000 UTC m=+0.639418837 scaleDownForbidden=false isDeleteInProgress=false scaleDownInCooldown=true
I1210 20:23:21.446182 1 request.go:581] Throttling request took 193.349574ms, request: GET:https://100.64.0.1:443/apis/cluster.x-k8s.io/v1alpha3/namespaces/default/machinedeployments/tkg-wld-as-md-0/scale
I1210 20:23:21.450888 1 clusterapi_provider.go:70] discovered node group: MachineDeployment/default/tkg-wld-as-md-0 (min: 1, max: 5, replicas: 2)
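Note that the autoscaler pod name carries a generated suffix and will differ in your environment; pointing kubectl logs at the deployment instead saves looking it up:
kubectl logs deployment/tkg-wld-as-cluster-autoscaler -f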
Once the cluster has been up for 10 minutes with practically no load, we’ll see the autoscaler remove a node (output is truncated).
kubectl logs tkg-wld-as-cluster-autoscaler-7678c4744b-rls67
I1210 20:26:15.046159 1 static_autoscaler.go:516] Starting scale down
I1210 20:26:14.846856 1 scale_down.go:421] Node tkg-wld-as-md-0-58588b7b5f-wsvqp - cpu utilization 0.200000
I1210 20:26:14.846923 1 scale_down.go:421] Node tkg-wld-as-md-0-58588b7b5f-g29gc - cpu utilization 0.200000
I1210 20:26:14.847323 1 clusterapi_controller.go:555] node "tkg-wld-as-md-0-58588b7b5f-wsvqp" is in nodegroup "MachineDeployment/default/tkg-wld-as-md-0"
I1210 20:26:15.037546 1 request.go:581] Throttling request took 189.955197ms, request: GET:https://100.64.0.1:443/apis/cluster.x-k8s.io/v1alpha3/namespaces/default/machinedeployments/tkg-wld-as-md-0/scale
I1210 20:26:15.044372 1 clusterapi_controller.go:555] node "tkg-wld-as-md-0-58588b7b5f-g29gc" is in nodegroup "MachineDeployment/default/tkg-wld-as-md-0"
I1210 20:26:15.044422 1 cluster.go:148] Fast evaluation: tkg-wld-as-md-0-58588b7b5f-g29gc for removal
I1210 20:26:15.044437 1 cluster.go:185] Fast evaluation: node tkg-wld-as-md-0-58588b7b5f-g29gc may be removed
I1210 20:26:15.046076 1 static_autoscaler.go:492] tkg-wld-as-md-0-58588b7b5f-g29gc is unneeded since 2020-12-10 20:16:51.178779067 +0000 UTC m=+116.454260415 duration 9m22.657249955s
I1210 20:26:15.046092 1 static_autoscaler.go:492] tkg-wld-as-md-0-58588b7b5f-wsvqp is unneeded since 2020-12-10 20:16:02.161944324 +0000 UTC m=+67.437425653 duration 10m11.674084717s
I1210 20:26:15.658016 1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"77fb9bcb-cfcc-4b83-8694-d76fa51744ae", APIVersion:"v1", ResourceVersion:"3843", FieldPath:""}): type: 'Normal' reason: 'ScaleDownEmpty' Scale-down: removing empty node tkg-wld-as-md-0-58588b7b5f-wsvqp
I1210 20:26:17.058060 1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"tkg-wld-as-md-0-58588b7b5f-wsvqp", UID:"f06ac578-6ab9-4910-b61c-62726509e52e", APIVersion:"v1", ResourceVersion:"3567", FieldPath:""}): type: 'Normal' reason: 'ScaleDown' node removed by cluster autoscaler
We can see that the cluster has one fewer node than it did just a few minutes ago, since node tkg-wld-as-md-0-58588b7b5f-wsvqp was just removed.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
tkg-wld-as-control-plane-kkz2p Ready master 15m v1.19.3+vmware.1
tkg-wld-as-md-0-58588b7b5f-g29gc Ready <none> 14m v1.19.3+vmware.1
To see how the autoscaler will add more nodes to the cluster, we'll have to create some load. I'll do this by deploying the php-apache web server application from the Horizontal Pod Autoscaler Walkthrough. The only difference is that I've set the number of replicas to 3.
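For anyone following along, the manifest I applied looks roughly like the one below; it is taken from the walkthrough with the replica count raised to 3 (the image was published at k8s.gcr.io/hpa-example at the time of writing, but its location may change):
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  replicas: 3
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
EOF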
kubectl get po
NAME READY STATUS RESTARTS AGE
php-apache-d4cf67d68-2w7wz 1/1 Running 0 112s
php-apache-d4cf67d68-jxw9s 1/1 Running 0 112s
php-apache-d4cf67d68-l78ff 1/1 Running 0 112s
These pods aren’t doing much and there is still not enough load for the autoscaler to kick in and provision a new node. If I increase the number of replicas to 10, there is an almost immediate response from the autoscaler to create a new node.
kubectl get po
NAME READY STATUS RESTARTS AGE
php-apache-d4cf67d68-2267j 1/1 Running 0 2m44s
php-apache-d4cf67d68-2w7wz 1/1 Running 0 20m
php-apache-d4cf67d68-78bm6 1/1 Running 0 2m44s
php-apache-d4cf67d68-b2vnh 1/1 Running 0 2m44s
php-apache-d4cf67d68-j5fqs 1/1 Running 0 2m44s
php-apache-d4cf67d68-jxw9s 1/1 Running 0 20m
php-apache-d4cf67d68-l78ff 1/1 Running 0 20m
php-apache-d4cf67d68-l7mmj 1/1 Running 0 2m44s
php-apache-d4cf67d68-s29xc 1/1 Running 0 2m44s
php-apache-d4cf67d68-w6fwb 1/1 Running 0 2m44s
kubectl --context=tkg-wld-as-admin@tkg-wld-as describe po php-apache-d4cf67d68-s29xc
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m43s (x2 over 3m43s) default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
Normal TriggeredScaleUp 3m35s cluster-autoscaler pod triggered scale-up: [{MachineDeployment/default/tkg-wld-as-md-0 1->2 (max: 5)}]
Warning FailedScheduling 2m24s (x4 over 2m54s) default-scheduler 0/3 nodes are available: 1 Insufficient cpu, 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 1 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate.
Normal Scheduled 2m14s default-scheduler Successfully assigned default/php-apache-d4cf67d68-s29xc to tkg-wld-as-md-0-58588b7b5f-nfzqs
If we look at the nodes present in the cluster now, we’ll see that there is a new one named tkg-wld-as-md-0-58588b7b5f-nfzqs.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
tkg-wld-as-control-plane-kkz2p Ready master 68m v1.19.3+vmware.1
tkg-wld-as-md-0-58588b7b5f-g29gc Ready <none> 66m v1.19.3+vmware.1
tkg-wld-as-md-0-58588b7b5f-nfzqs Ready <none> 4m11s v1.19.3+vmware.1
And looking in the autoscaler logs, we can see the details behind what triggered the new node to be created (output is truncated).
kubectl logs tkg-wld-as-cluster-autoscaler-7678c4744b-rls67
I1210 21:17:43.586296 1 filter_out_schedulable.go:65] Filtering out schedulables
I1210 21:17:43.586369 1 filter_out_schedulable.go:132] Filtered out 0 pods using hints
I1210 21:17:43.781018 1 filter_out_schedulable.go:170] 1 pods were kept as unschedulable based on caching
I1210 21:17:43.781069 1 filter_out_schedulable.go:171] 0 pods marked as unschedulable can be scheduled.
I1210 21:17:43.781092 1 filter_out_schedulable.go:82] No schedulable pods
I1210 21:17:43.796755 1 klogx.go:86] Pod default/php-apache-d4cf67d68-2267j is unschedulable
I1210 21:17:43.796817 1 klogx.go:86] Pod default/php-apache-d4cf67d68-s29xc is unschedulable
I1210 21:17:44.412791 1 scale_up.go:456] Best option to resize: MachineDeployment/default/tkg-wld-as-md-0
I1210 21:17:44.412850 1 scale_up.go:460] Estimated 1 nodes needed in MachineDeployment/default/tkg-wld-as-md-0
I1210 21:17:44.588553 1 scale_up.go:574] Final scale-up plan: [{MachineDeployment/default/tkg-wld-as-md-0 1->2 (max: 5)}]
I1210 21:17:44.593599 1 scale_up.go:663] Scale-up: setting group MachineDeployment/default/tkg-wld-as-md-0 size to 2
I1210 21:17:44.653323 1 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"77fb9bcb-cfcc-4b83-8694-d76fa51744ae", APIVersion:"v1", ResourceVersion:"18369", FieldPath:""}): type: 'Normal' reason: 'ScaledUpGroup' Scale-up: setting group MachineDeployment/default/tkg-wld-as-md-0 size to 2
W1210 21:17:56.300406 1 clusterapi_controller.go:454] Machine "tkg-wld-as-md-0-58588b7b5f-nfzqs" has no providerID
I1210 21:17:56.300445 1 clusterapi_controller.go:478] Status.NodeRef of machine "tkg-wld-as-md-0-58588b7b5f-nfzqs" is currently nil
I1210 21:17:56.300452 1 clusterapi_controller.go:509] nodegroup tkg-wld-as-md-0 has nodes [vsphere://422af9c5-6bd9-121c-e616-ea6dc706cda5 vsphere://422a1fe4-dc84-3fa6-ab50-175437c2a967]
And very soon after, we can see that the next scale-down evaluation does not find an opportunity to remove a node (output is truncated).
kubectl logs tkg-wld-as-cluster-autoscaler-7678c4744b-rls67
I1210 21:18:53.910197 1 scale_down.go:421] Node tkg-wld-as-md-0-58588b7b5f-g29gc - cpu utilization 1.000000
I1210 21:18:53.910207 1 scale_down.go:424] Node tkg-wld-as-md-0-58588b7b5f-g29gc is not suitable for removal - cpu utilization too big (1.000000)