How to Use Linked Clones in Tanzu Kubernetes Grid

The two vSphere plans that ship with TKG, dev and prod, create Kubernetes nodes (and the load balancer VM) by performing a full clone of the templates that were created. Each VM ends up completely independent and likely has better storage performance (arguable), but the clone operation is not always quick, so deployments take a while, and the VMs may consume unnecessary disk space. You can instead take advantage of vSphere linked clone capabilities so that the nodes run off of a snapshot of the original template. This may slightly degrade storage performance (again, arguable), but you will see a drastic reduction in deployment and scale-out time as well as a noticeably smaller storage footprint. For full disclosure, this process is not supported yet but functions well for lab/test environments. I’ve been using this method extensively in my own labs as I hate waiting.

Note: These steps assume the same build environment as noted in my previous post, TKG 1.1 on vSphere.

You will need to start out by creating a snapshot of your existing HA Proxy and Kubernetes node templates. This is achieved by converting each VM template back to a VM, taking a snapshot of it, and then converting it back to a template.
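If you would rather script this than click through the vSphere UI, the same convert, snapshot, convert-back sequence can be sketched with govc. Treat this as a rough sketch under assumptions, not the method used in this post: the template names, the snapshot name and the resource pool path are hypothetical placeholders, and it assumes govc is installed and already pointed at your vCenter through its usual environment variables. By default it only prints the commands; set APPLY=1 to actually run them.

```shell
# Hypothetical values: replace with paths and names from your own inventory.
POOL="${POOL:-/Datacenter/host/Cluster/Resources}"
SNAPSHOT="${SNAPSHOT:-linked-clone-base}"

# Print each command; execute it only when APPLY=1 is set.
run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then "$@"; fi
}

# Convert a template to a VM, snapshot it, and convert it back.
snapshot_template() {
  run govc vm.markasvm -pool "$POOL" "$1"        # template -> VM
  run govc snapshot.create -vm "$1" "$SNAPSHOT"  # snapshot the VM
  run govc vm.markastemplate "$1"                # VM -> template
}

# Hypothetical names for the Kubernetes node and HA Proxy templates.
for template in kube-node-template haproxy-template; do
  snapshot_template "$template"
done
```

Running it without APPLY=1 first is a cheap way to check that the names and pool path are what you expect before anything is changed in vCenter.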

Next, you need a plan that creates the VMs as linked clones. You could modify an existing plan, but that may lead to confusion down the line, so this example creates a new plan named linkedclone by copying the dev plan and swapping the clone mode.

sed 's/fullClone/linkedClone/g' ~/.tkg/providers/infrastructure-vsphere/v0.6.4/cluster-template-dev.yaml > ~/.tkg/providers/infrastructure-vsphere/v0.6.4/cluster-template-linkedclone.yaml
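To see what that substitution does before pointing it at a real plan, here is a small self-contained sketch that applies the same sed expression to a sample file in a temporary directory. The YAML fragment is a hypothetical stand-in for the real template, and the cloneMode field name and the diskGiB value are assumptions about its contents rather than something copied from it.

```shell
# Hypothetical stand-in for part of cluster-template-dev.yaml; the field
# names and values here are assumptions, not copied from the real template.
workdir="$(mktemp -d)"
cat > "$workdir/cluster-template-dev.yaml" <<'EOF'
kind: VSphereMachineTemplate
spec:
  template:
    spec:
      cloneMode: fullClone
      diskGiB: 40
EOF

# Same substitution as the sed command in this post, applied to the sample file.
sed 's/fullClone/linkedClone/g' "$workdir/cluster-template-dev.yaml" \
  > "$workdir/cluster-template-linkedclone.yaml"

# Count fullClone references left in the new plan and lines that changed.
remaining="$(grep -c 'fullClone' "$workdir/cluster-template-linkedclone.yaml" || true)"
changed="$(diff "$workdir/cluster-template-dev.yaml" \
  "$workdir/cluster-template-linkedclone.yaml" | grep -c '^>' || true)"
echo "fullClone references left: $remaining"
echo "lines changed: $changed"
```

The same grep and diff checks can be run against the real files under ~/.tkg/providers/infrastructure-vsphere/ to confirm that only the clone mode differs between the dev and linkedclone plans.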

With the new plan created, you can now create a new cluster using it.

tkg create cluster lc -p linkedclone -c 1 -w 3

Logs of the command execution can also be found at: /tmp/tkg-20200513T121128722757649.log
Creating workload cluster 'lc'...

Waiting for cluster nodes to be available...
Workload cluster 'lc' created

You will see that the deployment time is greatly reduced, as we don’t have to wait for full VMs to be provisioned: a cluster with one control plane node and three workers came up in about two minutes.

The one drawback that I’ve seen to this approach is that you can’t have nodes with varying storage footprints, as a linked clone always has the same disk size as the source VM. For lab or testing purposes, this may not be too big a concern though. I had the “opportunity” to test out the Deploy Tanzu Kubernetes Grid to vSphere in an Air-Gapped Environment process this weekend and ended up running through the initial bootstrapping and management cluster deployment at least a dozen times. I estimate that I saved myself about six hours of waiting, as each management cluster deployment in my lab takes roughly thirty minutes when three full-clone VMs have to be provisioned.
