Much of this is based on the instructions noted at Upgrading Tanzu Kubernetes Grid.
TKG 1.1 provides the first opportunity to perform an upgrade of the product. The following are my notes from performing a TKG upgrade.
The first step is to download the components needed to install TKG 1.1 from https://my.vmware.com/web/vmware/details?downloadGroup=TKG-110&productId=988&rPId=46507. This includes the CLI, the Kubernetes v1.18.2 OVA, the Load Balancer OVA and the extension manifests. The total size of these items is only a few GB, so the downloads should finish relatively quickly. The TKG CLI executables can be renamed to just tkg and tkg.exe to make them easier to use.
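As a quick example on a Linux workstation, the rename and install might look like the following. The downloaded file name here is an assumption (and the CLI may arrive compressed, in which case extract it first), so substitute the exact name from the download page:

# Hypothetical file name for the Linux CLI download; adjust to match what you actually downloaded
mv tkg-linux-amd64-v1.1.0 tkg
chmod +x tkg
sudo mv tkg /usr/local/bin/
# Confirm the CLI on your PATH is now the 1.1 build
tkg version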
Create vSphere Resources
Since there are updates to the HA Proxy VM and the Kubernetes node VMs, we’ll need to deploy new OVA files to be used during the upgrade process and for any new clusters that get created post-upgrade.
In the vSphere Client, create Resource Pools named TKG-Mgmt and TKG-Comp, and create VM folders with the same names (a scripted govc alternative to these vSphere steps is sketched after the OVA deployment steps below).
In the vSphere Client, right-click an object in the vCenter Server inventory, select Deploy OVF template.
Select Local file, click the button to upload files, and navigate to the photon-3-kube-v1.18.2+vmware.1.ova file.
Follow the installer prompts to deploy a VM from the OVA template.
- Leave the VM name as photon-3-kube-v1.18.2+vmware.1
- Select the compute resource as RegionA01-MGMT
- Accept the end user license agreements (EULA)
- Select the map-vol datastore
- Select the Dswitch-Management network
- Click Finish to deploy the VM.
Right-click the VM and select Template > Convert to Template.
Repeat this whole process for photon-3-haproxy-v1.2.4+vmware.1.ova
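If you prefer to script these vSphere steps instead of clicking through the UI, a rough sketch using govc is below. It assumes govc is installed, the GOVC_URL, GOVC_USERNAME and GOVC_PASSWORD environment variables are set, and a datacenter named RegionA01; the inventory paths are guesses based on my lab, so adjust them to your environment.

# Assumed inventory paths -- adjust the datacenter/cluster names for your environment
govc pool.create /RegionA01/host/RegionA01-MGMT/Resources/TKG-Mgmt
govc pool.create /RegionA01/host/RegionA01-MGMT/Resources/TKG-Comp
govc folder.create /RegionA01/vm/TKG-Mgmt
govc folder.create /RegionA01/vm/TKG-Comp

# Import the Kubernetes node OVA and convert it to a template; if the default network mapping
# isn't Dswitch-Management, generate a spec with 'govc import.spec' and pass it via -options
govc import.ova -ds=map-vol -folder=/RegionA01/vm -pool=/RegionA01/host/RegionA01-MGMT/Resources photon-3-kube-v1.18.2+vmware.1.ova
govc vm.markastemplate photon-3-kube-v1.18.2+vmware.1

# Repeat for the HA Proxy OVA
govc import.ova -ds=map-vol -folder=/RegionA01/vm -pool=/RegionA01/host/RegionA01-MGMT/Resources photon-3-haproxy-v1.2.4+vmware.1.ova
govc vm.markastemplate photon-3-haproxy-v1.2.4+vmware.1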
Upgrade the Management Cluster
There is no UI-based upgrade for TKG (yet), so you will be performing the rest of the steps from the command line. I have an Ubuntu 20.04 VM running in my lab for most of my CLI work, but this could just as easily be done in PowerShell.
Run the tkg get management-cluster command to update the .tkg/config.yaml file, as well as the files under the .tkg/bom and .tkg/providers folders. You should see output similar to the following:
It seems that you have an outdated providers on your filesystem, proceeding on this command will cause your tkgconfig file to be overridden. You may backup your tkgconfig file before moving on.
Do you want to continue?: y
the old providers folder /home/ubuntu/.tkg/providers is backed up to /home/ubuntu/.tkg/providers-20200511130650
MANAGEMENT-CLUSTER-NAME            CONTEXT-NAME
tkg-mgmt-vsphere-20200416205833 *  tkg-mgmt-vsphere-20200416205833-admin@tkg-mgmt-vsphere-20200416205833
Edit the .tkg/config.yaml file and set the following:
VSPHERE_TEMPLATE: photon-3-kube-v1.18.2+vmware.1
VSPHERE_HAPROXY_TEMPLATE: photon-3-haproxy-v1.2.4+vmware.1
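A quick sanity check on the edit (this assumes the config is in the default ~/.tkg location):

# Both template entries should come back pointing at the newly imported templates
grep -E '^VSPHERE_(HAPROXY_)?TEMPLATE' ~/.tkg/config.yaml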
And that’s the prep-work done! Now you can run the actual upgrade on the management cluster:
tkg upgrade mc tkg-mgmt-vsphere-20200416205833
While the upgrade is running, you’ll see the photon and ha-proxy VMs getting cloned to create your new nodes:
You’ll also likely notice that there is an extra node in your cluster. Since this is a rolling upgrade, this is expected behavior:

For a short time you will see extra nodes in your cluster via kubectl get nodes (my cluster should only have one master):
kubectl get nodes
NAME                                                   STATUS  ROLES   AGE   VERSION
tkg-mgmt-vsphere-20200416205833-control-plane-rg8h5    Ready   master  24d   v1.17.3+vmware.2
tkg-mgmt-vsphere-20200416205833-control-plane-tk8q2    Ready   master  31s   v1.18.1+vmware.1
tkg-mgmt-vsphere-20200416205833-md-0-5c8787b78f-4k4xs  Ready   <none>  24d   v1.17.3+vmware.2
Once the new node is fully functional, you’ll see the old node get decommissioned:
kubectl get nodes
NAME                                                   STATUS                    ROLES   AGE    VERSION
tkg-mgmt-vsphere-20200416205833-control-plane-rg8h5    Ready,SchedulingDisabled  master  24d    v1.17.3+vmware.2
tkg-mgmt-vsphere-20200416205833-control-plane-tk8q2    Ready                     master  3m11s  v1.18.1+vmware.1
tkg-mgmt-vsphere-20200416205833-md-0-5c8787b78f-4k4xs  Ready                     <none>  24d    v1.17.3+vmware.2
kubectl get nodes
NAME                                                   STATUS  ROLES   AGE    VERSION
tkg-mgmt-vsphere-20200416205833-control-plane-tk8q2    Ready   master  4m14s  v1.18.1+vmware.1
tkg-mgmt-vsphere-20200416205833-md-0-5c8787b78f-4k4xs  Ready   <none>  24d    v1.17.3+vmware.2
And you will see the original nodes get deleted in the vSphere UI:
You’ll see this same process repeat for all of the nodes in the cluster (control plane, workers, load balancer).
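If you want to watch the rolling replacement in a bit more detail than the vSphere UI shows, a couple of commands like the following work from a second terminal (assuming your kubectl context is pointed at the management cluster):

# Watch nodes appear, get cordoned, and disappear as the rolling upgrade progresses
kubectl get nodes -w
# The underlying Cluster API machines tell the same story
kubectl get machines --all-namespaces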
You can follow some high-level logging at the command line while the upgrade is happening as well:
tkg upgrade mc tkg-mgmt-vsphere-20200416205833
Logs of the command execution can also be found at: /tmp/tkg-20200511T130803569135562.log
? Upgrading management cluster 'tkg-mgmt-vsphere-20200416205833' to TKG version 'v1.1.0-rc.1' with Kubernetes version 'v1.18.1+vmware.1'. Are you sure?? [y/N] y
Upgrading management cluster providers...
Performing upgrade...
Performing upgrade...
Upgrading Provider="capi-system/cluster-api" CurrentVersion="" TargetVersion="v0.3.5"
Deleting Provider="cluster-api" Version="" TargetNamespace="capi-system"
Installing Provider="cluster-api" Version="v0.3.5" TargetNamespace="capi-system"
Upgrading Provider="capi-kubeadm-bootstrap-system/bootstrap-kubeadm" CurrentVersion="" TargetVersion="v0.3.5"
Deleting Provider="bootstrap-kubeadm" Version="" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.5" TargetNamespace="capi-kubeadm-bootstrap-system"
Upgrading Provider="capi-kubeadm-control-plane-system/control-plane-kubeadm" CurrentVersion="" TargetVersion="v0.3.5"
Deleting Provider="control-plane-kubeadm" Version="" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.5" TargetNamespace="capi-kubeadm-control-plane-system"
Upgrading Provider="capv-system/infrastructure-vsphere" CurrentVersion="" TargetVersion="v0.6.4"
Deleting Provider="infrastructure-vsphere" Version="" TargetNamespace="capv-system"
Installing Provider="infrastructure-vsphere" Version="v0.6.4" TargetNamespace="capv-system"
Management cluster providers upgraded successfully...
Upgrading management cluster kubernetes version...
Creating management cluster client...
Verifying kubernetes version...
Retrieving configuration for upgrade cluster...
Create InfrastructureTemplate for upgrade...
Upgrading control plane nodes...
Patching KubeadmControlPlane with the kubernetes version v1.18.1+vmware.1...
Waiting for kubernetes version to be updated for control plane nodes
Upgrading worker nodes...
Patching MachineDeployment with the kubernetes version v1.18.1+vmware.1...
Waiting for kubernetes version to be updated for worker nodes...
Management cluster 'tkg-mgmt-vsphere-20200416205833' is being upgraded to TKG version 'v1.1.0-rc.1' with kubernetes version 'v1.18.1+vmware.1'
And when the upgrade is finished you should see just the expected nodes at the new version:
kubectl get nodes
NAME                                                   STATUS  ROLES   AGE   VERSION
tkg-mgmt-vsphere-20200416205833-control-plane-tk8q2    Ready   master  24m   v1.18.1+vmware.1
tkg-mgmt-vsphere-20200416205833-md-0-6ccb94c8b5-khrcd  Ready   <none>  17m   v1.18.1+vmware.1
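A couple of optional checks at this point, purely illustrative and not required by the upgrade procedure:

# The Cluster API provider pods on the management cluster should all be running after the provider upgrade
kubectl get pods --all-namespaces | grep -E 'capi|capv'
# The tkg CLI should still list the upgraded management cluster as current
tkg get management-cluster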
Upgrade the Workload Cluster
You can upgrade the workload cluster via a similar process:
tkg get cluster
NAME  NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES
test  default    running  1/1           1/1      v1.17.3+vmware.2
The same OVAs that were used to upgrade the management cluster are used for the workload cluster, and we’ve already specified the versions we want in the .tkg/config.yaml file, so we can get right to kicking off the upgrade:
tkg upgrade cluster test --kubernetes-version v1.18.1+vmware.1
? Upgrading workload cluster 'test' to kubernetes version 'v1.18.1+vmware.1'. Are you sure?? [y/N] y
Creating management cluster client...
Creating workload cluster client...
Verifying kubernetes version...
Retrieving configuration for upgrade cluster...
Create InfrastructureTemplate for upgrade...
Upgrading control plane nodes...
Patching KubeadmControlPlane with the kubernetes version v1.18.1+vmware.1...
Waiting for kubernetes version to be updated for control plane nodes
Upgrading worker nodes...
Patching MachineDeployment with the kubernetes version v1.18.1+vmware.1...
Waiting for kubernetes version to be updated for worker nodes...
Cluster 'test' successfully upgraded to kubernetes version 'v1.18.1+vmware.1'
You will see the same operations in the vSphere UI and via kubectl commands as were observed for the management cluster.
When the upgrade is finished, you will see that the version is updated in the tkg get cluster output:
tkg get cluster
NAME  NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES
test  default    running  1/1           1/1      v1.18.1+vmware.1
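If you want to see the node versions directly, you can pull the workload cluster's kubeconfig with the tkg CLI and check it with kubectl; the context name below follows the usual <cluster>-admin@<cluster> convention but may differ in your environment:

tkg get credentials test
kubectl config use-context test-admin@test
kubectl get nodes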
As a last bit of cleanup, you can remove the KUBERNETES_VERSION entry from your .tkg/config.yaml file as it is no longer needed.
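One way to do that from the shell, assuming the key sits at the top level of the default ~/.tkg/config.yaml (this edits the file in place and leaves a .bak backup of the original):

sed -i.bak '/^KUBERNETES_VERSION:/d' ~/.tkg/config.yaml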
One very important fact to be aware of: since all of our IP addressing is via DHCP, your new nodes will have different IP addresses than the old nodes. This poses no problems for any Kubernetes components or constructs, with the exception of any NodePort services you have configured, which will obviously be reachable at a different IP address after the upgrade. Anything that relies on a NodePort service, or anything using a certificate that includes a node IP address as a SAN, will have to be reconfigured. With this in mind, it is highly recommended to use LoadBalancer services, or to configure DNS records and use FQDNs in your certificates and when accessing Kubernetes services. I’ll have a post up soon going over how to configure MetalLB in a TKG cluster for providing LoadBalancer services.
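For reference, the switch away from NodePort looks roughly like the following for a Service; the app name and ports are made up for illustration, and the cluster needs a load balancer implementation such as MetalLB for the external IP to actually get assigned:

# Hypothetical Service; with type LoadBalancer, clients get a stable external IP instead of
# having to target an individual node's DHCP-assigned address and NodePort
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
EOF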