How to Upgrade a Tanzu Kubernetes Grid cluster from 1.0 to 1.1

Much of this is based on the instructions in Upgrading Tanzu Kubernetes Grid.

TKG 1.1 is the first release that offers an upgrade path for the product. The following are my notes from performing that upgrade.

The first step is to download the components needed to install TKG 1.1 from https://my.vmware.com/web/vmware/details?downloadGroup=TKG-110&productId=988&rPId=46507. This includes the CLI, the Kubernetes v1.18.2 OVA, the load balancer OVA, and the extension manifests. Altogether these items are only a few GB, so the download should finish relatively quickly. The TKG CLI executables can be renamed to just tkg and tkg.exe to make them easier to use.

Create vSphere Resources

Since there are updates to the HAProxy VM and the Kubernetes node VMs, we’ll need to deploy new OVA files to be used during the upgrade process and for any new clusters created post-upgrade.

In the vSphere Client, create resource pools named TKG-Mgmt and TKG-Comp, and create VM folders with the same names.

In the vSphere Client, right-click an object in the vCenter Server inventory, select Deploy OVF template.
Select Local file, click the button to upload files, and navigate to the photon-3-kube-v1.18.2+vmware.1.ova file.
Follow the installer prompts to deploy a VM from the OVA template.

  • Leave VM name as photon-3-kube-v1.18.2+vmware.1
  • Select the compute resource as RegionA01-MGMT
  • Accept the end user license agreements (EULA)
  • Select the map-vol datastore
  • Select the Dswitch-Management network
  • Click Finish to deploy the VM.

Right-click the VM and select Template > Convert to Template.
Repeat this whole process for photon-3-haproxy-v1.2.4+vmware.1.ova.

Upgrade the Management Cluster

There is no UI-based upgrade for TKG (yet), so you will be performing the rest of the steps from the command line. I have an Ubuntu 20.04 VM running in my lab for most of my CLI work, but this could just as easily be done in PowerShell.

Run the tkg get management-cluster command to update the .tkg/config.yaml file as well as the files under the .tkg/bom and .tkg/providers folders. You should see output similar to the following:

It seems that you have an outdated providers on your filesystem, proceeding on this command will cause your tkgconfig file to be overridden. You may backup your tkgconfig file before moving on.
Do you want to continue?: y
the old providers folder /home/ubuntu/.tkg/providers is backed up to /home/ubuntu/.tkg/providers-20200511130650
 MANAGEMENT-CLUSTER-NAME            CONTEXT-NAME
 tkg-mgmt-vsphere-20200416205833 *  tkg-mgmt-vsphere-20200416205833-admin@tkg-mgmt-vsphere-20200416205833

Edit the .tkg/config.yaml file and set the following:

VSPHERE_TEMPLATE: photon-3-kube-v1.18.2+vmware.1
VSPHERE_HAPROXY_TEMPLATE: photon-3-haproxy-v1.2.4+vmware.1
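If you'd rather script that edit than open an editor, a couple of sed one-liners will do it. This is just a sketch: it operates on a scratch copy with illustrative old values so it's safe to run as-is; point CFG at your real ~/.tkg/config.yaml (after backing it up) to apply it for real, and note it assumes both keys already exist in the file and that you have GNU sed.

```shell
# Sketch: update the template names in the TKG config with sed.
# A scratch copy with illustrative contents is used here; set
# CFG=~/.tkg/config.yaml (after a backup) to modify the real file.
# Assumes both keys are already present and GNU sed (-i) is available.
CFG=$(mktemp)
printf '%s\n' \
  'VSPHERE_TEMPLATE: photon-3-kube-v1.17.3+vmware.2' \
  'VSPHERE_HAPROXY_TEMPLATE: photon-3-haproxy-v1.2.4+vmware.1' > "$CFG"
sed -i 's|^VSPHERE_TEMPLATE:.*|VSPHERE_TEMPLATE: photon-3-kube-v1.18.2+vmware.1|' "$CFG"
sed -i 's|^VSPHERE_HAPROXY_TEMPLATE:.*|VSPHERE_HAPROXY_TEMPLATE: photon-3-haproxy-v1.2.4+vmware.1|' "$CFG"
cat "$CFG"
```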

And that’s the prep-work done! Now you can run the actual upgrade on the management cluster:

tkg upgrade mc tkg-mgmt-vsphere-20200416205833

While the upgrade is running, you’ll see the Photon and HAProxy VMs being cloned in the vSphere UI to create your new nodes.

Since this is a rolling upgrade, you’ll also temporarily see extra nodes in the cluster via kubectl get nodes; this is expected behavior (my cluster should only have one master):

kubectl get nodes
NAME                                                    STATUS   ROLES    AGE   VERSION
tkg-mgmt-vsphere-20200416205833-control-plane-rg8h5     Ready    master   24d   v1.17.3+vmware.2
tkg-mgmt-vsphere-20200416205833-control-plane-tk8q2     Ready    master   31s   v1.18.1+vmware.1
tkg-mgmt-vsphere-20200416205833-md-0-5c8787b78f-4k4xs   Ready    <none>   24d   v1.17.3+vmware.2

Once the new node is fully functional, you’ll see the old node get decommissioned:

kubectl get nodes
NAME                                                    STATUS                     ROLES    AGE     VERSION
tkg-mgmt-vsphere-20200416205833-control-plane-rg8h5     Ready,SchedulingDisabled   master   24d     v1.17.3+vmware.2
tkg-mgmt-vsphere-20200416205833-control-plane-tk8q2     Ready                      master   3m11s   v1.18.1+vmware.1
tkg-mgmt-vsphere-20200416205833-md-0-5c8787b78f-4k4xs   Ready                      <none>   24d     v1.17.3+vmware.2
 
kubectl get nodes
NAME                                                    STATUS   ROLES    AGE     VERSION
tkg-mgmt-vsphere-20200416205833-control-plane-tk8q2     Ready    master   4m14s   v1.18.1+vmware.1
tkg-mgmt-vsphere-20200416205833-md-0-5c8787b78f-4k4xs   Ready    <none>   24d     v1.17.3+vmware.2

And you will see the original nodes get deleted in the vSphere UI.

You’ll see this same process repeat for all of the nodes in the cluster (control plane, workers, load balancer).

You can follow some high-level logging at the command line while the upgrade is happening as well:

tkg upgrade mc tkg-mgmt-vsphere-20200416205833
Logs of the command execution can also be found at: /tmp/tkg-20200511T130803569135562.log
? Upgrading management cluster 'tkg-mgmt-vsphere-20200416205833' to TKG version 'v1.1.0-rc.1' with Kubernetes version 'v1.18.1+vmware.1'. Are you sure? [y/N] y
Upgrading management cluster providers...
Performing upgrade...
Upgrading Provider="capi-system/cluster-api" CurrentVersion="" TargetVersion="v0.3.5"
Deleting Provider="cluster-api" Version="" TargetNamespace="capi-system"
Installing Provider="cluster-api" Version="v0.3.5" TargetNamespace="capi-system"
Upgrading Provider="capi-kubeadm-bootstrap-system/bootstrap-kubeadm" CurrentVersion="" TargetVersion="v0.3.5"
Deleting Provider="bootstrap-kubeadm" Version="" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.5" TargetNamespace="capi-kubeadm-bootstrap-system"
Upgrading Provider="capi-kubeadm-control-plane-system/control-plane-kubeadm" CurrentVersion="" TargetVersion="v0.3.5"
Deleting Provider="control-plane-kubeadm" Version="" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.5" TargetNamespace="capi-kubeadm-control-plane-system"
Upgrading Provider="capv-system/infrastructure-vsphere" CurrentVersion="" TargetVersion="v0.6.4"
Deleting Provider="infrastructure-vsphere" Version="" TargetNamespace="capv-system"
Installing Provider="infrastructure-vsphere" Version="v0.6.4" TargetNamespace="capv-system"
Management cluster providers upgraded successfully...
Upgrading management cluster kubernetes version...
Creating management cluster client...
Verifying kubernetes version...
Retrieving configuration for upgrade cluster...
Create InfrastructureTemplate for upgrade...
Upgrading control plane nodes...
Patching KubeadmControlPlane with the kubernetes version v1.18.1+vmware.1...
Waiting for kubernetes version to be updated for control plane nodes
Upgrading worker nodes...
Patching MachineDeployment with the kubernetes version v1.18.1+vmware.1...
Waiting for kubernetes version to be updated for worker nodes...
Management cluster 'tkg-mgmt-vsphere-20200416205833' is being upgraded to TKG version 'v1.1.0-rc.1' with kubernetes version 'v1.18.1+vmware.1'

And when the upgrade is finished you should see just the expected nodes at the new version:

kubectl get nodes
NAME                                                    STATUS   ROLES    AGE   VERSION
tkg-mgmt-vsphere-20200416205833-control-plane-tk8q2     Ready    master   24m   v1.18.1+vmware.1
tkg-mgmt-vsphere-20200416205833-md-0-6ccb94c8b5-khrcd   Ready    <none>   17m   v1.18.1+vmware.1
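If you want to script that final check rather than eyeball it, something like the following works. The node lines are embedded as sample data (taken from the output above) so the snippet runs standalone; on a live cluster you'd populate NODES from kubectl get nodes --no-headers instead.

```shell
# Sketch: verify every node reports the target kubelet version.
# NODES is sample data here; live, use: NODES=$(kubectl get nodes --no-headers)
NODES='tkg-mgmt-vsphere-20200416205833-control-plane-tk8q2 Ready master 24m v1.18.1+vmware.1
tkg-mgmt-vsphere-20200416205833-md-0-6ccb94c8b5-khrcd Ready <none> 17m v1.18.1+vmware.1'
TARGET='v1.18.1+vmware.1'
# The version is the last column of `kubectl get nodes` output.
STALE=$(printf '%s\n' "$NODES" | awk -v t="$TARGET" '$NF != t')
if [ -z "$STALE" ]; then
  echo "all nodes at $TARGET"
else
  printf 'still upgrading:\n%s\n' "$STALE"
fi
```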

Upgrade the Workload Cluster

You can upgrade the workload cluster via a similar process:

tkg get cluster
 NAME  NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES
 test  default    running  1/1           1/1      v1.17.3+vmware.2

The same OVAs that were used to upgrade the management cluster are used for the workload cluster, and we’ve already specified the templates we want in the .tkg/config.yaml file, so we can get right to kicking off the upgrade:

tkg upgrade cluster test --kubernetes-version v1.18.1+vmware.1
? Upgrading workload cluster 'test' to kubernetes version 'v1.18.1+vmware.1'. Are you sure? [y/N] y
Creating management cluster client...
Creating workload cluster client...
Verifying kubernetes version...
Retrieving configuration for upgrade cluster...
Create InfrastructureTemplate for upgrade...
Upgrading control plane nodes...
Patching KubeadmControlPlane with the kubernetes version v1.18.1+vmware.1...
Waiting for kubernetes version to be updated for control plane nodes
Upgrading worker nodes...
Patching MachineDeployment with the kubernetes version v1.18.1+vmware.1...
Waiting for kubernetes version to be updated for worker nodes...
Cluster 'test' successfully upgraded to kubernetes version 'v1.18.1+vmware.1'

You will see the same operations in the vSphere UI and via kubectl commands as were observed for the management cluster.

When the upgrade is finished, you will see that the version is updated in tkg get cluster output:

tkg get cluster
 NAME  NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES
 test  default    running  1/1           1/1      v1.18.1+vmware.1

As a last bit of cleanup, you can remove the KUBERNETES_VERSION entry from your .tkg/config.yaml as it is no longer needed.
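That cleanup can also be a one-liner. As before, this sketch runs against a scratch copy with illustrative contents so it's safe to execute as-is; point CFG at ~/.tkg/config.yaml for the real thing (GNU sed assumed).

```shell
# Sketch: remove the now-unneeded KUBERNETES_VERSION entry.
# Scratch copy with illustrative contents; set CFG=~/.tkg/config.yaml for real.
CFG=$(mktemp)
printf '%s\n' \
  'KUBERNETES_VERSION: v1.17.3+vmware.2' \
  'VSPHERE_TEMPLATE: photon-3-kube-v1.18.2+vmware.1' > "$CFG"
sed -i '/^KUBERNETES_VERSION:/d' "$CFG"
cat "$CFG"
```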

One very important fact to be aware of: since all of our IP addressing is handled via DHCP, your new nodes will have different IP addresses than the old ones. This poses no problem for Kubernetes components or constructs, with the exception of any NodePort services you have configured, which will be reachable at different addresses after the upgrade. Anything that relies on a NodePort service, or anything using a certificate that includes a node IP address as a SAN, will have to be reconfigured. With this in mind, it is highly recommended to use LoadBalancer services, or to configure DNS records and use FQDNs in your certificates and for accessing Kubernetes services. I’ll have a post up soon going over how to configure MetalLB in a TKG cluster for providing LoadBalancer services.
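One quick way to gather the new node addresses for updating DNS records is to pull the INTERNAL-IP column out of kubectl get nodes -o wide (it's the sixth column). The sample below stands in for that output so the snippet runs standalone; the IP addresses shown are made up for illustration, not from a real cluster.

```shell
# Sketch: pull node name and InternalIP so DNS records or NodePort consumers
# can be repointed. WIDE is a stand-in for:
#   kubectl get nodes -o wide --no-headers
# The 192.0.2.x addresses below are illustrative only.
WIDE='tkg-mgmt-vsphere-20200416205833-control-plane-tk8q2 Ready master 24m v1.18.1+vmware.1 192.0.2.10 <none> VMware-Photon/3.0 4.19.115-3.ph3 containerd://1.3.4
tkg-mgmt-vsphere-20200416205833-md-0-6ccb94c8b5-khrcd Ready <none> 17m v1.18.1+vmware.1 192.0.2.11 <none> VMware-Photon/3.0 4.19.115-3.ph3 containerd://1.3.4'
# Column 1 is the node name, column 6 is INTERNAL-IP in -o wide output.
printf '%s\n' "$WIDE" | awk '{print $1, $6}'
```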
