How to migrate a vSphere 8 with Tanzu TKG cluster from one datastore to another

If you ever need to move a deployed TKG cluster from one datastore to another, the process is not too difficult. It results in new nodes being created (realized as new virtual machines on the new datastore) while the old nodes are removed. If you have a single worker node, or single-pod deployments, this will be a disruptive exercise.
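
One quick way to check whether anything in the cluster runs as a single replica (and would therefore see downtime while the nodes are replaced) is to list deployment replica counts from the workload cluster's context, something like:

kubectl get deployments -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,REPLICAS:.spec.replicas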

When you create a TKG cluster, you start out with a supervisor cluster namespace with a specific storage policy in use. In my small lab, I have a single storage policy named k8s-policy that is associated with an NFS datastore named vol1.

If you look at the configuration of the supervisor cluster namespace where my cluster is deployed, you can see on the Storage pane that the k8s-policy storage policy is associated.

Examining the supervisor cluster namespace and the TKG cluster shows a storage class has been created that corresponds to the k8s-policy storage policy. You can also see that the virtualmachine objects created in the supervisor cluster namespace are using the k8s-policy storage class.

kubectl config use-context tkg2-cluster-namespace
Switched to context "tkg2-cluster-namespace".

kubectl get sc
NAME         PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
k8s-policy   csi.vsphere.vmware.com   Delete          Immediate           true                   93d

kubectl get virtualmachine
NAME                                                              POWER-STATE   AGE
tkg2-cluster-1-qztgr-vs6dt                                        poweredOn     93d
tkg2-cluster-1-tkg2-cluster-1-nodepool-1-ttkg5-7c546557c6-7cmw4   poweredOn     93d
tkg2-cluster-1-tkg2-cluster-1-nodepool-1-ttkg5-7c546557c6-wz5z8   poweredOn     93d

kubectl get virtualmachine tkg2-cluster-1-tkg2-cluster-1-nodepool-1-ttkg5-7c546557c6-7cmw4 -o jsonpath='{.spec.storageClass}'
k8s-policy

kubectl config use-context tkg2-cluster-1
Switched to context "tkg2-cluster-1".

kubectl get sc
NAME                     PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
k8s-policy               csi.vsphere.vmware.com   Delete          Immediate              true                   93d
k8s-policy-latebinding   csi.vsphere.vmware.com   Delete          WaitForFirstConsumer   true                   93d

In the vSphere Client, you can see that the three TKG node VMs are on the vol1 datastore.

I have created a second NFS datastore named vol2 and applied a tag named new-storage to it.

I have also created a new storage policy named new-policy that is associated with the vol2 datastore based on the new-storage tag being present.
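
As an aside, the tagging piece could also be scripted with the govc CLI if you have it configured against your vCenter. The following is only a rough sketch (the category name and the /Datacenter/datastore/vol2 inventory path are placeholders for whatever exists in your environment), and the storage policy itself was still created in the vSphere Client:

# create a tag category, create the new-storage tag in it, and attach the tag to the vol2 datastore
govc tags.category.create storage-categories
govc tags.create -c storage-categories new-storage
govc tags.attach new-storage /Datacenter/datastore/vol2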

The last step in enabling the vol2 datastore for use by my TKG cluster is to associate the new-policy storage policy with the supervisor cluster namespace.

On the Summary page for the supervisor cluster namespace, only the k8s-policy storage policy is currently associated.

Click the Edit Storage link.

You can see that the new-policy storage policy is available. Select the checkbox next to it.

With both storage policies selected, click the OK button.

The Storage pane now shows both storage policies.

Back at the command line, you can see that the supervisor cluster namespace and the TKG cluster both have new storage classes available.

kubectl config use-context tkg2-cluster-namespace
Switched to context "tkg2-cluster-namespace".

kubectl get sc
NAME         PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
k8s-policy   csi.vsphere.vmware.com   Delete          Immediate           true                   93d
new-policy   csi.vsphere.vmware.com   Delete          Immediate           true                   40s

kubectl config use-context tkg2-cluster-1
Switched to context "tkg2-cluster-1".

kubectl get sc
NAME                     PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
k8s-policy               csi.vsphere.vmware.com   Delete          Immediate              true                   93d
k8s-policy-latebinding   csi.vsphere.vmware.com   Delete          WaitForFirstConsumer   true                   93d
new-policy               csi.vsphere.vmware.com   Delete          Immediate              true                   80s
new-policy-latebinding   csi.vsphere.vmware.com   Delete          WaitForFirstConsumer   true                   80s
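
If you want to confirm that the new storage class really maps back to the new-policy storage policy, you can inspect its parameters (the exact parameter names can vary between vSphere versions, so treat this as a quick sanity check):

kubectl get sc new-policy -o jsonpath='{.parameters}'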

Now the cluster spec for the TKG cluster can be updated to use the new storage class.

The following is the original cluster spec for the TKG cluster:

apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: tkg2-cluster-1
  namespace: tkg2-cluster-namespace
spec:
  topology:
    controlPlane:
      replicas: 1
      vmClass: best-effort-small
      storageClass: k8s-policy
      tkr:
        reference:
          name: v1.24.9---vmware.1-tkg.4
    nodePools:
    - name: tkg2-cluster-1-nodepool-1
      replicas: 2
      vmClass: best-effort-medium
      storageClass: k8s-policy

The only changes needed in the spec are to update the two storageClass values from k8s-policy to new-policy. The updated spec looks like the following:

apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: tkg2-cluster-1
  namespace: tkg2-cluster-namespace
spec:
  topology:
    controlPlane:
      replicas: 1
      vmClass: best-effort-small
      storageClass: new-policy
      tkr:
        reference:
          name: v1.24.9---vmware.1-tkg.4
    nodePools:
    - name: tkg2-cluster-1-nodepool-1
      replicas: 2
      vmClass: best-effort-medium
      storageClass: new-policy
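
As an alternative to maintaining a separate manifest, the same two storageClass values could in principle be changed with a JSON patch against the running object. This is only a sketch, and it assumes the node pool is the first entry in the nodePools list:

kubectl patch tanzukubernetescluster.run.tanzu.vmware.com tkg2-cluster-1 -n tkg2-cluster-namespace --type json \
  -p '[{"op":"replace","path":"/spec/topology/controlPlane/storageClass","value":"new-policy"},{"op":"replace","path":"/spec/topology/nodePools/0/storageClass","value":"new-policy"}]'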

To make this change to the running TKG cluster, apply the updated spec in the supervisor cluster namespace.

kubectl config use-context tkg2-cluster-namespace
Switched to context "tkg2-cluster-namespace".

kubectl apply -f updated-tkg2-cluster-1.yaml
tanzukubernetescluster.run.tanzu.vmware.com/tkg2-cluster-1 configured

You should immediately see a lot of activity in the vSphere Client as new nodes are being deployed.

And you can see the new nodes being created under the supervisor cluster namespace folder (only two nodes deployed at a time).

At the command line, you can see the corresponding virtualmachine objects being created in the supervisor cluster namespace.

kubectl get virtualmachine
NAME                                                              POWER-STATE   AGE
tkg2-cluster-1-qztgr-4qwms                                                      74s
tkg2-cluster-1-qztgr-vs6dt                                        poweredOn     93d
tkg2-cluster-1-tkg2-cluster-1-nodepool-1-ttkg5-679c58d68f-9lpzw                 75s
tkg2-cluster-1-tkg2-cluster-1-nodepool-1-ttkg5-7c546557c6-7cmw4   poweredOn     93d
tkg2-cluster-1-tkg2-cluster-1-nodepool-1-ttkg5-7c546557c6-wz5z8   poweredOn     93d

Examining the new worker node shows that it is using the new-policy storage class.

kubectl get virtualmachine tkg2-cluster-1-tkg2-cluster-1-nodepool-1-ttkg5-679c58d68f-9lpzw -o jsonpath='{.spec.storageClass}'
new-policy
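
To check all of the node VMs at once rather than one at a time, something like the following custom-columns listing of the same spec.storageClass field also works:

kubectl get virtualmachine -o custom-columns=NAME:.metadata.name,STORAGECLASS:.spec.storageClass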

You can also follow the progress of the new nodes being created and the old nodes being destroyed by watching the events in the supervisor cluster namespace.

kubectl config use-context tkg2-cluster-namespace
Switched to context "tkg2-cluster-namespace".

kubectl get events

4m22s       Normal    TopologyCreate                 cluster/tkg2-cluster-1                                                                        Created "VSphereMachineTemplate/tkg2-cluster-1-control-plane-tww95" as a replacement for "tkg2-cluster-1-control-plane-jsw46" (template rotation)
4m22s       Normal    TopologyUpdate                 cluster/tkg2-cluster-1                                                                        Updated "KubeadmControlPlane/tkg2-cluster-1-qztgr"
4m22s       Normal    TopologyCreate                 cluster/tkg2-cluster-1                                                                        Created "VSphereMachineTemplate/tkg2-cluster-1-tkg2-cluster-1-nodepool-1-infra-8v5hv" as a replacement for "tkg2-cluster-1-tkg2-cluster-1-nodepool-1-infra-jd8dj" (template rotation)
4m20s       Normal    TopologyUpdate                 cluster/tkg2-cluster-1                                                                        Updated "MachineDeployment/tkg2-cluster-1-tkg2-cluster-1-nodepool-1-ttkg5"
4m20s       Normal    PhaseChanged                   tanzukubernetescluster/tkg2-cluster-1                                                         cluster changes from running phase to updating phase
0s          Warning   ControlPlaneUnhealthy          kubeadmcontrolplane/tkg2-cluster-1-qztgr                                                      Waiting for control plane to pass preflight checks to continue reconciliation: [machine tkg2-cluster-1-qztgr-4qwms does not have APIServerPodHealthy condition, machine tkg2-cluster-1-qztgr-4qwms does not have ControllerManagerPodHealthy condition, machine tkg2-cluster-1-qztgr-4qwms does not have SchedulerPodHealthy condition, machine tkg2-cluster-1-qztgr-4qwms does not have EtcdPodHealthy condition, machine tkg2-cluster-1-qztgr-4qwms does not have EtcdMemberHealthy condition]
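
If the event list is hard to follow, sorting it by creation time can help:

kubectl get events --sort-by=.metadata.creationTimestamp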

After some time, you will see that only the three new TKG node VMs are present in the supervisor cluster namespace folder.

You can also see that there are no TKG node VMs present in the vol1 datastore and that they are all now on the vol2 datastore.

One important thing to keep in mind is that any workloads in the TKG cluster with persistent storage provisioned on the original datastore will still have that storage on the original datastore; this process only moves the node VMs.
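
A quick way to see which claims are still bound to the old storage class (moving those is a separate exercise) is to list the PVCs and their storage classes from the TKG cluster context, something like:

kubectl config use-context tkg2-cluster-1
kubectl get pvc -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,STORAGECLASS:.spec.storageClassName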
