How to migrate from VCP to CSI in TKGI 1.12

In previous versions of TKGI, access to vSphere storage was accomplished via the vSphere Cloud Provider (VCP). While still functional, this method uses the legacy in-tree driver and is not the preferred approach compared to the newer vSphere CNS/CSI method of accessing vSphere storage. TKGI has allowed use of the newer CSI driver for some time, but it was a manual installation process in already-provisioned Kubernetes clusters. As of TKGI 1.11, you can configure the default vSphere storage provider to be CSI on the TKGI tile (see Deploying and Managing Cloud Native Storage (CNS) on vSphere for more details). However, there was no means of migrating from the legacy VCP driver to the CSI driver…until TKGI 1.12.

In my recent post, Upgrading a TKG 1.11 management console installation to 1.12, I had a Linux cluster deployed that had a stateful WordPress application configured (using VCP for vSphere storage). You can see some of the details of this below:

kubectl get sc,pvc,pv

NAME                                     PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
storageclass.storage.k8s.io/k8s-policy   kubernetes.io/vsphere-volume   Delete          Immediate           false                  61s

NAME                                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/mysql-pv-claim   Bound    pvc-4379b1a6-4664-4241-81d8-deeac989dd4e   20Gi       RWO            k8s-policy     61s
persistentvolumeclaim/wp-pv-claim      Bound    pvc-495943b5-9c91-4db8-88de-2e021fa597b6   20Gi       RWO            k8s-policy     61s

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS   REASON   AGE
persistentvolume/pvc-4379b1a6-4664-4241-81d8-deeac989dd4e   20Gi       RWO            Delete           Bound    default/mysql-pv-claim   k8s-policy              60s
persistentvolume/pvc-495943b5-9c91-4db8-88de-2e021fa597b6   20Gi       RWO            Delete           Bound    default/wp-pv-claim      k8s-policy              60s

You can see that the provisioner for the storage class is kubernetes.io/vsphere-volume, which correlates to the VCP driver. If this storage class were using the newer CSI driver, the provisioner would be csi.vsphere.vmware.com.
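
For comparison, a minimal sketch of what a CSI-backed storage class looks like (the names here are hypothetical, and the storage policy would need to match a VM storage policy in your vSphere environment):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: k8s-policy-csi                       # hypothetical name
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "k8s-storage-policy"    # assumed vSphere storage policy name
reclaimPolicy: Delete
volumeBindingMode: Immediate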

There are two main things to keep in mind if you are considering migrating from VCP to CSI in your TKGI 1.12 installation:

  • The migration is only supported for Linux clusters.
  • Your vSphere installation must be at least version 7.0U1.

There are several other possible considerations to be aware of documented at Things to consider before turning on Migration.

Before getting started, I decided to take a closer look at some of the storage objects, both from a Kubernetes standpoint and from vSphere.

There is nothing too special about any of the objects on the Kubernetes side of things:

kubectl describe sc k8s-policy

Name:                  k8s-policy
IsDefaultClass:        No
Annotations:           <none>
Provisioner:           kubernetes.io/vsphere-volume
Parameters:            <none>
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>
kubectl describe pvc wp-pv-claim

Name:          wp-pv-claim
Namespace:     default
StorageClass:  k8s-policy
Status:        Bound
Volume:        pvc-495943b5-9c91-4db8-88de-2e021fa597b6
Labels:        app=wordpress
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/vsphere-volume
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      20Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       wordpress-6dcc67df85-6fcxt
Events:
  Type    Reason                 Age   From                         Message
  ----    ------                 ----  ----                         -------
  Normal  ProvisioningSucceeded  95s   persistentvolume-controller  Successfully provisioned volume pvc-495943b5-9c91-4db8-88de-2e021fa597b6 using kubernetes.io/vsphere-volume
kubectl describe pv pvc-495943b5-9c91-4db8-88de-2e021fa597b6

Name:            pvc-495943b5-9c91-4db8-88de-2e021fa597b6
Labels:          <none>
Annotations:     kubernetes.io/createdby: vsphere-volume-dynamic-provisioner
                 pv.kubernetes.io/bound-by-controller: yes
                 pv.kubernetes.io/provisioned-by: kubernetes.io/vsphere-volume
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    k8s-policy
Status:          Bound
Claim:           default/wp-pv-claim
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        20Gi
Node Affinity:   <none>
Message:
Source:
    Type:               vSphereVolume (a Persistent Disk resource in vSphere)
    VolumePath:         [map-vol] kubevols/linux-cluster-dynamic-pvc-495943b5-9c91-4db8-88de-2e021fa597b6.vmdk
    FSType:             ext4
    StoragePolicyName:
Events:                 <none>

kubectl get ValidatingWebhookConfiguration -A
NAME                  WEBHOOKS   AGE
validator.pksapi.io   2          11m

On the vSphere side, there are no container volumes present since the CSI driver is not in use. You can see this by navigating to your cluster and then selecting Monitor, Cloud Native Storage, Container Volumes:

There are definitely virtual disks associated with the noted persistent volumes though ([map-vol] kubevols/linux-cluster-dynamic-pvc-495943b5-9c91-4db8-88de-2e021fa597b6.vmdk was noted when describing a PV). If you navigate to your datastore where these volumes should be created and then click on the Files tab, you will see a folder named kubevols that should have the backing virtual disks in it.

The file names are truncated in this screenshot but you can see the first part of the name matches up with the PVCs noted earlier (pvc-4379b1a6-4664-4241-81d8-deeac989dd4e and pvc-495943b5-9c91-4db8-88de-2e021fa597b6).
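
If you prefer the CLI over clicking through the datastore browser, govc can list the same files (assuming govc is installed and pointed at your vCenter via GOVC_URL, GOVC_USERNAME, and GOVC_PASSWORD):

# list the VCP-provisioned virtual disks in the kubevols folder on the map-vol datastore
govc datastore.ls -ds=map-vol kubevols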

Another relevant folder here is the fcd folder. FCD stands for First Class Disk, and these are what get created when the CSI driver is in use. As you would expect, there are no FCDs present yet:

On to the migration…

The first thing to do will be to enable the CSI driver integration for TKGI. Since I am using the TKGI Management Console (TKGIMC), I will be doing this on the TKGI Configuration page in that UI:

If you’re not using the TKGIMC, you can make the same change in Opsman on the TKGI tile under Storage:

After making the needed change and waiting for the update to complete, you can move on to actually updating your cluster(s).
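
If you are making the change directly against Ops Manager, the pending change still has to be applied. A sketch of doing that with the om CLI, scoped to the TKGI tile, assuming om is already authenticated via OM_TARGET, OM_USERNAME, and OM_PASSWORD:

# apply the pending configuration change to the TKGI tile only
om apply-changes --product-name pivotal-container-service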

If you happen to be following the progress of the update in the Opsman UI or via bosh task, you will see the following, which is basically the entire change:

- name: pivotal-container-service
   properties:
     service_catalog:
       global_properties:
-         vsphere_csi_enabled: false
+         vsphere_csi_enabled: true

Before I decided to migrate my cluster, I wanted to see if just updating the TKGI configuration was enough to be able to provision CSI volumes in a pre-existing cluster. I created a new storage class using CSI and a PVC using the new storage class. When I checked out the PVC, it was in a Pending state and digging deeper showed the following:

kubectl describe pvc mysql-pv-claim-2
Name:          mysql-pv-claim-2
Namespace:     default
StorageClass:  k8s-policy-2
Status:        Pending
Volume:
Labels:        app=wordpress
Annotations:   volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       wordpress-mysql-2-78bd8c99b5-qm64w
Events:
  Type     Reason                Age                From                         Message
  ----     ------                ----               ----                         -------
  Warning  ProvisioningFailed    46s (x2 over 57s)  persistentvolume-controller  storageclass.storage.k8s.io "k8s-policy-2" not found
  Normal   ExternalProvisioning  1s (x3 over 31s)   persistentvolume-controller  waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator

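For reference, the PVC used in this test was similar to the following (mysql-pv-claim-2 and k8s-policy-2 are taken from the output above; the CSI storage class itself would look like the CSI example sketched earlier, and the requested size here is an assumption):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim-2
  labels:
    app: wordpress
spec:
  storageClassName: k8s-policy-2    # CSI-backed storage class (csi.vsphere.vmware.com)
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi                 # assumed size; not shown in the output above
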
CSI was still not functional in this cluster, so on to the migration.

As noted, I have a single Linux cluster using the older VCP driver:

tkgi cluster linux-cluster

PKS Version:              1.12.0-build.42
Name:                     linux-cluster
K8s Version:              1.21.3
Plan Name:                linux-small
UUID:                     2ed3bec8-1710-48f4-9917-19599e6cdcd0
Last Action:              UPGRADE
Last Action State:        succeeded
Last Action Description:  Instance upgrade completed
Kubernetes Master Host:   linux-cluster.corp.tanzu
Kubernetes Master Port:   8443
Worker Nodes:             1
Kubernetes Master IP(s):  10.40.14.42
Network Profile Name:
Kubernetes Profile Name:
Compute Profile Name:
Tags:

You need to create a configuration file with the following content that will be used during a cluster update operation:

{
    "enable_csi_migration": "true"
}

With this file created, you can run a command similar to the following to update the cluster:

tkgi update-cluster linux-cluster --config-file csi-config.json

Update summary for cluster linux-cluster:
Cluster Configuration File Path: csi-config.json
Are you sure you want to continue? (y/n): y
Use 'pks cluster linux-cluster' to monitor the state of your cluster

The tkgi cluster command will show that the cluster is updating but doesn’t give a lot of detail:

tkgi cluster linux-cluster

PKS Version:              1.12.0-build.42
Name:                     linux-cluster
K8s Version:              1.21.3
Plan Name:                linux-small
UUID:                     2ed3bec8-1710-48f4-9917-19599e6cdcd0
Last Action:              UPDATE
Last Action State:        in progress
Last Action Description:  Instance update in progress
Kubernetes Master Host:   linux-cluster.corp.tanzu
Kubernetes Master Port:   8443
Worker Nodes:             1
Kubernetes Master IP(s):  10.40.14.42
Network Profile Name:
Kubernetes Profile Name:
Compute Profile Name:
Tags:

However, bosh task output is a little more helpful:

bosh task

Using environment '172.31.0.3' as client 'ops_manager'

Task 477

Task 477 | 18:02:51 | Deprecation: Global 'properties' are deprecated. Please define 'properties' at the job level.
Task 477 | 18:02:52 | Preparing deployment: Preparing deployment
Task 477 | 18:02:53 | Warning: DNS address not available for the link provider instance: pivotal-container-service/fc1121b1-accc-48f0-9f30-ef72aec6959a
Task 477 | 18:02:53 | Warning: DNS address not available for the link provider instance: pivotal-container-service/fc1121b1-accc-48f0-9f30-ef72aec6959a
Task 477 | 18:02:54 | Warning: DNS address not available for the link provider instance: pivotal-container-service/fc1121b1-accc-48f0-9f30-ef72aec6959a
Task 477 | 18:03:06 | Preparing deployment: Preparing deployment (00:00:14)
Task 477 | 18:03:06 | Preparing deployment: Rendering templates (00:00:06)
Task 477 | 18:03:13 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 477 | 18:03:13 | Updating instance master: master/c899fd74-5060-461f-bec1-05de2b29566e (0) (canary)
Task 477 | 18:03:16 | L executing pre-stop: master/c899fd74-5060-461f-bec1-05de2b29566e (0) (canary)
Task 477 | 18:03:16 | L executing drain: master/c899fd74-5060-461f-bec1-05de2b29566e (0) (canary)
Task 477 | 18:03:18 | L stopping jobs: master/c899fd74-5060-461f-bec1-05de2b29566e (0) (canary)
Task 477 | 18:03:19 | L executing post-stop: master/c899fd74-5060-461f-bec1-05de2b29566e (0) (canary)
Task 477 | 18:03:35 | L installing packages: master/c899fd74-5060-461f-bec1-05de2b29566e (0) (canary)
Task 477 | 18:03:39 | L configuring jobs: master/c899fd74-5060-461f-bec1-05de2b29566e (0) (canary)
Task 477 | 18:03:39 | L executing pre-start: master/c899fd74-5060-461f-bec1-05de2b29566e (0) (canary)
Task 477 | 18:04:19 | L starting jobs: master/c899fd74-5060-461f-bec1-05de2b29566e (0) (canary)
Task 477 | 18:05:14 | L executing post-start: master/c899fd74-5060-461f-bec1-05de2b29566e (0) (canary) (00:02:07)
Task 477 | 18:05:20 | Updating instance worker: worker/745b1ef9-c38b-4da6-9dee-6a8a25c021e2 (0) (canary)
Task 477 | 18:05:29 | L executing pre-stop: worker/745b1ef9-c38b-4da6-9dee-6a8a25c021e2 (0) (canary)

You can also see that the cluster is being reconfigured in the TKGIMC UI:

When the process gets to the point of updating the worker (where the WordPress pods are running), you will see a lot of activity similar to the following:

I was a little confused at first about what was happening here, as it looked like new disks were being created (Create Container Volume tasks). In reality, the existing virtual disks were just being converted to First Class Disks so that they will be recognized as container volumes. When I went to look at the disk files in the kubevols folder again, I saw that there were extra metadata files present now:

And looking into one of these files shows the following:

0
(cns.containerCluster.clusterDistributionTKGI-
"cns.containerCluster.clusterFlavorVANILLAW
cns.containerCluster.clusterId5service-instance_4e018a05-5170-401a-9a6c-ec03659d55e1.
 cns.containerCluster.clusterType
KUBERNETES?
 cns.containerCluster.vSphereUserdministrator@vsphere.localN
cns.k8s.pod.clusterId5service-instance_4e018a05-5170-401a-9a6c-ec03659d55e14
cns.k8s.pod.name wordpress-mysql-8545566997-fqlsx
cns.k8s.pod.namespacedefault]
$cns.k8s.pod.referredEntity.clusterId5service-instance_4e018a05-5170-401a-9a6c-ec03659d55e1/
$cns.k8s.pod.referredEntity.namespacedefault5
#cns.k8s.pod.referredEntity.pvc.namemysql-pv-claimM
cns.k8s.pv.clusterId5service-instance_4e018a05-5170-401a-9a6c-ec03659d55e1;
cns.k8s.pv.name(pvc-4379b1a6-4664-4241-81d8-deeac989dd4eN
cns.k8s.pvc.clusterId5service-instance_4e018a05-5170-401a-9a6c-ec03659d55e1"
cns.k8s.pvc.namemysql-pv-claim
cns.k8s.pvc.namespacedefault]
$cns.k8s.pvc.referredEntity.clusterId5service-instance_4e018a05-5170-401a-9a6c-ec03659d55e1N
"cns.k8s.pvc.referredEntity.pv.name(pvc-4379b1a6-4664-4241-81d8-deeac989dd4e
cns.tagtrue

It’s all CNS-related metadata.

And if you navigate to your cluster and then to Monitor, Cloud Native Storage, Container Volumes, you’ll see the newly registered container volumes:

You might notice the labels on these volumes as they are directly related to the WordPress app.
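
You can also list these container volumes from the CLI with govc (assuming a recent govc build that includes the CNS volume commands):

# list CNS container volumes registered with vCenter
govc volume.ls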

When the migration is done, you should see the status reflected in the tkgi cluster output:

tkgi cluster linux-cluster

PKS Version:              1.12.0-build.42
Name:                     linux-cluster
K8s Version:              1.21.3
Plan Name:                linux-small
UUID:                     2ed3bec8-1710-48f4-9917-19599e6cdcd0
Last Action:              UPDATE
Last Action State:        succeeded
Last Action Description:  Instance update completed
Kubernetes Master Host:   linux-cluster.corp.tanzu
Kubernetes Master Port:   8443
Worker Nodes:             1
Kubernetes Master IP(s):  10.40.14.42
Network Profile Name:
Kubernetes Profile Name:
Compute Profile Name:
Tags:

I decided to take a look at the same storage objects after the migration to see what they looked like:

kubectl get sc,pvc,pv

NAME                                     PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
storageclass.storage.k8s.io/k8s-policy   kubernetes.io/vsphere-volume   Delete          Immediate           false                  75m

NAME                                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/mysql-pv-claim   Bound    pvc-4379b1a6-4664-4241-81d8-deeac989dd4e   20Gi       RWO            k8s-policy     75m
persistentvolumeclaim/wp-pv-claim      Bound    pvc-495943b5-9c91-4db8-88de-2e021fa597b6   20Gi       RWO            k8s-policy     75m

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS   REASON   AGE
persistentvolume/pvc-4379b1a6-4664-4241-81d8-deeac989dd4e   20Gi       RWO            Delete           Bound    default/mysql-pv-claim   k8s-policy              75m
persistentvolume/pvc-495943b5-9c91-4db8-88de-2e021fa597b6   20Gi       RWO            Delete           Bound    default/wp-pv-claim      k8s-policy              75m

These all look the same, and their creation times show they are from before the migration started.

kubectl describe pvc wp-pv-claim

Name:          wp-pv-claim
Namespace:     default
StorageClass:  k8s-policy
Status:        Bound
Volume:        pvc-495943b5-9c91-4db8-88de-2e021fa597b6
Labels:        app=wordpress
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               pv.kubernetes.io/migrated-to: csi.vsphere.vmware.com
               volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/vsphere-volume
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      20Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       wordpress-6dcc67df85-7b9kc
Events:
  Type    Reason                 Age   From                         Message
  ----    ------                 ----  ----                         -------
  Normal  ProvisioningSucceeded  77m   persistentvolume-controller  Successfully provisioned volume pvc-495943b5-9c91-4db8-88de-2e021fa597b6 using kubernetes.io/vsphere-volume
kubectl describe pv pvc-495943b5-9c91-4db8-88de-2e021fa597b6

Name:            pvc-495943b5-9c91-4db8-88de-2e021fa597b6
Labels:          <none>
Annotations:     kubernetes.io/createdby: vsphere-volume-dynamic-provisioner
                 pv.kubernetes.io/bound-by-controller: yes
                 pv.kubernetes.io/migrated-to: csi.vsphere.vmware.com
                 pv.kubernetes.io/provisioned-by: kubernetes.io/vsphere-volume
Finalizers:      [kubernetes.io/pv-protection external-attacher/csi-vsphere-vmware-com]
StorageClass:    k8s-policy
Status:          Bound
Claim:           default/wp-pv-claim
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        20Gi
Node Affinity:   <none>
Message:
Source:
    Type:               vSphereVolume (a Persistent Disk resource in vSphere)
    VolumePath:         [map-vol] kubevols/linux-cluster-dynamic-pvc-495943b5-9c91-4db8-88de-2e021fa597b6.vmdk
    FSType:             ext4
    StoragePolicyName:
Events:                 <none>

There is an extra annotation now, pv.kubernetes.io/migrated-to: csi.vsphere.vmware.com. You can also see that there is a new finalizer on the persistent volume, external-attacher/csi-vsphere-vmware-com, which clearly indicates that CSI is now managing this volume.
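
A quick way to confirm which persistent volumes have been migrated is to check all PVs for this annotation, for example:

# show each PV and the CSI driver it was migrated to (empty if not migrated)
kubectl get pv -o custom-columns='NAME:.metadata.name,MIGRATED-TO:.metadata.annotations.pv\.kubernetes\.io/migrated-to'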

I decided to test out creating a new PVC with my existing VCP storage class.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  labels:
    type: vcp
spec:
  storageClassName: k8s-policy
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

kubectl apply -f test-volume-vcp.yaml
kubectl describe pvc test-pvc

Name:          test-pvc
Namespace:     default
StorageClass:  k8s-policy
Status:        Bound
Volume:        pvc-172172ff-93eb-42ab-a700-158cfa605760
Labels:        type=vcp
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      5Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       <none>
Events:
  Type    Reason                 Age   From                                                                                              Message
  ----    ------                 ----  ----                                                                                              -------
  Normal  Provisioning           15s   csi.vsphere.vmware.com_4263f11b-0f13-4e89-8ab5-27a6cb7b176f_c534500d-8be3-4b1b-b8fe-60e03a21f686  External provisioner is provisioning volume for claim "default/test-pvc"
  Normal  ExternalProvisioning   15s   persistentvolume-controller                                                                       waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
  Normal  ProvisioningSucceeded  14s   csi.vsphere.vmware.com_4263f11b-0f13-4e89-8ab5-27a6cb7b176f_c534500d-8be3-4b1b-b8fe-60e03a21f686  Successfully provisioned volume pvc-172172ff-93eb-42ab-a700-158cfa605760

You can see that there is an annotation on this PVC indicating that CSI is involved, even though the backing storage class is using VCP: volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com. The CSI migration layer is translating these in-tree VCP requests into CSI requests.

With this in mind, if a storage class is still using VCP as the provisioner, that provisioner will effectively be ignored and CSI will be used instead.
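
One way to see that the in-tree plugin is being redirected is to check the CSINode object for each worker node; when CSI migration is on, the kubelet lists the in-tree vSphere plugin as migrated:

# the migrated-plugins annotation should include kubernetes.io/vsphere-volume
kubectl get csinode -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.storage\.alpha\.kubernetes\.io/migrated-plugins}{"\n"}{end}'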

Another thing to note is that virtual disk files created under CSI are now placed in the fcd folder on the datastore:

Any migrated virtual disks will remain in the kubevols folder.

Going back to my earlier test of creating a CSI volume using a CSI storage class, the pending PVC was provisioned once the migration completed:

kubectl describe pvc mysql-pv-claim-2
Name:          mysql-pv-claim-2
Namespace:     default
StorageClass:  k8s-policy-2
Status:        Pending
Volume:
Labels:        app=wordpress
Annotations:   volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       wordpress-mysql-2-78bd8c99b5-qm64w
Events:
  Type     Reason                Age                From                         Message
  ----     ------                ----               ----                         -------
  Normal  ExternalProvisioning   6m9s (x7 over 7m27s)  persistentvolume-controller                                                                       waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
  Normal  ExternalProvisioning   48s (x2 over 63s)     persistentvolume-controller                                                                       waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator
  Normal  Provisioning           47s                   csi.vsphere.vmware.com_86185e3f-3442-4e18-afa0-d01ac7c559e6_0a79fccc-f0c6-4306-9fba-dd74e702f0ad  External provisioner is provisioning volume for claim "default/mysql-pv-claim-2"
  Normal  ProvisioningSucceeded  46s                   csi.vsphere.vmware.com_86185e3f-3442-4e18-afa0-d01ac7c559e6_0a79fccc-f0c6-4306-9fba-dd74e702f0ad  Successfully provisioned volume pvc-6346d995-cbca-4222-b7dd-87b2a09c8bda

You should be aware that clusters provisioned after the vSphere CSI integration has been enabled will not need to have the migration step performed…they are ready to create CSI-backed volumes.
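
If you want to quickly verify that a cluster is CSI-ready, a couple of simple checks will do (the driver name is standard for the vSphere CSI driver; pod names and namespaces can vary by TKGI version):

# the vSphere CSI driver should be registered with the cluster
kubectl get csidriver

# and its controller/node pods should be running
kubectl get pods -A | grep -i csi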
