Backup vSphere Persistent Volumes with Velero and the Velero Plugin for vSphere

The 1.1.1 version of the Velero Plugin for vSphere was just released and I had been waiting for it as I needed the ability to back up vSphere persistent volumes on Kubernetes 1.20.x and to use an untrusted certificate with my S3 storage location. Since both of these concerns were addressed in this version, I quickly got it up and running and tested it out.

To use the Velero Plugin for vSphere, you need to have vSphere CNS/CSI deployed in your Kubernetes cluster. TKG, vSphere with Tanzu, and vSphere clusters provisioned by TMC have vSphere CNS/CSI deployed already. You can also configure vSphere CNS/CSI for TKGI via the instructions provided at https://docs.pivotal.io/tkgi/1-10/vsphere-cns.html. For any other flavor of Kubernetes, you’ll want to reference the official documentation at https://vsphere-csi-driver.sigs.k8s.io/.
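
If you're not sure whether the driver is already present in your cluster, a quick check along these lines (assuming the driver is deployed under kube-system with its default naming, which is typical for TKG) should show the registered CSIDriver object and its pods:

kubectl get csidrivers
kubectl -n kube-system get pods | grep vsphere-csi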

Installing Velero is fairly easy but you should read up on the process at https://velero.io/docs/v1.6/basic-install/. Since I’m using MinIO for my S3 storage, I largely referenced https://velero.io/docs/v1.6/contributions/minio/.

Documentation on the Velero Plugin for vSphere can be found at https://github.com/vmware-tanzu/velero-plugin-for-vsphere. They still have 1.1.0 listed as the current version on the main page but it’s actually 1.1.1 (as of this writing). You can find the release notes for this version at https://github.com/vmware-tanzu/velero-plugin-for-vsphere/blob/release-1.1/changelogs/CHANGELOG-1.1.md#v111.

You’ll need access to some kind of S3 storage as this is where Velero will be storing the backup and volume snapshots. I’m running TrueNAS in my labs and they have made it incredibly easy to provide S3 storage functionality, as seen in the following screenshot from the Services page of the TrueNAS UI:

I really just had to create a directory and specify the username and password that would be used to access it. I chose to also use a wildcard certificate which will come up again a little bit later. This service gets MinIO up and running on the TrueNAS device. You don’t have to use MinIO but if you do, there are numerous different ways to deploy it (including one supplied by the Velero team).
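
If you'd rather not use a NAS, MinIO can also be run inside the cluster. One quick (non-production) option is the example manifest that ships with the Velero release; assuming you've downloaded and extracted a Velero release tarball, it would look something like this:

kubectl apply -f examples/minio/00-minio-deployment.yaml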

When I first browsed to my MinIO UI, there was no bucket created. You just need to click the red plus icon at the bottom right to get the option to create one.

Give it a name and you’re on your way.

Now that the velero bucket is present in the UI, we can proceed with installing and configuring Velero and the Velero Plugin for vSphere.

We need to create a file with the credentials used to access the S3 storage as this will be referenced when Velero is installed.

echo "[default]
aws_access_key_id = minio
aws_secret_access_key = minio123" > credentials-velero
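
Before going any further, it can be worth confirming that the bucket is reachable with those credentials. A quick sanity check with the aws CLI (assuming you have it installed; the endpoint, bucket name, and CA path below are just the values from my environment) might look like this:

export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=minio123
aws --endpoint-url https://truenas.corp.tanzu:9000 --ca-bundle /usr/local/share/ca-certificates/ca-controlcenter.crt s3 ls s3://velero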

The Velero Plugin for vSphere will read the vsphere-config-secret secret from the kube-system namespace (created when vSphere CNS/CSI was installed) to determine how it will communicate with vCenter Server. This is needed when taking and deleting snapshots of the virtual disks that back the persistent volumes in the Kubernetes cluster. You can check that this secret is present and has the correct data via the following command:

kubectl -n kube-system get secrets vsphere-config-secret -o 'go-template={{ index .data "csi-vsphere.conf" }}' | base64 -d

[Global]
insecure-flag = true
cluster-id = kube-system/tkg-mgmt

[VirtualCenter "vcsa-01a.corp.tanzu"]
user = "administrator@vsphere.local"
password = "VMware1!"
datacenters = "/RegionA01"

[Network]
public-network = "K8s-Workload"

Prior to the 1.1.1 version of the Velero Plugin for vSphere, there were potential issues with reading the vsphere-config-secret secret if it was not formatted in a very specific way. The secret worked fine for CNS/CSI but the plugin simply couldn’t use it as-is. The two main sticking points were that the insecure-flag = true line had to have true in quotes ("true") and that a line denoting the port that VC was listening on needed to be added. If you’re using an older version of the plugin and run into this concern, you can run the following to update the secret so that it will work for both CNS/CSI and the Velero Plugin for vSphere:

# Dump the current config out of the secret
kubectl -n kube-system get secrets vsphere-config-secret -o 'go-template={{ index .data "csi-vsphere.conf" }}' | base64 -d > csi-vsphere.conf

# Quote the insecure-flag value and insert a port line (line 6 lands inside the [VirtualCenter] section for the config shown above)
sed -i 's/insecure-flag = true/insecure-flag = "true"/' csi-vsphere.conf
sed -i '6 i\port = "443"' csi-vsphere.conf

# Replace the secret with the updated config
kubectl -n kube-system create secret generic vsphere-config-secret --from-file=csi-vsphere.conf -o yaml --dry-run=client | kubectl replace -f -
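
After replacing the secret, you can re-run the earlier command to confirm that the quoted insecure-flag value and the new port line are present:

kubectl -n kube-system get secrets vsphere-config-secret -o 'go-template={{ index .data "csi-vsphere.conf" }}' | base64 -d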

With everything ready to go, we can now use the velero install command to create the velero namespace and appropriate resources.

velero install  \
--provider aws \
--bucket velero \
--secret-file credentials-velero \
--cacert /usr/local/share/ca-certificates/ca-controlcenter.crt \
--snapshot-location-config region=minio \
--plugins velero/velero-plugin-for-aws:v1.1.0,vsphereveleroplugin/velero-plugin-for-vsphere:v1.1.1 \
--use-restic \
--backup-location-config region=minio,s3ForcePathStyle="true",s3Url=https://truenas.corp.tanzu:9000

A little bit of a breakdown on what is going on with this command:

  • provider aws – MinIO is S3-compatible, so the AWS provider is used (there is no dedicated MinIO provider yet).
  • bucket velero – This specifies the velero S3 bucket that will store the backups.
  • secret-file credentials-velero – A reference to the MinIO credentials file created earlier.
  • cacert /usr/local/share/ca-certificates/ca-controlcenter.crt – Only needed if your S3 location uses an untrusted certificate (as mentioned earlier, mine does).
  • snapshot-location-config region=minio – For AWS, a region must be specified. For MinIO, this is not relevant but something must be provided.
  • plugins velero/velero-plugin-for-aws:v1.1.0,vsphereveleroplugin/velero-plugin-for-vsphere:v1.1.1 – Loads the Velero Plugin for AWS at version 1.1.0 (the latest) and the Velero Plugin for vSphere at version 1.1.1 (the latest).
  • use-restic – Restic is an open-source solution that allows you to back up volumes when not using Amazon EBS Volumes, Azure Managed Disks, or Google Persistent Disks (these are the only block-storage offerings with native Velero snapshot support). Since vSphere CNS/CSI is outside of these three, we need to use restic. You can read more about restic integration at https://velero.io/docs/v1.6/restic/#docs.
  • backup-location-config region=minio,s3ForcePathStyle="true",s3Url=https://truenas.corp.tanzu:9000 – This again specifies the region as minio (matching the snapshot-location-config), that Velero should use path-style S3 URLs, and that the S3 endpoint is at https://truenas.corp.tanzu:9000.

You can expand the following to see what the output looks like:

velero install output
CustomResourceDefinition/backups.velero.io: attempting to create resource
CustomResourceDefinition/backups.velero.io: attempting to create resource client
CustomResourceDefinition/backups.velero.io: created
CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource
CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource client
CustomResourceDefinition/backupstoragelocations.velero.io: created
CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource
CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource client
CustomResourceDefinition/deletebackuprequests.velero.io: created
CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource
CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource client
CustomResourceDefinition/downloadrequests.velero.io: created
CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource
CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource client
CustomResourceDefinition/podvolumebackups.velero.io: created
CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource
CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource client
CustomResourceDefinition/podvolumerestores.velero.io: created
CustomResourceDefinition/resticrepositories.velero.io: attempting to create resource
CustomResourceDefinition/resticrepositories.velero.io: attempting to create resource client
CustomResourceDefinition/resticrepositories.velero.io: created
CustomResourceDefinition/restores.velero.io: attempting to create resource
CustomResourceDefinition/restores.velero.io: attempting to create resource client
CustomResourceDefinition/restores.velero.io: created
CustomResourceDefinition/schedules.velero.io: attempting to create resource
CustomResourceDefinition/schedules.velero.io: attempting to create resource client
CustomResourceDefinition/schedules.velero.io: created
CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource
CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource client
CustomResourceDefinition/serverstatusrequests.velero.io: created
CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource
CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource client
CustomResourceDefinition/volumesnapshotlocations.velero.io: created
Waiting for resources to be ready in cluster...
Namespace/velero: attempting to create resource
Namespace/velero: attempting to create resource client
Namespace/velero: created
ClusterRoleBinding/velero: attempting to create resource
ClusterRoleBinding/velero: attempting to create resource client
ClusterRoleBinding/velero: created
ServiceAccount/velero: attempting to create resource
ServiceAccount/velero: attempting to create resource client
ServiceAccount/velero: created
Secret/cloud-credentials: attempting to create resource
Secret/cloud-credentials: attempting to create resource client
Secret/cloud-credentials: created
BackupStorageLocation/default: attempting to create resource
BackupStorageLocation/default: attempting to create resource client
BackupStorageLocation/default: created
VolumeSnapshotLocation/default: attempting to create resource
VolumeSnapshotLocation/default: attempting to create resource client
VolumeSnapshotLocation/default: created
Deployment/velero: attempting to create resource
Deployment/velero: attempting to create resource client
Deployment/velero: created
DaemonSet/restic: attempting to create resource
DaemonSet/restic: attempting to create resource client
DaemonSet/restic: created
Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.


After a short while you can check to see that the following pods are present in the velero namespace:

kubectl -n velero get po

NAME                               READY   STATUS    RESTARTS   AGE
backup-driver-7ff4c9849c-hffxt     1/1     Running   0          111s
datamgr-for-vsphere-plugin-sfftd   1/1     Running   0          91s
datamgr-for-vsphere-plugin-vnhcg   1/1     Running   0          91s
restic-npg8j                       1/1     Running   0          119s
restic-wbfgc                       1/1     Running   0          119s
velero-78777c7fbd-r85fx            1/1     Running   0          119s

The datamgr-for-vsphere-plugin pods are part of a daemonset that is deployed as part of the Velero Plugin for vSphere. If you had not included this plugin, these pods would not be present.
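
Since both restic and the vSphere data manager run as daemonsets, you can also list the daemonsets directly and confirm there is one pod per worker node:

kubectl -n velero get daemonsets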

You can further validate that the Velero Plugin for vSphere is present via the velero plugin get command (look at the last line of output).

velero plugin get

 NAME                                 KIND
 velero.io/crd-remap-version          BackupItemAction
 velero.io/pod                        BackupItemAction
 velero.io/pv                         BackupItemAction
 velero.io/service-account            BackupItemAction
 velero.io/vsphere-pvc-backupper      BackupItemAction
 velero.io/vsphere-pvc-deleter        DeleteItemAction
 velero.io/add-pv-from-pvc            RestoreItemAction
 velero.io/add-pvc-from-pod           RestoreItemAction
 velero.io/change-pvc-node-selector   RestoreItemAction
 velero.io/change-storage-class       RestoreItemAction
 velero.io/cluster-role-bindings      RestoreItemAction
 velero.io/crd-preserve-fields        RestoreItemAction
 velero.io/init-restore-hook          RestoreItemAction
 velero.io/job                        RestoreItemAction
 velero.io/pod                        RestoreItemAction
 velero.io/restic                     RestoreItemAction
 velero.io/role-bindings              RestoreItemAction
 velero.io/service                    RestoreItemAction
 velero.io/service-account            RestoreItemAction
 velero.io/vsphere-pvc-restorer       RestoreItemAction
 velero.io/vsphere                    VolumeSnapshotter
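
One more check worth doing before running a backup is confirming that Velero can talk to the S3 endpoint. The backup-location get command lists the default backup storage location created during the install, and depending on your Velero version it will also report whether the location is available:

velero backup-location get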

To test it out, you can use some of the simple backup and restore examples available at https://velero.io/docs/v1.6/examples/. I’m using the one with persistent volumes since I specifically want to test this functionality out.

kubectl apply -f examples/nginx-app/with-pv.yaml

 namespace/nginx-example created
 persistentvolumeclaim/nginx-logs created
 deployment.apps/nginx-deployment created
 service/my-nginx created

You should see some activity in the vSphere Client as the PVs are created.

And you can find the specific PVC related to the deployed nginx app by querying on the label applied to it (app:nginx).

And at the command line, you should see the same PVC:

kubectl -n nginx-example get pvc --show-labels

 NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE     LABELS
 nginx-logs   Bound    pvc-3a1c73bf-a445-4845-a867-a466818d46f2   50Mi       RWO            default        2m41s   app=nginx

The following are the primary objects we’re concerned with in the nginx-example namespace, as we’re going to back up the entire namespace and delete it shortly. We’ll need to make sure that these come back after a restore.

kubectl -n nginx-example get po,svc,pvc

NAME                                   READY   STATUS    RESTARTS   AGE
pod/nginx-deployment-66689547d-c7wqm   2/2     Running   0          3m12s

NAME               TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)        AGE
service/my-nginx   LoadBalancer   100.71.178.41   192.168.220.4     80:32184/TCP   3m12s

NAME                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/nginx-logs   Bound    pvc-3a1c73bf-a445-4845-a867-a466818d46f2   50Mi       RWO            default        3m37s

There are many options for creating a backup with Velero but one of the simplest is to back up the entire namespace. I’m also partial to backing up based on label selection (an example is sketched below), but you can read more about the various ways of pulling this off at https://velero.io/docs/v1.6/.
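
As a point of comparison, a label-based backup of the same workload would look something like the following (just a sketch with a made-up backup name; the rest of this walkthrough sticks with the namespace-based backup below):

velero backup create nginx-backup-by-label --selector app=nginx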

velero backup create nginx-backup --include-namespaces nginx-example

 Backup request "nginx-backup" submitted successfully.
 Run 'velero backup describe nginx-backup' or 'velero backup logs nginx-backup' for more details.

The following output shows the backup as completed, but if you check partway through you’ll see that the Phase is still InProgress. If you need more detail, you can examine the logs for the backup job.

velero backup describe nginx-backup

Name:         nginx-backup
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.20.4+vmware.1
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=20

Phase:  Completed

Errors:    0
Warnings:  0

Namespaces:
  Included:  nginx-example
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2021-04-21 06:52:46 -0600 MDT
Completed:  2021-04-21 06:53:11 -0600 MDT

Expiration:  2021-05-21 06:52:46 -0600 MDT

Total items to be backed up:  27
Items backed up:              27

Velero-Native Snapshots: <none included>

Back in the vSphere Client we can see that the Velero Plugin for vSphere successfully created and deleted a snapshot of the virtual disk that backs the PVC being backed up.

And in the MinIO browser, there is now some data in the velero bucket.

Drilling down into the backups folder we can see that there are several files related to the different components that were backed up.

And in a subfolder of the plugins folder, we can see the backup of the PVC itself.

Using the velero backup get command we can also see that the backup completed successfully.

velero backup get
NAME           STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
nginx-backup   Completed   0        0          2021-04-21 06:52:46 -0600 MDT   29d       default            <none>

One important thing to keep track of when backing up PVCs is the upload object. This is a CRD created when velero was installed and keeps track of the upload of the backup data to the S3 bucket. Even when the backup is completed, the upload may still be happening behind the scenes…especially if there are numerous or large PVCs being backed up. Any attempt to restore this backup before the upload is finished will fail.

kubectl -n velero get uploads -o yaml

apiVersion: v1
items:
- apiVersion: datamover.cnsdp.vmware.com/v1alpha1
  kind: Upload
  metadata:
    creationTimestamp: "2021-04-21T12:53:10Z"
    generation: 3
    labels:
      velero.io/exclude-from-backup: "true"
    name: upload-7d2568d4-0c04-43cc-bd2a-75f7491c47bd
    namespace: velero
    resourceVersion: "302328"
    uid: 542315b6-a926-4a7b-960f-f68fdbca8b1f
  spec:
    backupRepository: br-64316c5f-afa3-4e3b-bcb2-4bbb3e4c5ccd
    backupTimestamp: "2021-04-21T12:53:10Z"
    snapshotID: ivd:d1550d03-d0c6-485a-a63f-97deb652c8c1:7d2568d4-0c04-43cc-bd2a-75f7491c47bd
    snapshotReference: nginx-example/snap-781f369e-e4bc-4483-8792-8753278ff354
  status:
    completionTimestamp: "2021-04-21T12:53:48Z"
    message: Upload completed
    nextRetryTimestamp: "2021-04-21T12:53:10Z"
    phase: Completed
    processingNode: tkg-wld-md-0-84c4d7898c-vv9g2
    progress: {}
    startTimestamp: "2021-04-21T12:53:10Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
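
If you’d rather script the wait than eyeball the YAML, a rough sketch like the following just polls the phase of every upload until none remain in progress (note that a failed upload would keep this looping forever, so it’s only a convenience check):

# Loop until every upload reports a Completed phase
while kubectl -n velero get uploads -o jsonpath='{range .items[*]}{.status.phase}{"\n"}{end}' | grep -qv 'Completed'; do
  echo "Waiting for uploads to complete..."
  sleep 10
done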

Now that we’ve confirmed that the backup and upload are finished, we can delete our workload in preparation for testing the restore of it.

kubectl delete ns nginx-example

namespace "nginx-example" deleted

We’ve got a number of ways of confirming that the nginx deployment is gone. In the vSphere UI, there are several tasks related to the PVC being deleted.

And searching for the same PVC by label (app:nginx) shows that it is no longer present.

At the command line, we can list out all of the PVCs and see that the nginx one is gone.

kubectl get pvc -A

NAMESPACE                 NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
tanzu-system-monitoring   grafana-pvc                       Bound    pvc-6b12ffa0-4787-486d-93c1-32bca1bee5fd   2Gi        RWO            default        18d
tanzu-system-monitoring   prometheus-alertmanager           Bound    pvc-9368216e-6b22-4c19-b0b4-75b37f352248   2Gi        RWO            default        18d
tanzu-system-monitoring   prometheus-server                 Bound    pvc-eac3c942-170b-4ce9-ab9b-36db528dcf71   8Gi        RWO            default        18d
tanzu-system-registry     data-harbor-redis-0               Bound    pvc-d9e3098c-2a9c-41c6-9c73-f429167c6b5b   1Gi        RWO            default        18d
tanzu-system-registry     data-harbor-trivy-0               Bound    pvc-005bb5b5-72d4-4b65-9255-adbc08774228   5Gi        RWO            default        18d
tanzu-system-registry     database-data-harbor-database-0   Bound    pvc-7e74f1be-744d-46b2-a4bc-40bc457cd15c   1Gi        RWO            default        18d
tanzu-system-registry     harbor-jobservice                 Bound    pvc-086f068b-e4fc-4003-b91d-fe9ff42e00fe   1Gi        RWO            default        18d
tanzu-system-registry     harbor-registry                   Bound    pvc-2476a2ae-40c7-402a-9874-7a7ddd16ec3b   20Gi       RWO            default        18d

On to the restore. This is fairly straightforward and we really just need to know the name of the backup to restore.

velero restore create --from-backup nginx-backup

Restore request "nginx-backup-20210421072700" submitted successfully.
Run `velero restore describe nginx-backup-20210421072700` or `velero restore logs nginx-backup-20210421072700` for more details.

As with the backup, we can describe the restore to see how it’s going.

velero restore describe nginx-backup-20210421072700

Name:         nginx-backup-20210421072700
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  Completed

Started:    2021-04-21 07:27:00 -0600 MDT
Completed:  2021-04-21 07:27:22 -0600 MDT

Backup:  nginx-backup

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Preserve Service NodePorts:  auto

In the vSphere Client, there is a lot of activity as the nginx deployment is recreated and the PVCs are created and reattached to the nodes.

Searching for the nginx PVC shows that it is present in the inventory again.

At the command line, the pod, service and PVC are all present. Note that the PVC is bound to a newly created volume and the service has been assigned a new NodePort. This doesn’t affect anything functionally, but it’s worth calling out that it’s a result of these objects being recreated.

kubectl -n nginx-example get po,svc,pvc

NAME                                   READY   STATUS    RESTARTS   AGE
pod/nginx-deployment-66689547d-c7wqm   2/2     Running   0          2m13s

NAME               TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)        AGE
service/my-nginx   LoadBalancer   100.71.178.41   192.168.220.4     80:30820/TCP   2m13s

NAME                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/nginx-logs   Bound    pvc-c868f09e-f315-4448-aa87-876aa8e4b037   50Mi       RWO            default        2m34s
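
As one last spot check that the restored application is actually serving traffic, you could hit the LoadBalancer IP from somewhere that can reach it (the address here is just the external IP from my environment):

curl -s http://192.168.220.4 | head -n 5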

We can get a high-level check of the restore as well with the velero restore get command.

velero restore get

NAME                          BACKUP         STATUS      STARTED                         COMPLETED                       ERRORS   WARNINGS   CREATED                         SELECTOR
nginx-backup-20210421072700   nginx-backup   Completed   2021-04-21 07:27:00 -0600 MDT   2021-04-21 07:27:22 -0600 MDT   0        0          2021-04-21 07:27:00 -0600 MDT   <none>

While not as important as the upload resource, there is also a download resource that you can examine to make sure that all requested data has been pulled from the S3 bucket.

kubectl -n velero get downloads -o yaml

apiVersion: v1
items:
- apiVersion: datamover.cnsdp.vmware.com/v1alpha1
  kind: Download
  metadata:
    creationTimestamp: "2021-04-21T13:27:06Z"
    generation: 3
    labels:
      velero.io/exclude-from-backup: "true"
    name: download-7d2568d4-0c04-43cc-bd2a-75f7491c47bd-8ec5fbd1-56db-4684-87a1-183ff4cbf811
    namespace: velero
    resourceVersion: "312672"
    uid: 5967c8e8-37f2-41d3-9832-b459047d672d
  spec:
    backupRepositoryName: br-64316c5f-afa3-4e3b-bcb2-4bbb3e4c5ccd
    clonefromSnapshotReference: nginx-example/347878fb-2348-4a20-b6b1-706e76e8b9bc
    protectedEntityID: ivd:f212a0c8-5b86-4ef5-8f69-4568feb2bd70
    restoreTimestamp: "2021-04-21T13:27:06Z"
    snapshotID: ivd:d1550d03-d0c6-485a-a63f-97deb652c8c1:7d2568d4-0c04-43cc-bd2a-75f7491c47bd
  status:
    completionTimestamp: "2021-04-21T13:27:21Z"
    message: Download completed
    nextRetryTimestamp: "2021-04-21T13:27:06Z"
    phase: Completed
    processingNode: tkg-wld-md-0-84c4d7898c-vv9g2
    progress: {}
    startTimestamp: "2021-04-21T13:27:06Z"
    volumeID: ivd:f212a0c8-5b86-4ef5-8f69-4568feb2bd70
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Lastly, if you are so inclined, you can delete the backup.

velero backup delete nginx-backup

Are you sure you want to continue (Y/N)? y
Request to delete backup "nginx-backup" submitted successfully.
The backup will be fully deleted after all associated data (disk snapshots, backup files, restores) are removed.

Checking the S3 bucket should show that everything related to the nginx backup is gone.

And the velero backup get command will return no results (assuming you have no other backups present).
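
For completeness, that check is simply the same command used earlier:

velero backup get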
