Enabling and Using the Velero Service on a vSphere 8 with Tanzu Supervisor Cluster

I’ve written about using Velero for backing up Kubernetes workloads in the past but never in regards to vSphere with Tanzu and never having used the Velero Operator (or service as it’s called in vSphere with Tanzu). Velero can be enabled as a service at the Supervisor level and provide backup and restore functionality to workloads running in the supervisor cluster and in any Tanzu Kubernetes clusters.

Install the Velero Data Manager

The first thing you will need to do is download the Velero Data Manager OVA file. You can get the latest version from https://vmwaresaas.jfrog.io/ui/repos/tree/General/Velero-YAML/Velero/DataManager/1.2.0/datamgr-ob-20797900-photon-3-release-1.2.ova.

Note: You must have a DHCP server running to provide an IP address and DNS server information as a static IP address cannot be configured on the Data Manager VM.

Deploying the Data Manager OVA is very simple and you only need to supply the following parameters (these are specific to my environment):

  • VM name: velero-datamanager-01a
  • Storage: vol1
  • Network: Management

When the deployment is finished, don’t power the VM on immediately as you will need to make some changes to the advanced settings.

Right-click the velero-datamanager-01a VM, click Edit Settings.

Set CD/DVD drive 1 to Client Device.

Note: This got me early on as I didn’t realize that the parameters set in the next step don’t save when the drive is left at the default, Host Device.

Click the Advanced Parameters tab. 

Set the following parameters (again, these are specific to my environment):

  • guestinfo.cnsdp.vcUser: administrator@vsphere.local
  • guestinfo.cnsdp.vcAddress: vcsa-01a.corp.vmw
  • guestinfo.cnsdp.vcPasswd: VMware1!
  • guestinfo.cnsdp.wcpControlPlaneIP: 192.168.110.101

Note: It’s a little difficult to tell once the rows are highlighted but the squares to the right of each item are basically accept and reject change toggles. You’ll want to make sure to click in each of the left-most squares (I added a red border to these in the screenshot for clarity).

Click the OK button to finish editing the VM advanced parameters. DO NOT power the VM on yet.
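If you prefer to script this step, the same advanced parameters can be set from the CLI with govc (a sketch, assuming govc is installed and GOVC_URL, GOVC_USERNAME, and GOVC_PASSWORD point at your vCenter; the VM name and values are the ones from my environment above):

```shell
# Add the guestinfo.cnsdp.* advanced parameters to the powered-off Data Manager VM.
# govc vm.change -e sets ExtraConfig key=value pairs and can be repeated.
govc vm.change -vm velero-datamanager-01a \
  -e "guestinfo.cnsdp.vcUser=administrator@vsphere.local" \
  -e "guestinfo.cnsdp.vcAddress=vcsa-01a.corp.vmw" \
  -e "guestinfo.cnsdp.vcPasswd=VMware1!" \
  -e "guestinfo.cnsdp.wcpControlPlaneIP=192.168.110.101"
```

As in the UI workflow, leave the VM powered off after making these changes.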

Install/Enable the Velero Service/Operator

For a little while now, Velero has provided an operator that makes deploying and configuring Velero easier. This operator can be installed to the supervisor cluster via a service.

From https://github.com/vsphere-tmm/Supervisor-Services/blob/main/README.md, download the Velero vSphere Operator v1.3.0

In the vSphere Client, navigate to Workload Management, Services and click the Add button in the Add New Service pane.

Click the Upload button on the Register Service window.

Select the velero-supervisorservice-1.3.0.yaml file that was previously downloaded and click the Open button.

Click the Next button.

Scroll to the bottom of the EULA page and select the I agree to the terms and services checkbox. Click the Finish button.

You should see a Register Supervisor Service task.

On the Services page, you should now see a Velero vSphere Operator service. Click the Actions drop down on it and then select Install on Supervisors.

Select the appropriate Supervisor and click the OK button.

You will see a new namespace created named similar to svc-velero-vsphere-domain-c1006.

Some pods have been created in the new namespace, but you won’t see them in the Inventory view under the new namespace. This is because they are only running on the control plane nodes and this view only shows vSphere Pods (VMs) that are running on worker nodes. You can navigate to Workload Management, svc-velero-vsphere-domain-c1006, Compute, vSphere Pods to see them there.

The pods should have started by now, though, and this is where I ran into my first problem.

Using kubectl, you can see that some pods have been created in this namespace…but they aren’t starting up.

kubectl -n svc-velero-vsphere-domain-c1006 get po
NAME                                               READY   STATUS    RESTARTS   AGE
velero-vsphere-operator-6455dc6c88-qw2cq           0/1     Pending   0          66s
velero-vsphere-operator-webhook-67dc47f657-8462h   0/1     Pending   0          66s
velero-vsphere-operator-webhook-67dc47f657-hqbsq   0/1     Pending   0          66s
velero-vsphere-operator-webhook-67dc47f657-sbkzg   0/1     Pending   0          66s

Taking a closer look at the velero-vsphere-operator-6455dc6c88-qw2cq pod, you can see that there is a problem:

kubectl -n svc-velero-vsphere-domain-c1006 describe po velero-vsphere-operator-6455dc6c88-qw2cq
...
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  2m4s (x2 over 7m33s)  default-scheduler  0/7 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling.

This looks like a taint/toleration issue between the pod (all pods in this namespace, actually) and the control plane nodes. You can examine the deployment responsible for this pod to see what tolerations are configured.

kubectl -n svc-velero-vsphere-domain-c1006 get deployments velero-vsphere-operator -o jsonpath="{.spec.template.spec.tolerations}" |jq
[
  {
    "effect": "NoSchedule",
    "key": "node-role.kubernetes.io/master",
    "operator": "Exists"
  },
  {
    "effect": "NoSchedule",
    "key": "kubeadmNode",
    "operator": "Equal",
    "value": "master"
  }
]

This output needs to be compared with the taints applied to the control plane nodes.

kubectl get nodes -o json | jq '.items[].spec.taints'
[
  {
    "effect": "NoSchedule",
    "key": "node-role.kubernetes.io/control-plane"
  }
]
[
  {
    "effect": "NoSchedule",
    "key": "node-role.kubernetes.io/control-plane"
  }
]
[
  {
    "effect": "NoSchedule",
    "key": "node-role.kubernetes.io/control-plane"
  }
]

You can see that the tolerations configured for the pods in the deployment and the taint configured on the control plane nodes do not match up. The control plane nodes in vSphere 8.0 are now using the “control-plane” term instead of the older “master” term. With this in mind, you will need to update the velero-vsphere-operator and the velero-vsphere-operator-webhook deployments.
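The mismatch can be confirmed with a little jq, using the two outputs captured above (a sketch; the tolerations JSON is hard-coded from the deployment output rather than fetched live):

```shell
# Tolerations as reported by the velero-vsphere-operator deployment,
# and the taint key reported on the vSphere 8 control plane nodes.
tolerations='[{"effect":"NoSchedule","key":"node-role.kubernetes.io/master","operator":"Exists"},{"effect":"NoSchedule","key":"kubeadmNode","operator":"Equal","value":"master"}]'
taint_key='node-role.kubernetes.io/control-plane'

# any(gen; cond) is true only if some toleration's key matches the taint key.
if echo "$tolerations" | jq -e --arg k "$taint_key" 'any(.[]; .key == $k)' > /dev/null; then
  echo "taint is tolerated"
else
  echo "taint is NOT tolerated - pods cannot schedule on control plane nodes"
fi
```

Here the check prints the "NOT tolerated" message, matching the FailedScheduling events seen earlier.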

kubectl -n svc-velero-vsphere-domain-c1006 edit deployments velero-vsphere-operator

The following sections must be edited, replacing “master” with “control-plane”.

nodeSelector:
  node-role.kubernetes.io/master: ""
tolerations:
- effect: NoSchedule
  key: node-role.kubernetes.io/master
  operator: Exists
- effect: NoSchedule
  key: kubeadmNode
  operator: Equal
  value: master

After editing, the noted sections should look like the following:

nodeSelector:
  node-role.kubernetes.io/control-plane: ""
tolerations:
- effect: NoSchedule
  key: node-role.kubernetes.io/control-plane
  operator: Exists
- effect: NoSchedule
  key: kubeadmNode
  operator: Equal
  value: control-plane
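If you would rather not edit interactively, the same change can be applied with a JSON patch (a sketch; "replace" operations are used so the old master nodeSelector key is removed rather than merged alongside the new one, and the namespace is the one from my environment):

```shell
# Replace the nodeSelector and tolerations wholesale with control-plane values.
kubectl -n svc-velero-vsphere-domain-c1006 patch deployment velero-vsphere-operator \
  --type json -p '[
    {"op": "replace", "path": "/spec/template/spec/nodeSelector",
     "value": {"node-role.kubernetes.io/control-plane": ""}},
    {"op": "replace", "path": "/spec/template/spec/tolerations",
     "value": [
       {"effect": "NoSchedule", "key": "node-role.kubernetes.io/control-plane", "operator": "Exists"},
       {"effect": "NoSchedule", "key": "kubeadmNode", "operator": "Equal", "value": "control-plane"}
     ]}
  ]'
```

The same patch, with the deployment name swapped, works for the velero-vsphere-operator-webhook deployment.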

Immediately after making this change, you should see a second velero-vsphere-operator pod being created.

kubectl -n svc-velero-vsphere-domain-c1006 get po
NAME                                               READY   STATUS              RESTARTS   AGE
velero-vsphere-operator-6455dc6c88-qw2cq           0/1     Pending             0          30m
velero-vsphere-operator-7f5bf5d8f6-8lkb9           0/1     ContainerCreating   0          10s
velero-vsphere-operator-webhook-67dc47f657-8462h   0/1     Pending             0          30m
velero-vsphere-operator-webhook-67dc47f657-hqbsq   0/1     Pending             0          30m
velero-vsphere-operator-webhook-67dc47f657-sbkzg   0/1     Pending             0          30m

And very shortly afterwards, the new pod should be up and running and the original deleted.

kubectl -n svc-velero-vsphere-domain-c1006 get po
NAME                                               READY   STATUS    RESTARTS   AGE
velero-vsphere-operator-7f5bf5d8f6-8lkb9           1/1     Running   0          60s
velero-vsphere-operator-webhook-67dc47f657-8462h   0/1     Pending   0          31m
velero-vsphere-operator-webhook-67dc47f657-hqbsq   0/1     Pending   0          31m
velero-vsphere-operator-webhook-67dc47f657-sbkzg   0/1     Pending   0          31m

Repeat the previous remediation steps for the velero-vsphere-operator-webhook deployment.

Create the velero Namespace

You’ll next need to create a new namespace to be used for Velero Kubernetes objects.

  • In the vSphere Client, navigate to Workload Management, Namespaces and click the New Namespace link.
  • Select the svc1 Supervisor and set namespace name to velero.
  • Click the Create button

As was done for the test namespace, you will need to set permissions and assign storage.

  • While on the velero namespace (you should be here just after creating the velero namespace), click the Permissions tab and then click the Add button.
    • Identity Source: vsphere.local
    • User/Group Search: administrator
    • Role: Owner
    • Click the OK button.
  • Click the Add button again.
    • Identity Source: corp.vmw
    • User/Group Search: vmwadmins (group)
    • Role: Owner
    • Click the OK button.
  • Click the Storage tab and then click the Edit button.
    • Select K8s-policy
    • Click the OK button.

Install the Velero Plugin

From the cli-vm, ensure that you are logged in to the supervisor cluster.

kubectl vsphere login --server wcp.corp.vmw -u vmwadmin@corp.vmw
 
 
Logged in successfully.
 
You have access to the following contexts:
   test
   velero
   wcp.corp.vmw
 
If the context you wish to use is not in this list, you may need to try
logging in again later, or contact your cluster administrator.
 
To change context, use `kubectl config use-context <workload name>`

Note that you now have a velero context as well (we’re not using that one though).

Set your context to the supervisor cluster itself.

kubectl config use-context wcp.corp.vmw
Switched to context "wcp.corp.vmw".

You need to create a ConfigMap definition that will tell the Velero plugin that we’re working with a Supervisor cluster, as well as supply a name for the vSphere configuration secret (velero-vsphere-config-secret) and the namespace we’re working in (velero).

apiVersion: v1
kind: ConfigMap
metadata:
  name: velero-vsphere-plugin-config
data:
  cluster_flavor: SUPERVISOR

With this file created, we can apply it against the velero namespace.

kubectl apply -n velero -f supervisor-velero-vsphere-plugin-config.yaml

You will need to download the velero-vsphere CLI, which can be obtained from https://github.com/vmware-tanzu/velero-plugin-for-vsphere/releases/download/v1.4.2/velero-vsphere-1.3.0-linux-amd64.tar.gz (Linux version). Extract the velero-vsphere binary and copy it to a system with access to the supervisor cluster (and make it executable).
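Downloading and installing the CLI can be done in a few commands (a sketch, assuming a Linux host with curl; the URL is the one given above, while the binary name inside the archive and the install location are my assumptions):

```shell
# Fetch the velero-vsphere CLI archive (Linux version) and extract it.
curl -LO https://github.com/vmware-tanzu/velero-plugin-for-vsphere/releases/download/v1.4.2/velero-vsphere-1.3.0-linux-amd64.tar.gz
tar -xzf velero-vsphere-1.3.0-linux-amd64.tar.gz

# Make the binary executable and put it on the PATH.
sudo install -m 0755 velero-vsphere /usr/local/bin/velero-vsphere
```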

Velero requires an S3 bucket for storing backup data. I have MinIO configured via a TrueNAS appliance. The S3 service here is configured to use a certificate signed by my internal CA, but the velero-vsphere CLI has no provision for supplying certificate information during configuration. With this in mind, you need to configure your S3 location to either use a certificate signed by a public CA or no certificate at all (plain HTTP).

Within my MinIO deployment, I have created a bucket named velero for use with Velero backups.

To supply the credentials for the S3 bucket to the velero-vsphere command, you need to create a credentials file.

Create a file named s3-credentials with the content similar to the following:

[default]
aws_access_key_id = miniovmw
aws_secret_access_key = minio123

You are finally ready to use the velero-vsphere command to deploy velero.

velero-vsphere install \
  --namespace velero \
  --version v1.9.2 \
  --image velero/velero:v1.9.2 \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.6.1,vsphereveleroplugin/velero-plugin-for-vsphere:v1.4.2 \
  --bucket velero \
  --secret-file /home/ubuntu/Velero/s3-credentials \
  --snapshot-location-config region=minio \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.110.60:9000

Send the request to the operator about installing Velero in namespace velero

Note: You might have some wiggle room with the versions of the various noted components in this command but be sure to check the Velero Plugin for vSphere in vSphere With Tanzu Supervisor Cluster Compatibility Matrix.

The Velero operator that was previously configured in the supervisor cluster handles the creation of the necessary Velero objects in the velero namespace.

There are two deployments that get created in the velero namespace: velero and backup-driver. I ran into my second issue when I saw that the pods for the backup-driver deployment were stuck in a Pending state. Taking a closer look revealed what was likely the same issue that I saw earlier with the Velero service pods.

kubectl -n velero describe po backup-driver-567b85b996-prq8j
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  114s  default-scheduler  0/7 nodes are available: 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling.

And checking the tolerations on the deployment showed that it was the same problem of not working with the “control-plane” taint on the control plane nodes.

kubectl -n velero get deployments backup-driver -o jsonpath="{.spec.template.spec.tolerations}" |jq
[
  {
    "effect": "NoSchedule",
    "key": "node-role.kubernetes.io/master",
    "operator": "Exists"
  },
  {
    "effect": "NoSchedule",
    "key": "kubeadmNode",
    "operator": "Equal",
    "value": "master"
  }
]

You will need to edit the backup-driver deployment to correct this discrepancy. 

kubectl -n velero edit deployments backup-driver

The following sections must be edited, replacing “master” with “control-plane”.

      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      - effect: NoSchedule
        key: kubeadmNode
        operator: Equal
        value: master

After editing, the noted sections should look like the following:

      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
      - effect: NoSchedule
        key: kubeadmNode
        operator: Equal
        value: control-plane

Immediately after making this change, you should see a second backup-driver pod being created.

kubectl -n velero get po
NAME                             READY   STATUS              RESTARTS   AGE
backup-driver-567b85b996-prq8j   0/1     Pending             0          6m13s
backup-driver-d487c5fbb-6pjr4    0/1     ContainerCreating   0          13s
velero-fcf47c755-mhphp           1/1     Running             0          7m55s

And very shortly afterwards, the new pod should be up and running and the original deleted.

kubectl -n velero get po
NAME                            READY   STATUS    RESTARTS   AGE
backup-driver-d487c5fbb-6pjr4   1/1     Running   0          14s
velero-fcf47c755-mhphp          1/1     Running   0          7m56s

NOTE: If you’re following along here while configuring this for yourself, you will want to skip to the Create a velero-token Secret section. I encountered a bug in how the Velero operator is instantiating the velero service account. You would be best served to read the entire blog post and then come back to this section with an understanding of why you don’t want to proceed with this step right away.

At this point, it is okay to power the velero-datamanager-01a VM on.

Since you cannot configure the Data Manager VM with a static IP address, check the Virtual Machine Details page for the VM to see what IP address it has received via DHCP.

Backup a Stateful Workload in the Supervisor Cluster

NOTE: As noted at the end of the previous section, if you’re following along, be sure to go through the steps in the later section, Create a velero-token Secret, before trying a backup. You’ll see why shortly.

You’ll need to download the velero CLI to be able to take backups of Kubernetes workloads. You can get the velero CLI tgz file from https://github.com/vmware-tanzu/velero/releases/download/v1.9.2/velero-v1.9.2-linux-amd64.tar.gz (this is not the most current version but needs to match what has been installed by velero-vsphere).

Extract the velero-v1.9.2-linux-amd64/velero binary from the .tgz file and make it executable.
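These steps can be done in a few commands (a sketch; the URL and the path inside the archive are the ones noted above, while the install location is my choice):

```shell
# Fetch the velero CLI archive matching the server version installed by velero-vsphere.
curl -LO https://github.com/vmware-tanzu/velero/releases/download/v1.9.2/velero-v1.9.2-linux-amd64.tar.gz
tar -xzf velero-v1.9.2-linux-amd64.tar.gz

# Make the binary executable and put it on the PATH.
sudo install -m 0755 velero-v1.9.2-linux-amd64/velero /usr/local/bin/velero
```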

Validate that the CLI is working and can talk to the deployed Velero components:

velero version
Client:
        Version: v1.9.2
        Git commit: 82a100981cc66d119cf9b1d121f45c5c9dcf99e1
Server:
        Version: v1.9.2

You can use the velero backup create command to backup a workload. In my previous post, Installing Harbor Image Registry to a vSphere 8 with Tanzu Supervisor Cluster, I had created a very simple nginx deployment in the test namespace of the supervisor cluster. This deployment had a persistent volume (backed by vSphere Cloud Native Storage) and a load-balanced IP address (backed by NSX). I’ll be using this deployment to test out a backup/restore scenario with Velero.

velero backup create nginx --include-resources services,deployments,persistentvolumeclaims --include-namespaces test
Backup request "nginx" submitted successfully.
Run `velero backup describe nginx` or `velero backup logs nginx` for more details.

I’m filtering on resources since there are several CRDs (created as part of a supervisor namespace) that cannot be backed up with Velero.

You can see that Velero initiated a snapshot of the persistent volume associated with the nginx deployment.

You can check for the existence of snapshots (CRD created by Velero) via kubectl as well.

kubectl get snapshot -n test
NAME                                        AGE
snap-9ea06814-4df4-48ff-9aad-ed0997e6290c   26s

You can describe the backup to see more details about what was done.

velero backup describe nginx
Name:         nginx
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.24.9+vmware.wcp.1
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=24

Phase:  Completed

Errors:    0
Warnings:  0

Namespaces:
  Included:  test
  Excluded:  <none>

Resources:
  Included:        services, deployments, persistentvolumeclaims
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2023-03-21 14:10:27 -0700 PDT
Completed:  2023-03-21 14:11:03 -0700 PDT

Expiration:  2023-04-20 14:10:27 -0700 PDT

Total items to be backed up:  3
Items backed up:              3

Velero-Native Snapshots: <none included>

You can look at the contents of the velero bucket in MinIO and see that there is now a backups folder there.

Drilling further down, you can see some details of what Velero saved to the S3 bucket.

There is a CRD created by Velero called upload that describes the process of transferring the volume snapshot to the S3 bucket.

kubectl -n velero get upload
NAME                                          AGE
upload-4f2738bf-450d-425c-a148-2196615fae30   2m25s
kubectl -n velero get upload upload-4f2738bf-450d-425c-a148-2196615fae30 -o jsonpath="{.status}" | jq
{
  "nextRetryTimestamp": "2023-03-21T21:11:03Z",
  "phase": "New",
  "progress": {}
}

This is where I noticed a problem: the upload never moved past the New phase. The issue is with the Data Manager VM, where the script that instantiates the velero data manager container cannot find a secret named velero-token in the velero namespace.

Create a velero-token Secret

It turns out that service account token secrets are not automatically created in Kubernetes 1.24 and up, hence the issue of our missing velero-token secret. You can manually create this secret though to work around the issue until a newer version of the Velero service for vSphere with Tanzu is released that addresses this issue.

You will need to create a definition file for the secret that basically just tells it to create a token for the velero service account.

apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: velero-token
  annotations:
    kubernetes.io/service-account.name: "velero"

Save this as velero-token.yaml and apply it:

kubectl -n velero apply -f velero-token.yaml

You can check that this secret is created properly.

kubectl -n velero describe secret velero-token
Name:         velero-token
Namespace:    velero
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: velero
              kubernetes.io/service-account.uid: a868e1c1-f13b-40af-aecc-ca16e493388b

Type:  kubernetes.io/service-account-token

Data
====
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6Ii05RDc2OFdwLXM2QVlfM2hIdnQ5b2NoYVlSZE4tZ2RIVnlSV0pVS0FDd2sifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJ2ZWxlcm8iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoidmVsZXJvLXRva2VuIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6InZlbGVybyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImE4NjhlMWMxLWYxM2ItNDBhZi1hZWNjLWNhMTZlNDkzMzg4YiIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDp2ZWxlcm86dmVsZXJvIn0.IrHDLxNM_DIyx1By0nzRPBBJqv6HHgxdCpJqFKH3e9kv3pO9CUf2hlvpCXpRVlo8u33i24Z209N0P0nb1tiNgquxBbsJkJ3d4r31_6w38HHtLYEPjJc9Ct1DyR6i2gRWwT-RXfGPzffhIxTnrwdyCNhPhQQeZUp5ufwjJFuoa69M_IYKWm4LB6_HjN8TjkzHXldHsjow8ztYDV9I_izgxAgt-SLpiuo79Pk3PLNjXtp8P-DRyfIsoJ7yC5ZhPmjWwJpbWoHE5YnoCjZjJv0f81na-V1HMYeSLgDN0CscxPe0EepW_WyDd2vkepEDTGwSJWJ4IqMzPvxMWwik0aHnRA
ca.crt:     1099 bytes
namespace:  6 bytes

If you have already started the Data Manager VM, you will likely need to log in and restart the velero-datamgr service. You can log in at the VM console or SSH in as the root user with a default password of changeme (you will be forced to change this password).

systemctl restart velero-datamgr

You can validate that the service is/was running properly (it runs quickly and just starts the velero-datamgr container).

systemctl status velero-datamgr
● velero-datamgr.service - Start Velero vsphere plugin data manager
   Loaded: loaded (/lib/systemd/system/velero-datamgr.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Mon 2023-04-10 15:46:32 UTC; 20min ago
     Docs: https://github.com/vmware-tanzu/velero-plugin-for-vsphere
  Process: 495 ExecStart=/usr/bin/velero-vsphere-plugin-datamgr.sh (code=exited, status=0/SUCCESS)
 Main PID: 495 (code=exited, status=0/SUCCESS)

Apr 10 15:46:25 photon-cnsdp velero-vsphere-plugin-datamgr.sh[495]: 41bd06e354e8: Verifying Checksum
Apr 10 15:46:25 photon-cnsdp velero-vsphere-plugin-datamgr.sh[495]: 41bd06e354e8: Download complete
Apr 10 15:46:26 photon-cnsdp velero-vsphere-plugin-datamgr.sh[495]: 916eae7333e9: Pull complete
Apr 10 15:46:30 photon-cnsdp velero-vsphere-plugin-datamgr.sh[495]: 024b20d3e68e: Pull complete
Apr 10 15:46:30 photon-cnsdp velero-vsphere-plugin-datamgr.sh[495]: 726c665132fb: Pull complete
Apr 10 15:46:30 photon-cnsdp velero-vsphere-plugin-datamgr.sh[495]: 41bd06e354e8: Pull complete
Apr 10 15:46:32 photon-cnsdp velero-vsphere-plugin-datamgr.sh[495]: Digest: sha256:da87ca573af13e90410b8dac933b052a95dab779d471e915eb8829107006a24c
Apr 10 15:46:32 photon-cnsdp velero-vsphere-plugin-datamgr.sh[495]: Status: Downloaded newer image for vsphereveleroplugin/data-manager-for-plugin:v1.4.1
Apr 10 15:46:32 photon-cnsdp velero-vsphere-plugin-datamgr.sh[495]: Total reclaimed space: 0B
Apr 10 15:46:32 photon-cnsdp velero-vsphere-plugin-datamgr.sh[495]: fde6ec1adb2ac4a27b1c4394aa43f2a8abcd04f68792dffbcddbaa72d037304f

docker ps
CONTAINER ID        IMAGE                                                COMMAND                  CREATED             STATUS              PORTS               NAMES
fde6ec1adb2a        vsphereveleroplugin/data-manager-for-plugin:v1.4.1   "/datamgr server --u…"   21 minutes ago      Up 21 minutes                           velero-datamgr

If you had not yet started the data manager VM, now would be the time to do so. You could then move on to the Backup a Stateful Workload in the Supervisor Cluster section and take a backup without any errors.

Backup a Stateful Workload in the Supervisor Cluster – Round Two

The previous backup can be removed with the velero backup delete command, and you can use the exact same command as earlier to create a new one. You may find that there is still a snapshot on the persistent volume; it can be deleted by following the first three steps in Cormac Hogan’s blog post, Task “Delete a virtual storage object” reports “A specified parameter was not correct”. With the new backup created, you can inspect it to make sure that the same problem does not exist.

velero backup create nginx --include-resources services,deployments,persistentvolumeclaims --include-namespaces test
Backup request "nginx" submitted successfully.
Run `velero backup describe nginx` or `velero backup logs nginx` for more details.

velero backup describe nginx
Name:         nginx
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.24.9+vmware.wcp.1
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=24

Phase:  Completed

Errors:    0
Warnings:  0

Namespaces:
  Included:  test
  Excluded:  <none>

Resources:
  Included:        services, deployments, persistentvolumeclaims
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2023-04-10 09:12:47 -0700 PDT
Completed:  2023-04-10 09:13:23 -0700 PDT

Expiration:  2023-05-10 09:12:47 -0700 PDT

Total items to be backed up:  3
Items backed up:              3

Velero-Native Snapshots: <none included>
kubectl -n velero get upload
NAME                                          AGE
upload-689c6442-74a1-4b60-bb66-014c7b95c005   2m12s
kubectl -n velero get upload upload-689c6442-74a1-4b60-bb66-014c7b95c005 -o jsonpath="{.status}" | jq
{
  "completionTimestamp": "2023-04-10T16:13:41Z",
  "message": "Upload completed",
  "nextRetryTimestamp": "2023-04-10T16:13:28Z",
  "phase": "Completed",
  "processingNode": "192.168.100.100-00:50:56:b8:72:ca",
  "progress": {},
  "startTimestamp": "2023-04-10T16:13:28Z"
}

You can see from the output that the upload phase is Completed.

And you should see more snapshot operations on the virtual disk backing the nginx deployment in the test namespace than were seen earlier. The most important thing is that the snapshot taken for the backup is deleted once the upload completes.

You’ll also see an extra folder under the velero bucket in MinIO for the stateful backup data:

If you drill far enough down into this folder you will eventually get to the pv backup itself.

Notice that the size is 50MB. You can check this against the size of the nginx-logs PVC.

kubectl -n test describe pvc nginx-logs
Name:          nginx-logs
Namespace:     test
StorageClass:  k8s-policy
Status:        Bound
Volume:        pvc-8fd7cdf7-b1e3-4127-856f-ef60476ad5dc
Labels:        app=nginx
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
               volume.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
               volumehealth.storage.kubernetes.io/health: accessible
               volumehealth.storage.kubernetes.io/health-timestamp: Tue Mar 21 13:41:25 UTC 2023
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      50Mi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       nginx-deployment-555bf4ff65-h8zvm
Events:        <none>

Delete a Backed-Up Stateful Workload and Restore it with Velero

Delete the nginx deployment so you can test the restore.

kubectl delete -f with-pv.yaml
persistentvolumeclaim "nginx-logs" deleted
deployment.apps "nginx-deployment" deleted
service "my-nginx" deleted

You will see the backing virtual disk get deleted in vCenter.

You can check to see that no pods, persistent volume claims or services remain in the test namespace.

kubectl -n test get po,pvc,svc
No resources found in test namespace.

Before starting the restore, you can use the velero command to check that the backup is present.

velero backup get
NAME    STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
nginx   Completed   0        0          2023-04-10 09:12:47 -0700 PDT   29d       default            <none>

Use the velero restore command in conjunction with the name of the backup taken earlier (nginx) to initiate a restore.

velero restore create --from-backup nginx
Restore request "nginx-20230410101021" submitted successfully.
Run `velero restore describe nginx-20230410101021` or `velero restore logs nginx-20230410101021` for more details.

You can investigate the status of the restore operation.

velero restore describe nginx-20230410101021
Name:         nginx-20230410101021
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:                       Completed
Total items to be restored:  3
Items restored:              3

Started:    2023-04-10 10:10:21 -0700 PDT
Completed:  2023-04-10 10:10:32 -0700 PDT

Warnings:
  Velero:     <none>
  Cluster:  stat /tmp/861032209/resources/persistentvolumes/cluster/pvc-8fd7cdf7-b1e3-4127-856f-ef60476ad5dc.json: no such file or directory
  Namespaces: <none>

Backup:  nginx

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Existing Resource Policy:   <none>

Preserve Service NodePorts:  auto

If it’s working as expected, you should see a new virtual disk being created and attached to the new nginx vSphere pod.

Back at the command line, you can check for the presence of pods, persistent volume claims and services in the test namespace.

kubectl -n test get po,pvc,svc
NAME                                    READY   STATUS    RESTARTS   AGE
pod/nginx-deployment-555bf4ff65-4nkwm   2/2     Running   0          2m59s

NAME                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/nginx-logs   Bound    pvc-518ed7be-0cfb-41f1-9a72-f07435a5ec66   50Mi       RWO            k8s-policy     3m9s

NAME               TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
service/my-nginx   LoadBalancer   10.96.1.157   10.40.14.69   80:31580/TCP   2m59s

And you can validate that the my-nginx service (at 10.40.14.69) is serving up the default nginx web page.

I’ll revisit Velero in a future post to see how it works for backing up workloads in Tanzu Kubernetes clusters deployed via vSphere 8 with Tanzu.
