Deploying vSphere 7.0 U2 with Tanzu while using NSX Advanced Load Balancer

In one of my earlier posts, How to install vSphere with Tanzu with vSphere Networking, I did a walkthrough of installing vSphere with Tanzu 7.0 U1 using HAProxy as the Load Balancer solution. I’m going at it again, but in 7.0 U2 we now support using NSX Advanced Load Balancer (formerly Avi Vantage) to provide Load Balancer services.

In my previous post, Deploying NSX Advanced Load Balancer for use with Tanzu Kubernetes Grid (1.3 release) and vSphere with Tanzu (7.0 U2 release), I went through the steps needed to get NSX Advanced Load Balancer (NSX ALB) up and running such that it could be used for Tanzu Kubernetes Grid (TKG) or vSphere with Tanzu. In this post, I’ll get vSphere with Tanzu installed and make use of the Load Balancer functionality afforded by NSX ALB.

Configure Workload Management

You might notice that this jumps right into configuring Workload Management and skips the needed steps of creating a content library for workload clusters (TKG Service clusters) and of configuring a Storage Policy. I’m using the same instances of each noted in How to install vSphere with Tanzu with vSphere Networking, so no real need to document their configuration again.

Navigate to the Workload Management page and click on the Get Started button.

You’ll get a warning noting that you have to have either HAProxy or NSX ALB (noted as Avi here) present in order to use the vCenter Server Network option.

Select a compatible cluster.

Select an appropriate size for your deployment.

Select an appropriate Storage Policy.

Complete the information relevant to your NSX ALB deployment. Supplying the certificate was easy for me since I used my own wildcard certificate. If you don’t have the NSX ALB certificate handy, you can download it from the NSX ALB UI via the Templates, Security, SSL/TLS Certificates page.
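
If you’d rather grab the certificate from the command line, you can pull it straight from the controller with openssl. A minimal sketch, assuming the controller answers on port 443 at nsxalb.corp.tanzu (a placeholder; substitute your own controller FQDN or IP):

# Fetch the certificate presented by the NSX ALB controller and save it in PEM form
# (nsxalb.corp.tanzu is a placeholder for your controller FQDN or IP)
echo | openssl s_client -connect nsxalb.corp.tanzu:443 -showcerts 2>/dev/null | openssl x509 -outform PEM > nsx-alb-controller.pem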

The Starting IP Address value will be the first of five consecutive IP addresses that can be used for Supervisor Cluster control plane nodes. Everything else here should be fairly standard.

You can leave the default IP address for Services value in place or choose something else. Enter an appropriate IP address for your DNS Server. Click the Add button under Workload Network.

Choose an appropriate portgroup (K8s-Workload in this example) and enter the Layer 3 Routing Configuration information appropriate for your environment.

If everything looks good here, move on to the next page.

Click the Add button next to Add Content Library.

Select the appropriate Content Library for serving up Kubernetes node images.

Click the Finish button to get the deployment started.

As noted in How to install vSphere with Tanzu with vSphere Networking, you’ll see a Namespaces folder get created in the Hosts and Clusters view of the vSphere Client and the Supervisor Cluster Control Plane VMs will be placed there. When the deployment is getting close to being done, you should see three VMs present.

For a short time, the Control Plane Node IP Address will be the first IP address supplied on the Workload Management configuration page.

Once the deployment is complete, the Control Plane Node IP Address will change to a VIP supplied by NSX ALB. The Config Status should also change to Running.

Now, even though the Workload Management page shows that the deployment is finished, you might find that you are not able to get to the noted IP address right away. This is likely because NSX ALB has not yet provisioned the Service Engines (SEs) that will provide the infrastructure for hosting the required Virtual Services. Once these are in place and configured properly, you’ll see them present in the NSX ALB UI on the Infrastructure, Service Engine page.
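
If you want to know from the command line exactly when the endpoint becomes reachable, a simple poll against the Control Plane Node IP Address works. A minimal sketch, using 192.168.220.2 (the VIP in my environment; -k is needed until the certificate is replaced later in this post):

# Poll the Supervisor Cluster endpoint until the NSX ALB SEs are up and the VIP starts answering
# (-k skips certificate validation since the default certificate is not yet trusted)
while ! curl -ksf -o /dev/null https://192.168.220.2; do
  echo "VIP not answering yet, retrying in 30 seconds..."
  sleep 30
done
echo "Supervisor Cluster endpoint is reachable"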

If everything is healthy, you should see a “green” service on the Applications, Virtual Services page. Note that the Address is the one supplied as the Control Plane Node IP Address on the Workload Management page in the vSphere Client.

Don’t worry if this is not green right away. In my lab, I found that it took several minutes as NSX ALB continuously updated the health of the service. It won’t go green until the overall health score is 85 or higher, even if the service is perfectly functional.

On the Applications, Dashboard page you can use the VS Tree view to see a logical representation of the traffic flow for the service. Note that the endpoints are the three Supervisor Cluster Control Plane nodes, on ports 443 and 6443.

Back in the vSphere Client you should see the two SE VMs in the inventory now (AviTanzu*).

Now that the Supervisor Cluster is up and functional, you’ll want to navigate to Administration, Licensing, Licenses, Assets, Supervisor Clusters to apply a valid license.

Replace the Certificate

The first thing to do is replace the default certificate with one that will be trusted. From the Hosts and Clusters View, select the cluster where the Supervisor Cluster is deployed and then click on the Configure tab. Under Namespaces, select Certificates.

You can click View Details to see information about the current certificate.

Click on the Actions menu and then select Generate CSR.

Enter the appropriate details for the new certificate on the Enter Info page.

Copy or download the CSR that is generated and take it to a CA to get a certificate generated.
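
If you run your own lab CA rather than an enterprise or public CA, signing the CSR with openssl is enough for this purpose. A rough sketch, assuming the CA key pair is on hand as ca.crt/ca.key and the CSR was saved as supervisor.csr (all file names here are placeholders):

# Sign the Supervisor CSR with a lab CA (file names are placeholders)
openssl x509 -req -in supervisor.csr -CA ca.crt -CAkey ca.key -CAcreateserial -days 365 -sha256 -out supervisor.crt

Depending on your OpenSSL version you may need to re-supply the subjectAltName via -extfile, since openssl x509 -req does not copy extensions from the CSR by default.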

When you have the new certificate ready, click the Actions menu and then select Replace Certificate.

Paste or upload the new certificate and click the Replace button.

If it worked, you’ll see a page similar to the following:

And navigating to https://<Control Plane Node IP Address> will not produce a certificate warning.

From this page you’ll want to copy the CLI Plugin to a location where you expect to be running kubectl commands.
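
On a Linux jump host, downloading and installing the plugin can be scripted. A minimal sketch, assuming the download link on the landing page still points at the path below (verify against the page itself, and swap in your own Control Plane Node IP Address):

# Download and install kubectl and the kubectl-vsphere plugin from the Supervisor Cluster endpoint
# (the /wcp/plugin path is taken from the download link on the landing page; confirm it in your environment)
# (-k is only needed if your workstation does not trust the new certificate)
curl -k -o vsphere-plugin.zip https://192.168.220.2/wcp/plugin/linux-amd64/vsphere-plugin.zip
unzip vsphere-plugin.zip -d vsphere-plugin
sudo install vsphere-plugin/bin/kubectl vsphere-plugin/bin/kubectl-vsphere /usr/local/bin/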

Configure a Supervisor Cluster Namespace

From the Workload Management page, navigate to Namespaces and click on the Create Namespace button.

Select the appropriate cluster and network (network-1 which we created during deployment) and give the Namespace a name.

We have a little bit of information about our new Namespace here but we need to add some Permissions and Storage before we can really make use of it.

Click on the Add Permissions button.

Complete the Add Permissions page as appropriate. Bear in mind that the user(s) you specify here will be able to log in to the Supervisor Cluster via the kubectl command, and the Role assigned here dictates whether they are an administrator or have read-only access.

I’m also granting access to an AD user from the corp.tanzu domain.

On the Namespace page, click on the Add Storage button. Select an appropriate datastore.

You could also set limits on the compute and storage resources available to this Namespace via the Capacity and Usage tile but it is not necessary for this example.
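
Behind the scenes, the permissions and storage you just assigned surface as ordinary Kubernetes objects in the Supervisor Cluster Namespace. Once you’re logged in (covered in the next section), a quick sanity check looks something like this:

# Permissions added in the vSphere Client show up as rolebindings in the Namespace
kubectl get rolebindings -n tkg

# The storage policy added to the Namespace is exposed as a StorageClass (k8s-policy in this example)
kubectl get storageclasses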

Create a Tanzu Kubernetes cluster

Now we’re ready to log in and create a Tanzu Kubernetes cluster. From the system where you copied the CLI Plugin for vSphere, issue a command similar to the following to log in (note that the IP address is what was indicated on the Workload Management page and the user, administrator@vsphere.local, is one of the users we specified when creating the Namespace):

kubectl vsphere login --server 192.168.220.2

Username: administrator@vsphere.local
KUBECTL_VSPHERE_PASSWORD environment variable is not set. Please enter the password below
Password:
Logged in successfully.

You have access to the following contexts:
   192.168.220.2
   tkg

If the context you wish to use is not in this list, you may need to try
logging in again later, or contact your cluster administrator.

To change context, use `kubectl config use-context <workload name>`

I didn’t know about the new ability to supply the password via an environment variable, but I knew it was a huge ask from many customers, as its absence made automating things very difficult.

export KUBECTL_VSPHERE_PASSWORD=VMware1!

kubectl vsphere login --server 192.168.220.2 -u administrator@vsphere.local

Logged in successfully.

You have access to the following contexts:
   192.168.220.2
   tkg

If the context you wish to use is not in this list, you may need to try
logging in again later, or contact your cluster administrator.

To change context, use `kubectl config use-context <workload name>`

That worked great! On to viewing the nodes in the cluster.

kubectl get nodes -o wide

NAME                               STATUS   ROLES    AGE   VERSION         INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                 KERNEL-VERSION       CONTAINER-RUNTIME
421d0c329c86f8a10347027a6f8ce1c4   Ready    master   13h   v1.19.1+wcp.2   192.168.130.2   <none>        VMware Photon OS/Linux   4.19.164-3.ph3-esx   containerd://1.3.3
421d202e406c7a419f64b15a0268cd89   Ready    master   13h   v1.19.1+wcp.2   192.168.130.3   <none>        VMware Photon OS/Linux   4.19.164-3.ph3-esx   containerd://1.3.3
421dc7e4475bea2c5a1e3c561e0162cd   Ready    master   14h   v1.19.1+wcp.2   192.168.130.4   <none>        VMware Photon OS/Linux   4.19.164-3.ph3-esx   containerd://1.3.3

And you should see that your context is set to the Supervisor Cluster Namespace that was created earlier.

kubectl config get-contexts

CURRENT   NAME            CLUSTER         AUTHINFO                                        NAMESPACE
          192.168.220.2   192.168.220.2   wcp:192.168.220.2:administrator@vsphere.local
*         tkg             192.168.220.2   wcp:192.168.220.2:administrator@vsphere.local   tkg

Creating a Tanzu Kubernetes cluster is as simple as kubectl apply’ing a small bit of yaml that creates an instance of the TanzuKubernetesCluster Custom Resource Definition (CRD). The following is a minimally functional definition file that I have used in my lab:

apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: tkg-cluster #name of cluster
  namespace: tkg
spec:
  topology:
    controlPlane:
      count: 1
      class: best-effort-xsmall # vmclass to be used for master(s)
      storageClass: k8s-policy
    workers:
      count: 2
      class: best-effort-xsmall # vmclass to be used for worker(s)
      storageClass: k8s-policy
  distribution:
    version: v1.19

A few of the main things to take away from this:

  • The name of the cluster is tkg-cluster.
  • The Namespace to which it will be deployed is tkg.
  • There is only one control plane node and two worker nodes.
  • The “class” is best-effort-xsmall, which means they will not have any resources reserved (best-effort) and will have 2 vCPU, 2GB RAM and 16GB of disk. You can read more about Virtual Machine Class Types at Virtual Machine Class Types for Tanzu Kubernetes Clusters.
  • The storageClass is k8s-policy, the one we created earlier.
  • The Kubernetes version is 1.19, but this will be matched to a full version that includes this short version notation (see below for how to list the available versions).

You can read up on all of the available options at Configuration Parameters for Tanzu Kubernetes Clusters.
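
If you’re not sure which VM classes or Kubernetes versions are valid to reference in the yaml, you can list both from the Supervisor Cluster context before applying anything:

# List the VM classes that can be referenced in spec.topology.controlPlane.class and spec.topology.workers.class
kubectl get virtualmachineclasses

# List the Tanzu Kubernetes releases (node images) available from the subscribed Content Library
kubectl get tanzukubernetesreleases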

The only thing left to do is apply the yaml definition file.

kubectl apply -f tkg-cluster.yaml

tanzukubernetescluster.run.tanzu.vmware.com/tkg-cluster created

As with the Supervisor Cluster, you’ll see VMs getting provisioned.

And you’ll see a new structure getting created under the Namespaces folder (tkg, tkg-cluster).

I like to have one session open where I’m just watching the events from the Supervisor Cluster to make sure nothing concerning jumps out and that things are moving along as expected.

kubectl get events -w
LAST SEEN   TYPE      REASON                    OBJECT                                                    MESSAGE
2m34s       Warning   ReconcileFailure          wcpmachine/tkg-cluster-control-plane-cmscz-hqttb          vm is not yet created: vmware-system-capw-controller-manager/WCPMachine//tkg/tkg-cluster/tkg-cluster-control-plane-cmscz-hqttb
0s          Warning   ReconcileFailure          wcpmachine/tkg-cluster-control-plane-cmscz-hqttb          vm is not yet created: vmware-system-capw-controller-manager/WCPMachine//tkg/tkg-cluster/tkg-cluster-control-plane-cmscz-hqttb
4m31s       Normal    CreateK8sServiceSuccess   virtualmachineservice/tkg-cluster-control-plane-service   CreateK8sService success
4m32s       Normal    CertificateIssued         certificaterequest/tkg-cluster-extensions-ca-569859115    Certificate fetched from issuer successfully
4m32s       Normal    KeyPairVerified           issuer/tkg-cluster-extensions-ca-issuer                   Signing CA verified
51s         Normal    KeyPairVerified           issuer/tkg-cluster-extensions-ca-issuer                   Signing CA verified
4m32s       Normal    GeneratedKey              certificate/tkg-cluster-extensions-ca                     Generated a new private key
4m32s       Normal    Requested                 certificate/tkg-cluster-extensions-ca                     Created new CertificateRequest resource "tkg-cluster-extensions-ca-569859115"
4m32s       Normal    Issued                    certificate/tkg-cluster-extensions-ca                     Certificate issued successfully
4m29s       Normal    SuccessfulCreate          machineset/tkg-cluster-workers-b9n7d-69dc849cc4           Created machine "tkg-cluster-workers-b9n7d-69dc849cc4-xx5k9"
4m29s       Normal    SuccessfulCreate          machineset/tkg-cluster-workers-b9n7d-69dc849cc4           Created machine "tkg-cluster-workers-b9n7d-69dc849cc4-hnct5"
4m22s       Warning   ReconcileError            machinehealthcheck/tkg-cluster-workers-b9n7d              error creating client and cache for remote cluster: error fetching REST client config for remote cluster "tkg/tkg-cluster": failed to retrieve kubeconfig secret for Cluster tkg/tkg-cluster: secrets "tkg-cluster-kubeconfig" not found
4m29s       Normal    SuccessfulCreate          machinedeployment/tkg-cluster-workers-b9n7d               Created MachineSet "tkg-cluster-workers-b9n7d-69dc849cc4"
4m17s       Warning   ReconcileError            machinehealthcheck/tkg-cluster-workers-b9n7d              error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster "tkg/tkg-cluster": Get https://192.168.220.4:6443/api?timeout=10s: dial tcp 192.168.220.4:6443: connect: connection refused
7s          Warning   ReconcileError            machinehealthcheck/tkg-cluster-workers-b9n7d              error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster "tkg/tkg-cluster": Get https://192.168.220.4:6443/api?timeout=10s: dial tcp 192.168.220.4:6443: connect: connection refused
4m27s       Warning   ReconcileFailure          wcpcluster/tkg-cluster                                    unexpected error while reconciling control plane endpoint for tkg-cluster: failed to reconcile loadbalanced endpoint for WCPCluster tkg/tkg-cluster: failed to get control plane endpoint for Cluster tkg/tkg-cluster: VirtualMachineService LB does not yet have VIP assigned: VirtualMachineService LoadBalancer does not have any Ingresses
0s          Warning   ReconcileFailure          wcpmachine/tkg-cluster-control-plane-cmscz-hqttb          vm does not have an IP address: vmware-system-capw-controller-manager/WCPMachine//tkg/tkg-cluster/tkg-cluster-control-plane-cmscz-hqttb

At the end of the deployment you should see a message similar to the following in the events:

0s          Normal    PhaseChanged              tanzukubernetescluster/tkg-cluster                        cluster changes from creating phase to running phase

And you should be able to see that the cluster is provisioned via the kubectl get cluster command.

kubectl get cluster

NAME          PHASE
tkg-cluster   Provisioned

Back on the Namespace page in the vSphere Client, you’ll see that there is 1 Tanzu Kubernetes Cluster present in the Tanzu Kubernetes Grid Service panel.

And drilling down into this cluster you can see some high-level details about it.

Back at the command line, you can get at similar information by looking at the tanzukubernetescluster (tkc) resource.

kubectl get tkc

NAME          CONTROL PLANE   WORKER   DISTRIBUTION                     AGE   PHASE     TKR COMPATIBLE   UPDATES AVAILABLE
tkg-cluster   1               2        v1.19.7+vmware.1-tkg.1.fc82c41   15m   running   True

To get access to the new cluster, we’ll need to log in again and pass the --tanzu-kubernetes-cluster-namespace and --tanzu-kubernetes-cluster-name parameters.

kubectl vsphere login --server 192.168.220.2 -u administrator@vsphere.local --tanzu-kubernetes-cluster-namespace tkg --tanzu-kubernetes-cluster-name tkg-cluster

Logged in successfully.

You have access to the following contexts:
   192.168.220.2
   tkg
   tkg-cluster

If the context you wish to use is not in this list, you may need to try
logging in again later, or contact your cluster administrator.

To change context, use `kubectl config use-context <workload name>`

There is a new context created (tkg-cluster in this example) that should match the name of the new cluster. You should see that your context has automatically switched to it.

kubectl config get-contexts

CURRENT   NAME            CLUSTER         AUTHINFO                                        NAMESPACE
          192.168.220.2   192.168.220.2   wcp:192.168.220.2:administrator@vsphere.local
          tkg             192.168.220.2   wcp:192.168.220.2:administrator@vsphere.local   tkg
*         tkg-cluster     192.168.220.4   wcp:192.168.220.4:administrator@vsphere.local

You can see that the nodes are present as we expect and that the names match what is visible in the vSphere Client.

kubectl get nodes -o wide

NAME                                         STATUS   ROLES    AGE     VERSION            INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                 KERNEL-VERSION       CONTAINER-RUNTIME
tkg-cluster-control-plane-qc26b              Ready    master   14m     v1.19.7+vmware.1   192.168.130.5   <none>        VMware Photon OS/Linux   4.19.160-1.ph3-esx   containerd://1.4.3
tkg-cluster-workers-b9n7d-69dc849cc4-hnct5   Ready    <none>   7m59s   v1.19.7+vmware.1   192.168.130.6   <none>        VMware Photon OS/Linux   4.19.160-1.ph3-esx   containerd://1.4.3
tkg-cluster-workers-b9n7d-69dc849cc4-xx5k9   Ready    <none>   7m39s   v1.19.7+vmware.1   192.168.130.7   <none>        VMware Photon OS/Linux   4.19.160-1.ph3-esx   containerd://1.4.3

In the NSX ALB UI, we can see that a new virtual service has been created for the new cluster. The Address value matches the API endpoint address noted in the vSphere Client and in the kubectl config get-contexts output. This service only listens on port 6443 since we don’t need any UI access to the workload cluster.

And as with the first service created you can see a logical representation of the network flow to the control plane VM.
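
You can also correlate this from the Supervisor Cluster side. The workload cluster’s control plane endpoint is backed by a load-balanced service object in the tkg Namespace (the name below is taken from the events output earlier), so switching back to the Supervisor context should show the same VIP:

kubectl config use-context tkg

# The workload cluster's control plane endpoint is exposed as a VirtualMachineService
# and a corresponding Service of type LoadBalancer in the tkg Namespace
kubectl get virtualmachineservices
kubectl get svc tkg-cluster-control-plane-service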

Deploy an application that uses a Load Balancer service

I’m not showing all the details here but I have created a WordPress application that uses a service of type LoadBalancer to make sure that NSX ALB is capable of providing Load Balancer addresses to my workload services.

apiVersion: v1
kind: Service
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  ports:
    - port: 80
  selector:
    app: wordpress
    tier: frontend
  type: LoadBalancer

Checking on the status of the service shows that an IP address of 192.168.220.6 has been allocated, which is in the VIP pool configured in NSX ALB.

kubectl get svc --selector=app=wordpress

NAME              TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)        AGE
wordpress         LoadBalancer   10.97.8.119   192.168.220.6   80:30205/TCP   52s
wordpress-mysql   ClusterIP      None          <none>          3306/TCP       53s
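
A quick check from any machine that can reach the frontend network confirms that the VIP is actually passing traffic (192.168.220.6 being the EXTERNAL-IP from the output above):

# Confirm the Load Balancer VIP allocated by NSX ALB is serving the WordPress frontend
curl -sI http://192.168.220.6 | head -n 1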

In the NSX ALB UI, we can see that a third virtual service has been configured; it has the IP address noted previously and is serving port 80.

The tree view shows the path that traffic will take to the two worker nodes in the cluster where the application is running.

13 thoughts on “Deploying vSphere 7.0 U2 with Tanzu while using NSX Advanced Load Balancer”

  1. Hi Chris,

    i tried to provision a simple webserver app (nginx) in my TKG cluster (with ALB) and found that the pull request wasn’t successful. Did you come across this issue too?
    I don’t know how to login into the TKG worker node. Do you?

    thanks and great article by the way

    Erich

    1. Hello Erich. I did not have any issues with pulling images from the internet…is there anything upstream from your cluster that might be blocking traffic? Any kind of network security policy in place in your cluster that could affect this? If nothing easy presents itself as a solution I would highly recommend opening a support request with VMware.

      Regarding logging into a TKG worker node, https://docs.vmware.com/en/VMware-Tanzu-Kubernetes-Grid/1.3/vmware-tanzu-kubernetes-grid-13/GUID-troubleshooting-tkg-tips.html#connect-to-cluster-nodes-with-ssh-9 should provide the steps you’re looking for.

      1. Hi Chris,

        thanks for your quick answer.

        I was able to login succesfully in Supervisorclusternodes (this is what i did know) and the Nodes from the guest cluster.

        ALB, VCSA, DNS and four vESXi Hosts are running on a pESXi-Cluster and these vLAB-VMs are connected with some NSX-v segments (this is my way to force me to still use NSX-v ). So far i did not have problems with this approach. I do this since more than 5 years and with more than 10 vLAB-environments.

        Inside this vLAB (which i deployed according to and with some parts of the scripts from William Lam) i do use (obviously) dvSwitches.

        The SupervisorControlVMs are able to ping external addresses. The traffic is leaving eth0 and is routed externally (just checked with pktcap-uw).
        The Kubernetes Cluster Nodes are not able to ping external addresses (vNIC0 -> Workload Segment).

        at present i guess it’s anything related to my NSX-v configuration, because i see icmp requests on my transit network (replies are missing)

        This is nothing which i could ask VMware Support honestly

        Will report if i’m successful.

      2. Hi Chris,

        thanks for your quick answer.

        If you’re interested to know what i’m doing i need to use a different communication format (for screenshots …).

        In short:
        this environment runs in one of my vLabs (hosted on pESXi-Cluster using NSX-v as segmentation concept)
        was pretty good so far, but it’s not that easy, especially when you forget to program the reverse route path (this is what was wrong)
        Now i’m able to ping external addresses from my worker node and will see if my kubernetes app can be pulled

        Erich

  2. Hi Chris,

    Thanks for the post. Your screenshots are helping me to troubleshoot an issue getting this running in my lab.

    How does your NSX ALB choose the VIP address 192.168.220.2 as the frontend for your supervisor cluster? I don’t see that address or subnet as an input in any of your configuration (perhaps I’m missing it).

    My supervisor cluster and service engine VIP both come up with the same IP and network connectivity to the supervisor cluster is therefore intermittent.

    “Once the deployment is complete, the Control Plane Node IP Address will change to a VIP supplied by NSX ALB.”

    This never happens in my environment.

    Any ideas?

    1. Hello Mark. You’ll need to look at my previous post, Deploying NSX Advanced Load Balancer for use with Tanzu Kubernetes Grid and vSphere with Tanzu to see where that IP came from. I have specified a block of VIP IP address as 192.168.220.0/23 with a gateway address of 192.168.220.1 and a usable range of 192.168.220.2-192.168.220.127 (named K8s-Frontend). I’m using this block for VIPs and it’s where the Supervisor Cluster is pulling an IP from.

  3. Thank you so much for the guide, i followed the entire guide, except switching the AVI LB to the essentials edition and i use a self-signed certificate on the AVI with its IP as SAN.
    But when i deploy a new workload cluster with the YAML file specified the workers are not created. The cluster name is there but the worker nodes are not provisioned, any ideas?

    1. error creating client and cache for remote cluster: error fetching REST client config for remote cluster “tkg/sitecore-dev”: failed to retrieve kubeconfig secret for Cluster tkg/sitecore-dev: secrets “sitecore-dev-kubeconfig” not found

    2. We’d very likely have to dig into the logs/events to see what is going on there. I would highly recommend getting a support request opened with VMware to better assist you.

  4. Hi Chris,

    Could you do a guide on how to enable Windows container support with a supervisor cluster? I would like to run Windows containers also.

    1. I don’t believe that Windows containers are supported in vSphere with Tanzu (yet). They are supported in Tanzu Kubernetes Grid Integrated edition (TKGI) 1.9 and above support Windows workers (https://docs.pivotal.io/tkgi/1-12/support-windows-index.html). With regards to TKG, you may want to register for the Modernize Windows Apps: Introduction to Windows Containers on Kubernetes [APP1999] VMworld session (https://myevents.vmware.com/widget/vmware/vmworld2021/catalog?search.passtype=15931978752250014Azc&search.level=1517937137830003aCWu&search.product=1617723187121049eJ7h&search.track=contentTrack_applicationModernization&search=APP1999%5D&tab.contentcatalogtabs=1627421929827001vRXW).

