Installing a TKG 1.4 management cluster on vSphere with NSX Advanced Load Balancer

The process of deploying a TKG 1.4 management cluster is largely unchanged from my previous post, Installing Tanzu Kubernetes Grid 1.3 on vSphere with NSX Advanced Load Balancer, but there are some differences worth calling out. I’ll walk through the entire process again in this post and pay special attention to new and changed portions.

On a somewhat bittersweet note, this may be my last post on VMware’s Tanzu line of products for some time. I’ll be helping VMware provide world-class support for a different set of products going forward, and I hope to continue this blog with posts about whatever new endeavors the future holds for me.

Download updated components

You can download updated tanzu CLI binaries, node OS OVA files, signed kubectl, crashd and velero binaries from

Deploy a Kubernetes node OVA

While Photon OS and Ubuntu are both supported and VMware still supplies OVAs for both, Ubuntu is now the default node OS. When creating the management cluster this is not a big concern, since you have to explicitly choose which VM template you’re using. A workload cluster has no such requirement, however: if you have both Photon OS and Ubuntu VM templates in your vSphere inventory, you will end up with Kubernetes nodes running Ubuntu.
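If you do want Photon OS for a workload cluster while both templates are present, one option is to set the OS explicitly in that cluster’s config file. The sketch below appends the OS_NAME, OS_VERSION and OS_ARCH keys (the same keys that appear in the management cluster config later in this post). The helper name pin_node_os and the file name workload-cluster.yaml are hypothetical, and "3" as the Photon OS version is an assumption to check against your template.

```shell
# Hypothetical helper: append OS selection keys to a workload cluster config
# so its nodes are cloned from the Photon OS template instead of Ubuntu.
# Assumption: OS_NAME/OS_VERSION/OS_ARCH are honored for workload clusters.
pin_node_os() {
  local config_file="$1"        # e.g. workload-cluster.yaml (hypothetical)
  local os_name="${2:-photon}"
  local os_version="${3:-3}"    # assumed Photon OS version; verify for your OVA
  cat >> "$config_file" <<EOF
OS_NAME: ${os_name}
OS_VERSION: "${os_version}"
OS_ARCH: amd64
EOF
}

# Usage:
#   pin_node_os workload-cluster.yaml photon 3
#   tanzu cluster create --file workload-cluster.yaml
```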

For this exercise, I switched over to using the VMware-supplied Ubuntu OVA. The process of deploying the OVA is very straightforward and doesn’t differ from my previous post.

Once the deployment is finished, I always take a snapshot so that I can make use of the linkedClone functionality when standing up the node VMs (to help reduce storage consumption).

The last step is to convert the VM to a template:
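If you prefer the command line to the vSphere Client for the snapshot and template steps, govc can do both. This is a minimal sketch, assuming govc is installed and configured through the usual GOVC_URL, GOVC_USERNAME and GOVC_PASSWORD environment variables, and that the VM is still powered off after the OVA deployment. The function name, VM name and snapshot name are hypothetical.

```shell
# Sketch: snapshot the freshly deployed node VM (enables linked clones),
# then convert it to a template. Assumes govc is installed and configured
# via GOVC_URL, GOVC_USERNAME and GOVC_PASSWORD, and the VM is powered off.
prepare_node_template() {
  local vm_name="$1"                  # hypothetical VM name
  local snapshot_name="${2:-base}"    # hypothetical snapshot name
  govc snapshot.create -vm "$vm_name" "$snapshot_name" || return 1
  govc vm.markastemplate "$vm_name"
}

# Usage:
#   prepare_node_template my-ubuntu-node-vm
```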

Install the 1.4 tanzu CLI binary and additional executables

Note: If you are on a system that has the 1.3 tanzu CLI binary installed, see the Install the 1.4 tanzu CLI binary and additional executables section of my earlier post, Upgrading from TKG 1.3 to 1.4 (including extensions) on vSphere, as there are a few extra steps needed when upgrading the tanzu CLI.

After downloading the updated tanzu CLI bundle (tar format), you will need to extract it and install the binary on your local system:

tar -xvf tanzu-cli-bundle-linux-amd64.tar

cd cli/

ubuntu@cli-vm:~/cli$ ls
cluster                                 kubernetes-release  pinniped-auth
core                                    login               vendir-linux-amd64-v0.21.1+vmware.1.gz
imgpkg-linux-amd64-v0.10.0+vmware.1.gz  management-cluster  ytt-linux-amd64-v0.34.0+vmware.1.gz
kapp-linux-amd64-v0.37.0+vmware.1.gz    manifest.yaml
kbld-linux-amd64-v0.30.0+vmware.1.gz    package

sudo install core/v1.4.0/tanzu-core-linux_amd64 /usr/local/bin/tanzu

You can now check to see the tanzu CLI binary version:

tanzu version

version: v1.4.0
buildDate: 2021-08-30
sha: c9929b8f

You can follow a similar process to gunzip and install the other utilities located in the cli folder (imgpkg, kapp, kbld, vendir, ytt):

for file in *.gz
do
  gunzip "$file"
  sudo install "${file::-3}" /usr/local/bin/$(echo "$file" | awk -F - '{print $1}')
done
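The awk pipeline and the bash-only ${file::-3} expansion in that loop only derive two names from each file. As a sketch, the same thing can be done with POSIX parameter expansion: ${file%.gz} drops the extension and ${file%%-*} keeps everything before the first hyphen (imgpkg from imgpkg-linux-amd64-v0.10.0+vmware.1.gz, for example). The helper names are hypothetical.

```shell
# Derive the command name from a bundle file name using POSIX parameter
# expansion, e.g. imgpkg-linux-amd64-v0.10.0+vmware.1.gz -> imgpkg
tool_name() {
  local base="${1##*/}"        # drop any leading directory
  printf '%s\n' "${base%%-*}"  # keep everything before the first "-"
}

# Same loop as above without awk or bash-only substring offsets
install_cli_tools() {
  local file
  for file in *.gz; do
    [ -e "$file" ] || continue              # glob matched nothing
    gunzip "$file" || return 1              # leaves "${file%.gz}" behind
    sudo install "${file%.gz}" "/usr/local/bin/$(tool_name "$file")"
  done
}
```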

Lastly, you can follow a similar process to install or update the kubectl binary to the version that ships with TKG 1.4:

gunzip kubectl-linux-v1.21.2+vmware.1.gz

sudo install kubectl-linux-v1.21.2+vmware.1 /usr/local/bin/kubectl

kubectl version --client

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2+vmware.1", GitCommit:"54e7e68e30dd3f9f7bb4f814c9d112f54f0fb273", GitTreeState:"clean", BuildDate:"2021-06-28T22:17:36Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}

Install the tanzu CLI plugins

Note: This next command needs to be run from the cli directory noted previously.

tanzu plugin install --local . all
tanzu plugin list

  NAME                LATEST VERSION  DESCRIPTION                                                        REPOSITORY  VERSION  STATUS
  alpha               v1.4.0          Alpha CLI commands                                                 core                 not installed
  cluster             v1.4.0          Kubernetes cluster operations                                      core        v1.4.0   installed
  kubernetes-release  v1.4.0          Kubernetes release operations                                      core        v1.4.0   installed
  login               v1.4.0          Login to the platform                                              core        v1.4.0   installed
  management-cluster  v1.4.0          Kubernetes management cluster operations                           core        v1.4.0   installed
  package             v1.4.0          Tanzu package management                                           core        v1.4.0   installed
  pinniped-auth       v1.4.0          Pinniped authentication operations (usually not directly invoked)  core        v1.4.0   installed
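In the output above, only alpha shows as not installed, which matches the cli folder listing (there is no alpha directory in it). A small filter like the following, a sketch that assumes the column layout shown above, makes it easy to spot any other plugin that failed to install. The function name is hypothetical.

```shell
# Print the name of every plugin whose STATUS column says "not installed".
# Reads `tanzu plugin list` output on stdin and skips the header row.
missing_plugins() {
  awk 'NR > 1 && /not installed/ { print $1 }'
}

# Usage:
#   tanzu plugin list | missing_plugins
```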

Create a management cluster via the UI

The process for building out the management cluster is also fairly similar but you’ll see that there are some new configuration options present.

tanzu management-cluster create --ui

Validating the pre-requisites...
Serving kickstart UI at

A browser should be launched automatically and you can choose your platform for deploying TKG (vSphere in this example).

Enter the information necessary to connect to your vCenter Server and press the Connect button.

You’ll see a message similar to the following; press Continue if it looks good.

And another message, this time wanting you to confirm that you’re deploying a TKG management cluster and not trying to stand up vSphere with Tanzu.

Select an appropriate Datacenter and paste in a public key that will be used to authenticate SSH clients connecting to the Kubernetes nodes.
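If you don’t already have a key pair for this, ssh-keygen can create one. This is a sketch: the key path and function name are hypothetical, the empty passphrase (-N "") is a lab convenience rather than a recommendation, and the 4096-bit RSA type and admin@corp.local comment simply mirror the key that shows up in the generated config file later in this post.

```shell
# Sketch: create a dedicated RSA key pair for the node VMs and print the
# public key to paste into the installer. Path and comment are hypothetical.
make_node_ssh_key() {
  local key_path="${1:-$HOME/.ssh/tkg_nodes_rsa}"
  local comment="${2:-admin@corp.local}"
  # Refuse to overwrite an existing key (ssh-keygen would prompt instead)
  if [ -e "$key_path" ]; then
    echo "refusing to overwrite $key_path" >&2
    return 1
  fi
  # -N "" means no passphrase (lab convenience only); -q suppresses output
  ssh-keygen -t rsa -b 4096 -C "$comment" -N "" -f "$key_path" -q || return 1
  cat "${key_path}.pub"
}
```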

You have a lot of choices here but I’m going with a simple and small Development cluster. This will result in a single control plane node and a single worker node. I’m also disabling Machine Health Checks (MHC) in my lab as the nested environment has very poor performance and I don’t want nodes to get recreated unnecessarily.

You’ll see that there was an option for Control Plane Endpoint Provider with Kube-vip and NSX Advanced Load Balancer (NSX ALB) choices. Prior to version 1.4, Kube-vip was the only option and no choice was presented. You can now use NSX ALB for both workload load balancer services and Kubernetes control plane endpoint addresses. You can read a little more about this in my earlier post, Migrating a TKG cluster control-plane endpoint from kube-vip to NSX-ALB. I’m going to use NSX ALB in this lab, but I also want to specify an IP address instead of letting NSX ALB pull one out of its pool. To make this work, I’ll need to make a change in NSX ALB.

From the Infrastructure, Networks page in the NSX ALB UI, you’ll need to click the Edit icon for the network where you want your control plane endpoint IP address to live (K8s-Frontend in my case).

Click the Edit button next to the appropriate subnet ( in this case).

Click the Add Static IP Address Pool button and enter the desired control plane endpoint IP address as a range ( in this example).

Click the Save button.

Click the Save button again. You should see a summary of the networks and can see that there is now an additional Static IP Pool configured on the desired network (K8s-Frontend has two now).

Back in the TKG installer UI, I can configure the desired IP address and move on to the next page.

On the VMware NSX Advanced Load Balancer page, you now have to supply the CA certificate used by NSX ALB prior to clicking the Verify Credentials button. This used to be supplied after verifying credentials.

Another change you’ll see is that you now get to choose different VIP networks for Management VIPs (control plane endpoints) and Workload VIPs (workload load balancer services). I’m using the same network for both. These are all dropdowns where you can select the items that have been configured in NSX ALB.

For comparison, the following is what this page looked like in TKG 1.3:

You can skip the Optional Metadata page unless you want to provide custom labels.

Choose an appropriate VM folder, Datastore and Cluster.

Choose an appropriate portgroup for your Kubernetes node VMs. You can leave the service and pod CIDRs as is or update them as needed. You can’t tell from this screenshot but my network name is K8s-Workload. If you are behind a proxy, toggle the Enable Proxy Settings switch and enter any appropriate proxy information.

Configuring Identity Management is the same as in TKG 1.3. You can use what I’ve configured here as a primer for Active Directory integration, but you will need to customize each parameter for your deployment.

One welcome addition is the ability to test your LDAP configuration via the Verify LDAP Configuration button.

I’ve entered a user and group that were created in Active Directory and will be used later to access the cluster.

That was not exactly what I was hoping to see. It looks like this Verify LDAP Configuration utility hardcodes user search to cn and group search to ou. Since my users can be found via cn, user search succeeded, but my groups cannot be found by ou, hence the failure. Success or failure in this utility does not prevent you from proceeding with the installation, which is what I did.

If you properly uploaded an OVA and converted it to a template, you should be able to select it here (or choose from several, if you have more than one).

The ability to register your management cluster to Tanzu Mission Control (TMC) during installation was added in 1.3 but there is an issue with TMC not working well with TKG 1.4 right now. Do not attempt to register your management cluster with TMC until the TMC Release Notes mention support for TKG 1.4 management clusters.

Barring any internal policies prohibiting it, you should always participate in the Customer Experience Improvement Program.

You can click the Review Configuration button and take some time to look over how you’ll be deploying your management cluster to ensure it is what you want.

If everything looks good, click the Deploy Management Cluster button.

You might have noticed in the previous screenshot that there was a file referenced, /home/ubuntu/.tanzu/tkg/clusterconfigs/6ezabehzyd.yaml, that we didn’t create. The installer took everything we entered in the UI and saved it to this file. The really nice thing about this is that you can quickly create other management clusters (or recreate this one if you decide to destroy it) from the same file:

AVI_CLOUD_NAME: Default-Cloud
AVI_CONTROLLER: nsxalb-cluster.corp.tanzu
AVI_ENABLE: "true"
CLUSTER_NAME: tkg-mgmt
ENABLE_MHC: "false"
LDAP_BIND_DN: cn=Administrator,cn=Users,dc=corp,dc=tanzu
LDAP_GROUP_SEARCH_BASE_DN: cn=Users,dc=corp,dc=tanzu
LDAP_GROUP_SEARCH_FILTER: (objectClass=group)
LDAP_HOST: controlcenter.corp.tanzu:636
LDAP_USER_SEARCH_BASE_DN: cn=Users,dc=corp,dc=tanzu
LDAP_USER_SEARCH_FILTER: (objectClass=Person)
OS_ARCH: amd64
OS_NAME: ubuntu
OS_VERSION: "20.04"
VSPHERE_DATASTORE: /RegionA01/datastore/map-vol
VSPHERE_NETWORK: /RegionA01/network/K8s-Workload
VSPHERE_RESOURCE_POOL: /RegionA01/host/RegionA01-MGMT/Resources
VSPHERE_SERVER: vcsa-01a.corp.tanzu
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC5KYNeWQgVHrDHaEhBCLF1vIR0OAtUIJwjKYkY4E/5HhEu8fPFvBOIHPFTPrtkX4vzSiMFKE5WheKGQIpW3HHlRbmRPc9oe6nNKlsUfFAaJ7OKF146Gjpb7lWs/C34mjdtxSb1D/YcHSyqK5mxhyHAXPeK8lrxG5MLOJ3X2A3iUvXcBo1NdhRdLRWQmyjs16fnPx6840x9n5NqeiukFYIVhDMFErq42AkeewsWcbZQuwViSLk2cIc09eykAjaXMojCmSbjrj0kC3sbYX+HD2OWbKohTqqO6/UABtjYgTjIS4PqsXWk63dFdcxF6ukuO6ZHaiY7h3xX2rTg9pv1oT8WBR44TYgvyRp0Bhe0u2/n/PUTRfp22cOWTA2wG955g7jOd7RVGhtMHi9gFXeUS2KodO6C4XEXC7Y2qp9p9ARlNvu11QoaDyH3l0h57Me9we+3XQNuteV69TYrJnlgWecMa/x+rcaEkgr7LD61dY9sTuufttLBP2ro4EIWoBY6F1Ozvcp8lcgi/55uUGxwiKDA6gQ+UA/xtrKk60s6MvYMzOxJiUQbWYr3MJ3NSz6PJVXMvlsAac6U+vX4U9eJP6/C1YDyBaiT96cb/B9TkvpLrhPwqMZdYVomVHsdY7YriJB93MRinKaDJor1aIE/HMsMpbgFCNA7mma9x5HS/57Imw== admin@corp.local
VSPHERE_TLS_THUMBPRINT: 01:8D:8B:7F:13:3A:B9:C6:90:D2:5F:17:AD:EB:AC:78:26:3C:45:FB
VSPHERE_USERNAME: administrator@vsphere.local
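Recreating a management cluster from a saved config file uses the --file flag of tanzu management-cluster create instead of --ui. A minimal sketch, assuming you first copy the randomly named file to a stable location and that your copy is complete (the listing above may not show every key); the helper name and default destination are hypothetical.

```shell
# Sketch: copy the installer-generated config to a stable path and build a
# management cluster from it without the UI. Helper name and default
# destination are hypothetical.
create_mgmt_from_file() {
  local generated="$1"   # path the installer printed (clusterconfigs/<random>.yaml)
  local saved="${2:-$HOME/tkg-mgmt-config.yaml}"
  cp "$generated" "$saved" || return 1
  tanzu management-cluster create --file "$saved"
}

# Usage:
#   create_mgmt_from_file ~/.tanzu/tkg/clusterconfigs/6ezabehzyd.yaml
```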

You’ll be able to follow the installation process at a high level in the UI. A welcome addition to the installer UI in 1.4 is that each step is clearly called out on the left side of the page:

What’s happening at this point is that the system from which you launched the installer has downloaded a kind image, which should now be running as a container.

docker ps

CONTAINER ID        IMAGE                                                         COMMAND                  CREATED             STATUS              PORTS                       NAMES
76f29df864d7   "/usr/local/bin/entr…"   20 seconds ago      Up 7 seconds>6443/tcp   tkg-kind-c5i647s09c6oq8590i10-control-plane

We can get access to this cluster via a temporary kubeconfig file that is created under .kube-tkg/tmp.

kubectl --kubeconfig=.kube-tkg/tmp/config_XXXYlv2Y get nodes

NAME                                          STATUS   ROLES                  AGE   VERSION
tkg-kind-c5i647s09c6oq8590i10-control-plane   Ready    control-plane,master   39s   v1.21.2+vmware.1-360497810732255795
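The kubeconfig name has a random suffix (config_XXXYlv2Y here), so it changes on every run. As a sketch, assuming the bootstrap kubeconfig is the most recently modified config_* file in that directory, you can look it up instead of typing it. The function name is hypothetical.

```shell
# Print the newest bootstrap kubeconfig (config_<random suffix>) in the
# given directory; defaults to ~/.kube-tkg/tmp as used above.
latest_bootstrap_kubeconfig() {
  local dir="${1:-$HOME/.kube-tkg/tmp}"
  # ls -t sorts by modification time, newest first
  ls -t "$dir"/config_* 2>/dev/null | head -n 1
}

# Usage:
#   kubectl --kubeconfig="$(latest_bootstrap_kubeconfig)" get nodes
```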

You can see that it’s still coming up at this point.

kubectl --kubeconfig=.kube-tkg/tmp/config_XXXYlv2Y get po -A

NAMESPACE             NAME                                                                  READY   STATUS              RESTARTS   AGE
capi-system           capi-controller-manager-778bd4dfb9-rtf2g                              0/2     ContainerCreating   0          0s
capi-webhook-system   capi-controller-manager-9995bdc94-tbtp9                               0/2     ContainerCreating   0          3s
cert-manager          cert-manager-77f6fb8fd5-t9g6x                                         1/1     Running             0          28s
cert-manager          cert-manager-cainjector-6bd4cff7bb-62k2s                              1/1     Running             0          28s
cert-manager          cert-manager-webhook-fbfcb9d6c-bfpwz                                  1/1     Running             0          27s
kube-system           coredns-8dcb5c56b-5xqht                                               1/1     Running             0          51s
kube-system           coredns-8dcb5c56b-kzvdg                                               1/1     Running             0          51s
kube-system           etcd-tkg-kind-c5i647s09c6oq8590i10-control-plane                      1/1     Running             0          66s
kube-system           kindnet-fk6lg                                                         1/1     Running             0          51s
kube-system           kube-apiserver-tkg-kind-c5i647s09c6oq8590i10-control-plane            1/1     Running             0          57s
kube-system           kube-controller-manager-tkg-kind-c5i647s09c6oq8590i10-control-plane   1/1     Running             0          57s
kube-system           kube-proxy-k6m7h                                                      1/1     Running             0          51s
kube-system           kube-scheduler-tkg-kind-c5i647s09c6oq8590i10-control-plane            1/1     Running             0          57s
local-path-storage    local-path-provisioner-8b46957d4-s7v28                                1/1     Running             0          51s

And once the pods in the bootstrap cluster are fully running, we can examine the logs in the capv-controller-manager pod to get a more detailed view of what’s happening. I like to stream these logs during installation to make sure nothing looks out of the ordinary.

kubectl --kubeconfig=.kube-tkg/tmp/config_XXXYlv2Y -n capv-system logs capv-controller-manager-587fbf697f-2fcg8 manager -f

I1011 16:14:20.351370       1 vspheremachine_controller.go:329] capv-controller-manager/vspheremachine-controller/tkg-system/tkg-mgmt-worker-x4sh5 "msg"="Waiting for the control plane to be initialized"
I1011 16:14:20.356408       1 controller.go:281] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="vspheremachine" "name"="tkg-mgmt-worker-x4sh5" "namespace"="tkg-system"
I1011 16:14:20.415855       1 vspheremachine_controller.go:329] capv-controller-manager/vspheremachine-controller/tkg-system/tkg-mgmt-worker-x4sh5 "msg"="Waiting for the control plane to be initialized"
I1011 16:14:20.416776       1 controller.go:281] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="vspheremachine" "name"="tkg-mgmt-worker-x4sh5" "namespace"="tkg-system"
I1011 16:14:20.665596       1 vspheremachine_controller.go:329] capv-controller-manager/vspheremachine-controller/tkg-system/tkg-mgmt-worker-x4sh5 "msg"="Waiting for the control plane to be initialized"
I1011 16:14:20.667597       1 controller.go:281] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="vspheremachine" "name"="tkg-mgmt-worker-x4sh5" "namespace"="tkg-system"
I1011 16:14:24.178944       1 vspherecluster_controller.go:285] capv-controller-manager/vspherecluster-controller/tkg-system/tkg-mgmt "msg"="Reconciling VSphereCluster"
I1011 16:14:24.321884       1 vspherecluster_controller.go:445] capv-controller-manager/vspherecluster-controller/tkg-system/tkg-mgmt "msg"="skipping load balancer reconciliation"  "reason"="VSphereCluster.Spec.LoadBalancerRef is nil"
I1011 16:14:24.321926       1 vspherecluster_controller.go:644] capv-controller-manager/vspherecluster-controller/tkg-system/tkg-mgmt "msg"="skipping reconcile when API server is online"  "reason"="alreadyPolling"
I1011 16:14:24.321982       1 vspherecluster_controller.go:334] capv-controller-manager/vspherecluster-controller/tkg-system/tkg-mgmt "msg"="control plane endpoint is not reconciled"
I1011 16:14:24.323216       1 controller.go:281] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="vspherecluster" "name"="tkg-mgmt" "namespace"="tkg-system"
I1011 16:14:27.515259       1 vspherecluster_controller.go:285] capv-controller-manager/vspherecluster-controller/tkg-system/tkg-mgmt "msg"="Reconciling VSphereCluster"
I1011 16:14:27.545140       1 vspherecluster_controller.go:445] capv-controller-manager/vspherecluster-controller/tkg-system/tkg-mgmt "msg"="skipping load balancer reconciliation"  "reason"="VSphereCluster.Spec.LoadBalancerRef is nil"
I1011 16:14:27.548850       1 vspherecluster_controller.go:644] capv-controller-manager/vspherecluster-controller/tkg-system/tkg-mgmt "msg"="skipping reconcile when API server is online"  "reason"="alreadyPolling"
I1011 16:14:27.548992       1 vspherecluster_controller.go:334] capv-controller-manager/vspherecluster-controller/tkg-system/tkg-mgmt "msg"="control plane endpoint is not reconciled"
I1011 16:14:27.550613       1 controller.go:281] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="vspherecluster" "name"="tkg-mgmt" "namespace"="tkg-system"
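The capv-controller-manager pod name also has a random suffix. kubectl logs accepts a deployment/NAME reference and picks one of its pods, so a sketch like the following avoids looking up the pod first. The deployment name capv-controller-manager is inferred from the pod name above and is an assumption; the function name is hypothetical.

```shell
# Stream the CAPV manager container logs, letting kubectl resolve the pod
# from the deployment. Assumes the deployment is named capv-controller-manager.
follow_capv_logs() {
  local kubeconfig="$1"   # e.g. the bootstrap kubeconfig under .kube-tkg/tmp
  kubectl --kubeconfig="$kubeconfig" -n capv-system \
    logs deployment/capv-controller-manager -c manager -f
}
```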

Back in the UI we can see that the process has moved on to actually creating the management cluster.

In the vSphere UI, you should see the Ubuntu template getting cloned:

And a VM getting created whose name starts with the management cluster name (tkg-mgmt in this example):

In the NSX ALB UI, there should be a virtual service configured that will be the load balanced endpoint for the control plane.

If you didn’t already have service engines deployed, the way this virtual service looks at this point is normal, since no service engines are running yet. Also, the control plane VM is not powered on yet, so it has no IP address and the virtual service is not fully configured.

You can navigate to the Infrastructure page to see that two service engines are being provisioned.

You’ll also see these being provisioned in the vSphere Client at this point.

After a short time, the first control plane VM should be powered on and have an IP address ( in this example).

The virtual service in NSX ALB is a little bit further along as well. Both service engines are powered on, an IP has been assigned and we can see that the service will point to the first control plane node on port 6443:

Just a few minutes later, after kubeadm has run in the control plane node and the requisite Kubernetes processes are available, the service is in a running state:

The installer UI shows that the process has moved on to the next step.

We can see in the vSphere UI that a worker node has been configured and is powered on (with an IP address of in this example).

You should now have a valid kubeconfig file at ~/.kube/config and can start to inspect the cluster.

kubectl get nodes

NAME                             STATUS   ROLES                  AGE   VERSION
tkg-mgmt-control-plane-8qjmd     Ready    control-plane,master   10m   v1.21.2+vmware.1
tkg-mgmt-md-0-559c48d65d-d7wrr   Ready    <none>                 40s   v1.21.2+vmware.1

kubectl get po -A

NAMESPACE      NAME                                                     READY   STATUS              RESTARTS   AGE
cert-manager   cert-manager-77f6fb8fd5-g744d                            0/1     Pending             0          12m
cert-manager   cert-manager-cainjector-6bd4cff7bb-96j9r                 0/1     Pending             0          12m
cert-manager   cert-manager-webhook-fbfcb9d6c-qrczs                     0/1     Pending             0          12m
kube-system    antrea-agent-9km2d                                       2/2     Running             0          4m32s
kube-system    antrea-agent-hvwb7                                       2/2     Running             6          14m
kube-system    antrea-controller-86f8988c5f-6wxxl                       0/1     Running             0          16m
kube-system    coredns-8dcb5c56b-mhpn4                                  0/1     Pending             0          14m
kube-system    coredns-8dcb5c56b-qmrng                                  0/1     Pending             0          14m
kube-system    etcd-tkg-mgmt-control-plane-8qjmd                        1/1     Running             0          14m
kube-system    kube-apiserver-tkg-mgmt-control-plane-8qjmd              1/1     Running             0          14m
kube-system    kube-controller-manager-tkg-mgmt-control-plane-8qjmd     1/1     Running             0          14m
kube-system    kube-proxy-rhbxq                                         1/1     Running             0          14m
kube-system    kube-proxy-t2j6x                                         1/1     Running             0          4m32s
kube-system    kube-scheduler-tkg-mgmt-control-plane-8qjmd              1/1     Running             0          14m
kube-system    vsphere-cloud-controller-manager-sps2q                   1/1     Running             7          14m
tkg-system     kapp-controller-6499b8866-m5g62                          1/1     Running             6          16m
tkg-system     tanzu-addons-controller-manager-7b4c4b6957-l5mrr         0/1     ContainerCreating   0          73s
tkg-system     tanzu-capabilities-controller-manager-6ff97656b8-r2wzn   0/1     ContainerCreating   0          17m
tkr-system     tkr-controller-manager-6bc455b5d4-hrlnq                  0/1     Pending             0          17m

The system is still coming up but after just a few more minutes, the installer UI will show that the process is completed.
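Rather than re-running kubectl get until everything settles, kubectl wait can block until the nodes are Ready and the deployments in tkg-system report Available. This is a sketch; the function name, the 15-minute default timeout and the choice of the tkg-system namespace are assumptions for a slow nested lab.

```shell
# Sketch: block until every node is Ready and every deployment in tkg-system
# reports Available, instead of polling `kubectl get` by hand.
wait_for_mgmt_cluster() {
  local timeout="${1:-15m}"   # generous default for a slow nested lab (assumption)
  kubectl wait --for=condition=Ready nodes --all --timeout="$timeout" || return 1
  kubectl -n tkg-system wait --for=condition=Available deployment --all \
    --timeout="$timeout"
}
```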

From the command line, we can start to inspect our new management cluster:

tanzu management-cluster get

  tkg-mgmt  tkg-system  running  1/1           1/1      v1.21.2+vmware.1  management


NAME                                                         READY  SEVERITY  REASON                           SINCE  MESSAGE 
/tkg-mgmt                                                    True                                              9m23s          
├─ClusterInfrastructure - VSphereCluster/tkg-mgmt            True                                              9m34s
├─ControlPlane - KubeadmControlPlane/tkg-mgmt-control-plane  True                                              9m23s
│ └─Machine/tkg-mgmt-control-plane-8qjmd                     True                                              9m26s
    └─Machine/tkg-mgmt-md-0-559c48d65d-d7wrr                 True                                              9m32s


  NAMESPACE                          NAME                    TYPE                    PROVIDERNAME  VERSION  WATCHNAMESPACE
  capi-kubeadm-bootstrap-system      bootstrap-kubeadm       BootstrapProvider       kubeadm       v0.3.23
  capi-kubeadm-control-plane-system  control-plane-kubeadm   ControlPlaneProvider    kubeadm       v0.3.23
  capi-system                        cluster-api             CoreProvider            cluster-api   v0.3.23
  capv-system                        infrastructure-vsphere  InfrastructureProvider  vsphere       v0.7.10

In my previous post, Upgrading from TKG 1.3 to 1.4 (including extensions) on vSphere, I discussed the new package framework, and you can see here as well that much of the core functionality has moved into packages. This will make lifecycle management of these pieces much easier going forward.

tanzu package installed list -A

- Retrieving installed packages...
  NAME                               PACKAGE-NAME                                        PACKAGE-VERSION  STATUS               NAMESPACE
  ako-operator                                                     Reconcile succeeded  tkg-system
  antrea                                                                 Reconcile succeeded  tkg-system
  load-balancer-and-ingress-service                   Reconcile succeeded  tkg-system
  metrics-server                                                 Reconcile succeeded  tkg-system
  pinniped                                                             Reconcile succeeded  tkg-system
  tanzu-addons-manager                                           Reconcile succeeded  tkg-system
  vsphere-cpi                                                       Reconcile succeeded  tkg-system
  vsphere-csi                                                       Reconcile succeeded  tkg-system

kubectl get packageinstalls -A

NAMESPACE    NAME                                PACKAGE NAME                                         PACKAGE VERSION         DESCRIPTION           AGE
tkg-system   ako-operator                                      1.4.0+vmware.1-tkg.1    Reconcile succeeded   24h
tkg-system   antrea                                                  0.13.3+vmware.1-tkg.1   Reconcile succeeded   24h
tkg-system   load-balancer-and-ingress-service   1.4.3+vmware.1-tkg.1    Reconcile succeeded   24h
tkg-system   metrics-server                                  0.4.0+vmware.1-tkg.1    Reconcile succeeded   24h
tkg-system   pinniped                                              0.4.4+vmware.1-tkg.1    Reconcile succeeded   24h
tkg-system   tanzu-addons-manager                            1.4.0+vmware.1-tkg.1    Reconcile succeeded   24h
tkg-system   vsphere-cpi                                        1.21.0+vmware.1-tkg.1   Reconcile succeeded   24h
tkg-system   vsphere-csi                                        2.3.0+vmware.1-tkg.2    Reconcile succeeded   24h

Get a kubeconfig file for an LDAP user

You can see that we have a new kubectl context created and set as current.

kubectl config get-contexts

CURRENT   NAME                      CLUSTER    AUTHINFO         NAMESPACE
*         tkg-mgmt-admin@tkg-mgmt   tkg-mgmt   tkg-mgmt-admin

That context is what is known as an “admin” context and does not use LDAP for authentication. To provide a kubeconfig file to an LDAP user, we run the following:

tanzu management-cluster kubeconfig get --export-file /tmp/ldaps-tkg-mgmt-kubeconfig

You can now access the cluster by specifying '--kubeconfig /tmp/ldaps-tkg-mgmt-kubeconfig' flag when using `kubectl` command

From here, we can deliver that kubeconfig file (/tmp/ldaps-tkg-mgmt-kubeconfig) to any user and they can use it to work with the cluster.

kubectl --kubeconfig=/tmp/ldaps-tkg-mgmt-kubeconfig get nodes

Error: could not complete Pinniped login: could not perform OIDC discovery for "": Get "": dial tcp connect: connection refused
Error: pinniped-auth login failed: exit status 1
Error: exit status 1
Unable to connect to the server: getting credentials: exec: executable tanzu failed with exit code 1

Uh oh. That doesn’t look good. It turns out that this is a known issue when you’re using NSX ALB to provide your control plane endpoint addresses, as noted in Add a Load Balancer for an Identity Provider on vSphere. The problem is that pinniped is trying to use the cluster’s control plane endpoint address ( in this example) but the load balancer service is only listening on ports 6443 and 443 (not 31234 as noted in the error). We’ll be creating two new load balancer services to fix this.

The first step is to create an overlay file, pinniped-supervisor-svc-overlay.yaml, that will modify the pinniped-supervisor and dex services to be of type LoadBalancer:

#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind": "Service", "metadata": {"name": "pinniped-supervisor", "namespace": "pinniped-supervisor"}})
---
#@overlay/replace
spec:
  type: LoadBalancer
  selector:
    app: pinniped-supervisor
  ports:
    - name: https
      protocol: TCP
      port: 443
      targetPort: 8443

#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind": "Service", "metadata": {"name": "dexsvc", "namespace": "tanzu-system-auth"}}), missing_ok=True
---
#@overlay/replace
spec:
  type: LoadBalancer
  selector:
    app: dex
  ports:
    - name: dex
      protocol: TCP
      port: 443
      targetPort: https

You can see that the current services are of type NodePort:

kubectl get svc -A |egrep "pinniped-supervisor|dex"

pinniped-supervisor   NodePort   <none>        443:31234/TCP   23m
dex                   NodePort   <none>        443:30133/TCP   23m

Next, the contents of the overlay file need to be base64 encoded for inclusion in a secret:

cat pinniped-supervisor-svc-overlay.yaml | base64 -w 0


Issue a command similar to the following to update the tkg-mgmt-pinniped-addon secret with the encoded overlay:

kubectl -n tkg-system patch secret tkg-mgmt-pinniped-addon -p '{"data": {"overlays.yaml": "I0AgbG9hZCgiQHl0dDpvdmVybGF5IiwgIm92ZXJsYXkiKQojQG92ZXJsYXkvbWF0Y2ggYnk9b3ZlcmxheS5zdWJzZXQoeyJraW5kIjogIlNlcnZpY2UiLCAibWV0YWRhdGEiOiB7Im5hbWUiOiAicGlubmlwZWQtc3VwZXJ2aXNvciIsICJuYW1lc3BhY2UiOiAicGlubmlwZWQtc3VwZXJ2aXNvciJ9fSkKLS0tCiNAb3ZlcmxheS9yZXBsYWNlCnNwZWM6CiAgdHlwZTogTG9hZEJhbGFuY2VyCiAgc2VsZWN0b3I6CiAgICBhcHA6IHBpbm5pcGVkLXN1cGVydmlzb3IKICBwb3J0czoKICAgIC0gbmFtZTogaHR0cHMKICAgICAgcHJvdG9jb2w6IFRDUAogICAgICBwb3J0OiA0NDMKICAgICAgdGFyZ2V0UG9ydDogODQ0MwoKI0AgbG9hZCgiQHl0dDpvdmVybGF5IiwgIm92ZXJsYXkiKQojQG92ZXJsYXkvbWF0Y2ggYnk9b3ZlcmxheS5zdWJzZXQoeyJraW5kIjogIlNlcnZpY2UiLCAibWV0YWRhdGEiOiB7Im5hbWUiOiAiZGV4c3ZjIiwgIm5hbWVzcGFjZSI6ICJ0YW56dS1zeXN0ZW0tYXV0aCJ9fSksIG1pc3Npbmdfb2s9VHJ1ZQotLS0KI0BvdmVybGF5L3JlcGxhY2UKc3BlYzoKICB0eXBlOiBMb2FkQmFsYW5jZXIKICBzZWxlY3RvcjoKICAgIGFwcDogZGV4CiAgcG9ydHM6CiAgICAtIG5hbWU6IGRleAogICAgICBwcm90b2NvbDogVENQCiAgICAgIHBvcnQ6IDQ0MwogICAgICB0YXJnZXRQb3J0OiBodHRwcwo="}}'

secret/tkg-mgmt-pinniped-addon patched
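If you’d rather not paste the base64 string by hand, the encoding and the patch can be combined. A sketch, assuming GNU base64 (for -w 0) and that the addon secret follows the <cluster-name>-pinniped-addon pattern seen here with tkg-mgmt-pinniped-addon; the function name is hypothetical.

```shell
# Sketch: base64-encode the overlay file and patch it into the pinniped
# addon secret in one step. Assumes the secret is <cluster>-pinniped-addon.
patch_pinniped_overlay() {
  local cluster_name="$1"
  local overlay_file="${2:-pinniped-supervisor-svc-overlay.yaml}"
  local encoded
  encoded=$(base64 -w 0 "$overlay_file") || return 1
  kubectl -n tkg-system patch secret "${cluster_name}-pinniped-addon" \
    -p "{\"data\": {\"overlays.yaml\": \"${encoded}\"}}"
}

# Usage:
#   patch_pinniped_overlay tkg-mgmt
```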

Since pinniped (with dex included) is now a package, the reconciliation process should kick off within five minutes, and you should see both services updated to type LoadBalancer:

kubectl get svc -A |egrep "pinniped-supervisor|dex"

pinniped-supervisor                 pinniped-supervisor                                             LoadBalancer     443:31728/TCP            24m
tanzu-system-auth                   dexsvc                                                          LoadBalancer     443:30133/TCP            24m
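Since reconciliation can take a few minutes, a small polling loop can wait until each service has been assigned a load balancer address. This sketch reads .status.loadBalancer.ingress[0].ip via jsonpath; the function name, retry count and delay are hypothetical choices.

```shell
# Poll a Service until it has a load balancer IP, then print that IP.
# Returns non-zero if no address appears within tries * delay seconds.
wait_for_lb() {
  local ns="$1" svc="$2" tries="${3:-60}" delay="${4:-10}"
  local i=0 ip
  while [ "$i" -lt "$tries" ]; do
    ip=$(kubectl -n "$ns" get svc "$svc" \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)
    if [ -n "$ip" ]; then
      printf '%s\n' "$ip"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage:
#   wait_for_lb pinniped-supervisor pinniped-supervisor
#   wait_for_lb tanzu-system-auth dexsvc
```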

In the NSX ALB UI, you will see two new virtual services created.

To complete the process, the pinniped-post-deploy-job job needs to be deleted so that it will automatically be recreated and re-run, finishing this process:

kubectl delete jobs pinniped-post-deploy-job -n pinniped-supervisor

job.batch "pinniped-post-deploy-job" deleted
kubectl get job pinniped-post-deploy-job -n pinniped-supervisor

Error from server (NotFound): jobs.batch "pinniped-post-deploy-job" not found

After a few minutes…

kubectl get job pinniped-post-deploy-job -n pinniped-supervisor

NAME                       COMPLETIONS   DURATION   AGE
pinniped-post-deploy-job   1/1           11s        39s
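The recreated job can also be waited on rather than checked by hand. Because kubectl wait fails immediately on an object that doesn’t exist yet, this sketch first polls until the job reappears and then waits for its complete condition. The function name, retry count and timeouts are hypothetical.

```shell
# Sketch: wait for pinniped-post-deploy-job to be recreated after deletion,
# then wait for it to complete. kubectl wait errors out on an object that
# does not exist yet, hence the polling loop first.
wait_for_post_deploy_job() {
  local ns="pinniped-supervisor" job="pinniped-post-deploy-job"
  local tries="${1:-60}" delay="${2:-10}" i=0
  until kubectl -n "$ns" get job "$job" >/dev/null 2>&1; do
    i=$((i + 1))
    if [ "$i" -ge "$tries" ]; then
      return 1
    fi
    sleep "$delay"
  done
  kubectl -n "$ns" wait --for=condition=complete "job/$job" --timeout=10m
}
```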

And you can now try running that same kubectl command using the LDAP user’s kubeconfig file:

kubectl --kubeconfig=/tmp/ldaps-tkg-mgmt-kubeconfig get nodes

Error: could not complete Pinniped login: could not perform OIDC discovery for "": Get "": dial tcp connect: connection refused
Error: pinniped-auth login failed: exit status 1
Error: exit status 1
Unable to connect to the server: getting credentials: exec: executable tanzu failed with exit code 1

Uh oh again! This one was easy to resolve, though: I needed to re-download the kubeconfig file after making the updates to pinniped:

tanzu management-cluster kubeconfig get --export-file /tmp/ldaps-tkg-mgmt-kubeconfig

You can now access the cluster by specifying '--kubeconfig /tmp/ldaps-tkg-mgmt-kubeconfig' flag when using `kubectl` command

Hopefully, only one more time…

kubectl --kubeconfig=/tmp/ldaps-tkg-mgmt-kubeconfig get nodes


A browser window should open with a certificate warning. Click Advanced and then the Proceed to (unsafe) link.

Click Advanced and then the Proceed to (unsafe) link again. If you were paying close attention to the last two pages, you’ll note that the first ( was the load balancer address for the pinniped-supervisor service and the second (192.168.220.3) was the load balancer address for the dex service. Dex is only called when you’re performing LDAP authentication; if this were OIDC, only one page would have been presented.

You can enter your LDAP credentials here (I’m using the tkgadmin account I mentioned when testing the LDAP configuration).

And if it succeeds, you’ll see a message similar to the following:

kubectl --kubeconfig=/tmp/ldaps-tkg-mgmt-kubeconfig get nodes

Error from server (Forbidden): nodes is forbidden: User "tkgadmin@corp.tanzu" cannot list resource "nodes" in API group "" at the cluster scope

We’re still having problems, but this one is familiar: I saw it in TKG 1.3 (and 1.2) and forgot about it (again). Since the tanzuadmins group has no privileges in the cluster, we need to create a ClusterRoleBinding, saved here as tanzuadmin-clusterrolebinding.yaml:

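apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tanzuadmins
subjects:
  - kind: Group
    name: tanzuadmins
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole # this must be Role or ClusterRole
  name: cluster-admin # this must match the name of the Role or ClusterRole you wish to bind to
  apiGroup: rbac.authorization.k8s.io

kubectl apply -f tanzuadmin-clusterrolebinding.yaml

clusterrolebinding.rbac.authorization.k8s.io/tanzuadmins created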
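The same binding can be created imperatively with kubectl create clusterrolebinding, which is handy for a quick lab fix. A sketch with the same binding name, group and role as the manifest above (the function name is hypothetical):

```shell
# Imperative equivalent of the manifest above: bind a group to cluster-admin.
# Binding name and group default to tanzuadmins, as in the YAML.
bind_group_to_cluster_admin() {
  local group="${1:-tanzuadmins}"
  kubectl create clusterrolebinding "$group" \
    --clusterrole=cluster-admin --group="$group"
}
```

Use one approach or the other: running this after the manifest has been applied fails because a binding with that name already exists.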

Now the user is able to log in and work with cluster-admin privileges (or lower privileges if you bind a different role).

kubectl --kubeconfig=/tmp/ldaps-tkg-mgmt-kubeconfig get nodes

NAME                             STATUS   ROLES                  AGE   VERSION
tkg-mgmt-control-plane-8qjmd     Ready    control-plane,master   64m   v1.21.2+vmware.1
tkg-mgmt-md-0-559c48d65d-d7wrr   Ready    <none>                 54m   v1.21.2+vmware.1

Be sure to check out my post, Using multiple availability zones for a workload cluster in TKG 1.4 on vSphere, which builds upon this recently deployed management cluster.
