The process of deploying a TKG 1.4 management cluster is largely unchanged from my previous post, Installing Tanzu Kubernetes Grid 1.3 on vSphere with NSX Advanced Load Balancer, but there are some differences worth calling out. I’ll walk through the entire process again in this post and pay special attention to new and changed portions.
On a somewhat bittersweet note, this may be my last post on VMware’s Tanzu line of products for some time…I’ll be helping VMware to provide world-class support for a different set of products going forward. I hope to continue this blog with posts about whatever new endeavors the future holds for me.
Download updated components
You can download updated tanzu CLI binaries, node OS OVA files, and signed kubectl, crashd and velero binaries from customerconnect.vmware.com.
Deploy a Kubernetes node OVA
Photon OS and Ubuntu are both supported and VMware still supplies OVAs for both, but Ubuntu is now the default node OS. When creating the management cluster this is not much of a concern since you have to explicitly call out which VM template you’re using. When deploying a workload cluster, however, there is no such requirement, and if you have both Photon OS and Ubuntu VM templates present in your vSphere inventory, you will end up with Kubernetes nodes running Ubuntu.
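If you want to be explicit about which node OS a workload cluster uses, you can pin it in the cluster configuration file with the same OS variables that show up later in this post’s management cluster config. A minimal sketch, assuming the Ubuntu 20.04 OVA used here:

OS_NAME: ubuntu
OS_ARCH: amd64
OS_VERSION: "20.04"

Setting OS_NAME to photon instead would select a Photon OS template, assuming one is present in your inventory.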
For this exercise, I switched over to using the VMware-supplied Ubuntu OVA. The process of deploying the OVA is very straightforward and doesn’t differ from my previous post.









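As a side note, the OVA deployment can also be scripted rather than clicked through. A rough sketch using govc, assuming govc is installed and pointed at your vCenter via GOVC_URL/GOVC_USERNAME/GOVC_PASSWORD, and with the OVA filename and VM name being placeholders for whatever you actually downloaded:

govc import.ova -ds=map-vol -folder=/RegionA01/vm -name=ubuntu-2004-kube-v1.21.2 ./ubuntu-2004-kube-ova.ova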
Once the deployment is finished, I always take a snapshot so that I can make use of the linkedClone functionality when standing up the node VMs (to help reduce storage consumption).


The last step is to convert the VM to a template:


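If you’re scripting these steps, the snapshot and the template conversion can also be handled with govc; a minimal sketch under the same assumptions as above (the VM name is the placeholder used in the earlier import example):

govc snapshot.create -vm ubuntu-2004-kube-v1.21.2 base
govc vm.markastemplate ubuntu-2004-kube-v1.21.2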
Install the 1.4 tanzu CLI binary and additional executables
Note: If you are on a system that has the 1.3 tanzu CLI binary installed, see the Install the 1.4 tanzu CLI binary and additional executables section of my earlier post, Upgrading from TKG 1.3 to 1.4 (including extensions) on vSphere, as there are a few extra steps needed when upgrading the tanzu CLI.
After downloading the updated tanzu CLI binary (tar format), you will need to extract and install it on your local system:
tar -xvf tanzu-cli-bundle-linux-amd64.tar
cli/
cli/core/
cli/core/v1.4.0/
cli/core/v1.4.0/tanzu-core-linux_amd64
cli/core/plugin.yaml
cli/login/
cli/login/v1.4.0/
cli/login/v1.4.0/tanzu-login-linux_amd64
cli/login/plugin.yaml
cli/cluster/
cli/cluster/v1.4.0/
cli/cluster/v1.4.0/tanzu-cluster-linux_amd64
cli/cluster/plugin.yaml
cli/package/
cli/package/v1.4.0/
cli/package/v1.4.0/tanzu-package-linux_amd64
cli/package/plugin.yaml
cli/management-cluster/
cli/management-cluster/v1.4.0/
cli/management-cluster/v1.4.0/tanzu-management-cluster-linux_amd64
cli/management-cluster/plugin.yaml
cli/pinniped-auth/
cli/pinniped-auth/v1.4.0/
cli/pinniped-auth/v1.4.0/tanzu-pinniped-auth-linux_amd64
cli/pinniped-auth/plugin.yaml
cli/kubernetes-release/
cli/kubernetes-release/v1.4.0/
cli/kubernetes-release/v1.4.0/tanzu-kubernetes-release-linux_amd64
cli/kubernetes-release/plugin.yaml
cli/manifest.yaml
cli/ytt-linux-amd64-v0.34.0+vmware.1.gz
cli/kapp-linux-amd64-v0.37.0+vmware.1.gz
cli/imgpkg-linux-amd64-v0.10.0+vmware.1.gz
cli/kbld-linux-amd64-v0.30.0+vmware.1.gz
cli/vendir-linux-amd64-v0.21.1+vmware.1.gz
cd cli/
ubuntu@cli-vm:~/cli$ ls
cluster kubernetes-release pinniped-auth
core login vendir-linux-amd64-v0.21.1+vmware.1.gz
imgpkg-linux-amd64-v0.10.0+vmware.1.gz management-cluster ytt-linux-amd64-v0.34.0+vmware.1.gz
kapp-linux-amd64-v0.37.0+vmware.1.gz manifest.yaml
kbld-linux-amd64-v0.30.0+vmware.1.gz package
sudo install core/v1.4.0/tanzu-core-linux_amd64 /usr/local/bin/tanzu
You can now check the tanzu CLI binary version:
tanzu version
version: v1.4.0
buildDate: 2021-08-30
sha: c9929b8f
You can follow a similar process to gunzip and install the other utilities located in the cli folder (imgpkg, kapp, kbld, vendir, ytt):
for file in $(ls *.gz)
do
gunzip $file
sudo install ${file::-3} /usr/local/bin/$(echo $file | awk -F - '{print $1}')
done
Lastly, you can follow a similar process to install or update the kubectl binary to the latest version:
gunzip kubectl-linux-v1.21.2+vmware.1.gz
sudo install kubectl-linux-v1.21.2+vmware.1 /usr/local/bin/kubectl
kubectl version --client
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2+vmware.1", GitCommit:"54e7e68e30dd3f9f7bb4f814c9d112f54f0fb273", GitTreeState:"clean", BuildDate:"2021-06-28T22:17:36Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
Install the tanzu CLI plugins
Note: This next command needs to be run from the cli directory noted previously.
tanzu plugin install --local . all
tanzu plugin list
NAME LATEST VERSION DESCRIPTION REPOSITORY VERSION STATUS
alpha v1.4.0 Alpha CLI commands core not installed
cluster v1.4.0 Kubernetes cluster operations core v1.4.0 installed
kubernetes-release v1.4.0 Kubernetes release operations core v1.4.0 installed
login v1.4.0 Login to the platform core v1.4.0 installed
management-cluster v1.4.0 Kubernetes management cluster operations core v1.4.0 installed
package v1.4.0 Tanzu package management core v1.4.0 installed
pinniped-auth v1.4.0 Pinniped authentication operations (usually not directly invoked) core v1.4.0 installed
Create a management cluster via the UI
The process for building out the management cluster is also fairly similar but you’ll see that there are some new configuration options present.
tanzu management-cluster create --ui
Validating the pre-requisites...
Serving kickstart UI at http://127.0.0.1:8080
A browser should be launched automatically and you can choose your platform for deploying TKG (vSphere in this example).

Enter the information necessary to connect to your vCenter Server and press the Connect button.

You’ll see a message similar to the following…press Continue if it looks good.

And another message, this time wanting you to confirm that you’re deploying a TKG management cluster and not trying to stand up vSphere with Tanzu.

Select an appropriate Datacenter and paste in a public key that will be used to authenticate SSH connections to the Kubernetes nodes.

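If you don’t already have a key pair to paste in here, a minimal example of generating one (the file path and comment are placeholders):

ssh-keygen -t rsa -b 4096 -C "admin@corp.local" -f ~/.ssh/tkg_rsa
cat ~/.ssh/tkg_rsa.pub

The contents of the .pub file is what gets pasted into the SSH Public Key field.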
You have a lot of choices here but I’m going with a simple and small Development cluster. This will result in a single control plane node and a single worker node. I’m also disabling Machine Health Checks (MHC) in my lab as the nested environment has very poor performance and I don’t want nodes to get recreated unnecessarily.

You’ll see that there is an option for Control Plane Endpoint Provider with Kube-vip and NSX Advanced Load Balancer (NSX ALB) choices. Prior to version 1.4, Kube-vip was the only option and there was no choice presented. You can now use NSX ALB for both workload load balancer services and Kubernetes control plane load balanced endpoint addresses. You can read a little more about this in my earlier post, Migrating a TKG cluster control-plane endpoint from kube-vip to NSX-ALB. I’m going to use NSX ALB in this lab but I also want to specify an IP address instead of just letting NSX ALB pull one out of its pool. To make this work, I’ll need to make a change in NSX ALB.
From the Infrastructure, Networks page in the NSX ALB UI, you’ll need to click the Edit icon for the network where you want your control plane endpoint IP address to live (K8s-Frontend in my case).

Click the Edit button next to the appropriate network (192.168.220.0/23 in this case).

Click the Add Static IP Address Pool button and enter the desired control plane endpoint IP address as a range (192.168.220.128-192.168.220.128 in this example).

Click the Save button.

Click the Save button again. You should see a summary of the networks and can see that there is now an additional Static IP Pool configured on the desired network (K8s-Frontend has two now).

Back in the TKG installer UI, I can configure the desired IP address and move on to the next page.

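For reference, these two choices end up as the following entries in the generated cluster configuration file (values from this example):

AVI_CONTROL_PLANE_HA_PROVIDER: "true"
VSPHERE_CONTROL_PLANE_ENDPOINT: 192.168.220.128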
On the VMware NSX Advanced Load Balancer page, you now have to supply the CA certificate used by NSX ALB prior to clicking the Verify Credentials button. This used to be supplied after verifying credentials.

Another change you’ll see is that you now get to choose different VIP networks for Management VIPs (control plane endpoints) and Workload VIPs (workload load balancer services). I’m using the same network for both. These are all dropdowns where you can select the items that have been configured in NSX ALB.

For comparison, the following is what this page looked like in TKG 1.3:

You can skip the Optional Metadata page unless you want to provide custom labels.

Choose an appropriate VM folder, Datastore and Cluster.

Choose an appropriate portgroup for your Kubernetes node VMs. You can leave the service and pod CIDRs as is or update them as needed. You can’t tell from this screenshot but my network name is K8s-Workload. If you are behind a proxy, toggle the Enable Proxy Settings switch and enter any appropriate proxy information.

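If you do enable the proxy settings, the resulting entries in the generated configuration file look roughly like the following; the proxy address is a placeholder and TKG_NO_PROXY should include your local networks and domains:

TKG_HTTP_PROXY_ENABLED: "true"
TKG_HTTP_PROXY: http://proxy.corp.tanzu:3128
TKG_HTTPS_PROXY: http://proxy.corp.tanzu:3128
TKG_NO_PROXY: 192.168.0.0/16,.corp.tanzu,localhost,127.0.0.1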
Configuring Identity Management is the same as in TKG 1.3. You can use what I’ve configured here as a primer for Active Directory integration, but you will need to customize each parameter for your deployment.

There is one welcome addition: the ability to test your LDAP configuration via the Verify LDAP Configuration button.

I’ve entered a user and group that were created in Active Directory and will be used later to access the cluster.

That was not exactly what I was hoping to see here. It looks like there is an issue with the user search being hardcoded to cn and the group search being hardcoded to ou in this Verify LDAP Configuration utility. Since my users can be found via cn, the user search succeeded, but my groups cannot be found by ou, hence the failure here. Regardless of success or failure in this utility, it will not prevent you from proceeding with the installation, which I did.
If you properly uploaded an OVA and converted it to a template, you should be able to select it (or choose from several, if you uploaded more than one).

The ability to register your management cluster to Tanzu Mission Control (TMC) during installation was added in 1.3 but there is an issue with TMC not working well with TKG 1.4 right now. Do not attempt to register your management cluster with TMC until the TMC Release Notes mention support for TKG 1.4 management clusters.

Barring any internal policies prohibiting it, you should always participate in the Customer Experience Improvement Program.

You can click the Review Configuration button and take some time to look over how you’ll be deploying your management cluster to ensure it is what you want.









If everything looks good, click the Deploy Management Cluster button.

You might have noticed in the previous screenshot that there was a file referenced, /home/ubuntu/.tanzu/tkg/clusterconfigs/6ezabehzyd.yaml, that we didn’t create. The installer actually took everything we entered in the UI and saved it. The really nice thing about this is that you can quickly create other management clusters (or recreate this one if you decide to destroy it) from this same file:
AVI_CA_DATA_B64: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUZhekNDQTFPZ0F3SUJBZ0lRTWZaeTA4bXV2SVZLZFpWRHo3L3JZekFOQmdrcWhraUc5dzBCQVFzRkFEQkkKTVJVd0V3WUtDWkltaVpQeUxHUUJHUllGZEdGdWVuVXhGREFTQmdvSmtpYUprL0lzWkFFWkZnUmpiM0p3TVJrdwpGd1lEVlFRREV4QkRUMDVVVWs5TVEwVk9WRVZTTFVOQk1CNFhEVEl3TURneE9URTNNakEwTkZvWERUTXdNRGd4Ck9URTNNekF6TlZvd1NERVZNQk1HQ2dtU0pvbVQ4aXhrQVJrV0JYUmhibnAxTVJRd0VnWUtDWkltaVpQeUxHUUIKR1JZRVkyOXljREVaTUJjR0ExVUVBeE1RUTA5T1ZGSlBURU5GVGxSRlVpMURRVENDQWlJd0RRWUpLb1pJaHZjTgpBUUVCQlFBRGdnSVBBRENDQWdvQ2dnSUJBTEtJZFg3NjQzUHp2dFZYbHFOSXdEdU5xK3JoY0hGMGZqUjQxNGorCjFJR1FVdVhyeWtqaFNEdGhQUCs4QkdON21CZ0hUOEFqQVMxYjk1eGM4QjBTMkZobG4zQW9SRTl6MDNHdGZzQnUKRlNCUlVWd0FpZlg2b1h1OTdXemZmaHFQdHhaZkxKWGJoT29tamxrWDZpZmZBczJUT0xVeDJPajR3MnZ5Ymh6agpsY0E3MGFpKzBTbDZheFNvM2xNWjRLa3VaMldnZkVjYURqamozMy9wVjMvYm5GSys3eWRQdHRjMlRlazV4c0k4ClhOTWlySVZ4VWlVVDRZTHk0V0xpUzIwMEpVZmJwMVpuTXZuYlE4SnYxUW5abDlXN1dtQlBjZ3hSNEFBdWIwSzQKdlpMWHU2TVhpYm9UbHprTUIvWXRoQ2tUTmxKY0traEhmNjBZUi9UNlN4MVQybnVweUJhNGRlbzVVR1B6aFJpSgpwTjM3dXFxQWRLMXFNRHBDakFSalM2VTdMZjlKS2pmaXJpTHpMZXlBalA4a2FONFRkSFNaZDBwY1FvWlN4ZXhRCjluKzRFNE1RbTRFSjREclZaQ2lsc3lMMkJkRVRjSFhLUGM3cStEYjRYTTdqUEtORzVHUDFFTVY0WG9odjU4eVoKL3JSZm1LNjRnYXI4QU1uT0tUMkFQNjgxcWRaczdsbGpPTmNYVUFMemxYNVRxSWNoWVQwRFZRbUZMWW9NQmVaegowbDIxUWpiSzBZV25QemE2WWkvTjRtNnJGYkVCNFdYaXFoWVNreHpyTXZvY1ZVZ2Q0QUFQMXZmSE5uRkVzblVSCm5Tc2lnbEZIL3hseU8zY0JGcm1vWkF4YkEyMDkxWEhXaEI0YzBtUUVJM2hPcUFCOFVvRkdCclFwbVErTGVzb0MKMUxaOUFnTUJBQUdqVVRCUE1Bc0dBMVVkRHdRRUF3SUJoakFQQmdOVkhSTUJBZjhFQlRBREFRSC9NQjBHQTFVZApEZ1FXQkJURkF4U3ZZNjRRNWFkaG04SVllY0hCQVV1b2J6QVFCZ2tyQmdFRUFZSTNGUUVFQXdJQkFEQU5CZ2txCmhraUc5dzBCQVFzRkFBT0NBZ0VBamcvdjRtSVA3Z0JWQ3c0cGVtdEduM1BTdERoL2FCOXZiV3lqQXl4U05hYUgKSDBuSUQ1cTV3b3c5dWVCaURmalRQbmhiZjNQNzY4SEc4b0wvKzlDK1ZtLzBsaUZCZCswL0RhYXlLcEFORk1MQgpCVitzMmFkV1JoUXVjTFFmWFB3dW04UnliV3Y4MndrUmtXQ0NkT0JhQXZBTXVUZ2swOFN3Skl5UWZWZ3BrM25ZCjBPd2pGd1NBYWR2ZXZmK0xvRC85TDhSOU5FdC9uNFdKZStMdEVhbW85RVZiK2wrY1lxeXh5dWJBVlkwWTZCTTIKR1hxQWgzRkVXMmFRTXB3b3VoLzVTN3c1b1NNWU42bWlZMW9qa2k4Z1BtMCs0K0NJTFBXaC9mcjJxME8vYlB0YgpUcisrblBNbVo4b3Y5ZXBOR0l1cWh0azVqYTIvSnVZK1JXNDZJUmM4UXBGMUV5VWFlMDJFNlUyVmFjczdHZ2UyCkNlU0lOa29MRkZtaUtCZkluL0hBY2hsbWU5YUw2RGxKOXdBcmVCREgzRThrSDdnUkRXYlNLMi9RRDBIcWFjK0UKZ2VHSHdwZy84T3RCT0hVTW5NN2VMT1hCSkZjSm9zV2YwWG5FZ1M0dWJnYUhncURFdThwOFBFN3JwQ3h0VU51cgp0K3gyeE9OSS9yQldnZGJwNTFsUHI3bzgxOXpQSkN2WVpxMVBwMXN0OGZiM1JsVVNXdmJRTVBGdEdBeWFCeStHCjBSZ1o5V1B0eUVZZ25IQWI1L0RxNDZzbmU5L1FuUHd3R3BqdjFzMW9FM1pGUWpodm5HaXM4K2RxUnhrM1laQWsKeWlEZ2hXN2FudHpZTDlTMUNDOHNWZ1ZPd0ZKd2ZGWHBkaWlyMzVtUWx5U0czMDFWNEZzUlYrWjBjRnA0TmkwPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0t
AVI_CLOUD_NAME: Default-Cloud
AVI_CONTROL_PLANE_HA_PROVIDER: "true"
AVI_CONTROLLER: nsxalb-cluster.corp.tanzu
AVI_DATA_NETWORK: K8s-Frontend
AVI_DATA_NETWORK_CIDR: 192.168.220.0/23
AVI_ENABLE: "true"
AVI_LABELS: ""
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_CIDR: 192.168.220.0/23
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_NAME: K8s-Frontend
AVI_PASSWORD:
AVI_SERVICE_ENGINE_GROUP: Default-Group
AVI_USERNAME: admin
CLUSTER_CIDR: 100.96.0.0/11
CLUSTER_NAME: tkg-mgmt
CLUSTER_PLAN: dev
ENABLE_AUDIT_LOGGING: "false"
ENABLE_CEIP_PARTICIPATION: "true"
ENABLE_MHC: "false"
IDENTITY_MANAGEMENT_TYPE: ldap
INFRASTRUCTURE_PROVIDER: vsphere
LDAP_BIND_DN: cn=Administrator,cn=Users,dc=corp,dc=tanzu
LDAP_BIND_PASSWORD:
LDAP_GROUP_SEARCH_BASE_DN: cn=Users,dc=corp,dc=tanzu
LDAP_GROUP_SEARCH_FILTER: (objectClass=group)
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE: member
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST: controlcenter.corp.tanzu:636
LDAP_ROOT_CA_DATA_B64: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUZhekNDQTFPZ0F3SUJBZ0lRTWZaeTA4bXV2SVZLZFpWRHo3L3JZekFOQmdrcWhraUc5dzBCQVFzRkFEQkkKTVJVd0V3WUtDWkltaVpQeUxHUUJHUllGZEdGdWVuVXhGREFTQmdvSmtpYUprL0lzWkFFWkZnUmpiM0p3TVJrdwpGd1lEVlFRREV4QkRUMDVVVWs5TVEwVk9WRVZTTFVOQk1CNFhEVEl3TURneE9URTNNakEwTkZvWERUTXdNRGd4Ck9URTNNekF6TlZvd1NERVZNQk1HQ2dtU0pvbVQ4aXhrQVJrV0JYUmhibnAxTVJRd0VnWUtDWkltaVpQeUxHUUIKR1JZRVkyOXljREVaTUJjR0ExVUVBeE1RUTA5T1ZGSlBURU5GVGxSRlVpMURRVENDQWlJd0RRWUpLb1pJaHZjTgpBUUVCQlFBRGdnSVBBRENDQWdvQ2dnSUJBTEtJZFg3NjQzUHp2dFZYbHFOSXdEdU5xK3JoY0hGMGZqUjQxNGorCjFJR1FVdVhyeWtqaFNEdGhQUCs4QkdON21CZ0hUOEFqQVMxYjk1eGM4QjBTMkZobG4zQW9SRTl6MDNHdGZzQnUKRlNCUlVWd0FpZlg2b1h1OTdXemZmaHFQdHhaZkxKWGJoT29tamxrWDZpZmZBczJUT0xVeDJPajR3MnZ5Ymh6agpsY0E3MGFpKzBTbDZheFNvM2xNWjRLa3VaMldnZkVjYURqamozMy9wVjMvYm5GSys3eWRQdHRjMlRlazV4c0k4ClhOTWlySVZ4VWlVVDRZTHk0V0xpUzIwMEpVZmJwMVpuTXZuYlE4SnYxUW5abDlXN1dtQlBjZ3hSNEFBdWIwSzQKdlpMWHU2TVhpYm9UbHprTUIvWXRoQ2tUTmxKY0traEhmNjBZUi9UNlN4MVQybnVweUJhNGRlbzVVR1B6aFJpSgpwTjM3dXFxQWRLMXFNRHBDakFSalM2VTdMZjlKS2pmaXJpTHpMZXlBalA4a2FONFRkSFNaZDBwY1FvWlN4ZXhRCjluKzRFNE1RbTRFSjREclZaQ2lsc3lMMkJkRVRjSFhLUGM3cStEYjRYTTdqUEtORzVHUDFFTVY0WG9odjU4eVoKL3JSZm1LNjRnYXI4QU1uT0tUMkFQNjgxcWRaczdsbGpPTmNYVUFMemxYNVRxSWNoWVQwRFZRbUZMWW9NQmVaegowbDIxUWpiSzBZV25QemE2WWkvTjRtNnJGYkVCNFdYaXFoWVNreHpyTXZvY1ZVZ2Q0QUFQMXZmSE5uRkVzblVSCm5Tc2lnbEZIL3hseU8zY0JGcm1vWkF4YkEyMDkxWEhXaEI0YzBtUUVJM2hPcUFCOFVvRkdCclFwbVErTGVzb0MKMUxaOUFnTUJBQUdqVVRCUE1Bc0dBMVVkRHdRRUF3SUJoakFQQmdOVkhSTUJBZjhFQlRBREFRSC9NQjBHQTFVZApEZ1FXQkJURkF4U3ZZNjRRNWFkaG04SVllY0hCQVV1b2J6QVFCZ2tyQmdFRUFZSTNGUUVFQXdJQkFEQU5CZ2txCmhraUc5dzBCQVFzRkFBT0NBZ0VBamcvdjRtSVA3Z0JWQ3c0cGVtdEduM1BTdERoL2FCOXZiV3lqQXl4U05hYUgKSDBuSUQ1cTV3b3c5dWVCaURmalRQbmhiZjNQNzY4SEc4b0wvKzlDK1ZtLzBsaUZCZCswL0RhYXlLcEFORk1MQgpCVitzMmFkV1JoUXVjTFFmWFB3dW04UnliV3Y4MndrUmtXQ0NkT0JhQXZBTXVUZ2swOFN3Skl5UWZWZ3BrM25ZCjBPd2pGd1NBYWR2ZXZmK0xvRC85TDhSOU5FdC9uNFdKZStMdEVhbW85RVZiK2wrY1lxeXh5dWJBVlkwWTZCTTIKR1hxQWgzRkVXMmFRTXB3b3VoLzVTN3c1b1NNWU42bWlZMW9qa2k4Z1BtMCs0K0NJTFBXaC9mcjJxME8vYlB0YgpUcisrblBNbVo4b3Y5ZXBOR0l1cWh0azVqYTIvSnVZK1JXNDZJUmM4UXBGMUV5VWFlMDJFNlUyVmFjczdHZ2UyCkNlU0lOa29MRkZtaUtCZkluL0hBY2hsbWU5YUw2RGxKOXdBcmVCREgzRThrSDdnUkRXYlNLMi9RRDBIcWFjK0UKZ2VHSHdwZy84T3RCT0hVTW5NN2VMT1hCSkZjSm9zV2YwWG5FZ1M0dWJnYUhncURFdThwOFBFN3JwQ3h0VU51cgp0K3gyeE9OSS9yQldnZGJwNTFsUHI3bzgxOXpQSkN2WVpxMVBwMXN0OGZiM1JsVVNXdmJRTVBGdEdBeWFCeStHCjBSZ1o5V1B0eUVZZ25IQWI1L0RxNDZzbmU5L1FuUHd3R3BqdjFzMW9FM1pGUWpodm5HaXM4K2RxUnhrM1laQWsKeWlEZ2hXN2FudHpZTDlTMUNDOHNWZ1ZPd0ZKd2ZGWHBkaWlyMzVtUWx5U0czMDFWNEZzUlYrWjBjRnA0TmkwPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0t
LDAP_USER_SEARCH_BASE_DN: cn=Users,dc=corp,dc=tanzu
LDAP_USER_SEARCH_FILTER: (objectClass=Person)
LDAP_USER_SEARCH_NAME_ATTRIBUTE: userPrincipalName
LDAP_USER_SEARCH_USERNAME: userPrincipalName
OIDC_IDENTITY_PROVIDER_CLIENT_ID: ""
OIDC_IDENTITY_PROVIDER_CLIENT_SECRET: ""
OIDC_IDENTITY_PROVIDER_GROUPS_CLAIM: ""
OIDC_IDENTITY_PROVIDER_ISSUER_URL: ""
OIDC_IDENTITY_PROVIDER_NAME: ""
OIDC_IDENTITY_PROVIDER_SCOPES: ""
OIDC_IDENTITY_PROVIDER_USERNAME_CLAIM: ""
OS_ARCH: amd64
OS_NAME: ubuntu
OS_VERSION: "20.04"
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: "false"
VSPHERE_CONTROL_PLANE_DISK_GIB: "20"
VSPHERE_CONTROL_PLANE_ENDPOINT: 192.168.220.128
VSPHERE_CONTROL_PLANE_MEM_MIB: "4096"
VSPHERE_CONTROL_PLANE_NUM_CPUS: "2"
VSPHERE_DATACENTER: /RegionA01
VSPHERE_DATASTORE: /RegionA01/datastore/map-vol
VSPHERE_FOLDER: /RegionA01/vm
VSPHERE_NETWORK: /RegionA01/network/K8s-Workload
VSPHERE_PASSWORD:
VSPHERE_RESOURCE_POOL: /RegionA01/host/RegionA01-MGMT/Resources
VSPHERE_SERVER: vcsa-01a.corp.tanzu
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC5KYNeWQgVHrDHaEhBCLF1vIR0OAtUIJwjKYkY4E/5HhEu8fPFvBOIHPFTPrtkX4vzSiMFKE5WheKGQIpW3HHlRbmRPc9oe6nNKlsUfFAaJ7OKF146Gjpb7lWs/C34mjdtxSb1D/YcHSyqK5mxhyHAXPeK8lrxG5MLOJ3X2A3iUvXcBo1NdhRdLRWQmyjs16fnPx6840x9n5NqeiukFYIVhDMFErq42AkeewsWcbZQuwViSLk2cIc09eykAjaXMojCmSbjrj0kC3sbYX+HD2OWbKohTqqO6/UABtjYgTjIS4PqsXWk63dFdcxF6ukuO6ZHaiY7h3xX2rTg9pv1oT8WBR44TYgvyRp0Bhe0u2/n/PUTRfp22cOWTA2wG955g7jOd7RVGhtMHi9gFXeUS2KodO6C4XEXC7Y2qp9p9ARlNvu11QoaDyH3l0h57Me9we+3XQNuteV69TYrJnlgWecMa/x+rcaEkgr7LD61dY9sTuufttLBP2ro4EIWoBY6F1Ozvcp8lcgi/55uUGxwiKDA6gQ+UA/xtrKk60s6MvYMzOxJiUQbWYr3MJ3NSz6PJVXMvlsAac6U+vX4U9eJP6/C1YDyBaiT96cb/B9TkvpLrhPwqMZdYVomVHsdY7YriJB93MRinKaDJor1aIE/HMsMpbgFCNA7mma9x5HS/57Imw== admin@corp.local
VSPHERE_TLS_THUMBPRINT: 01:8D:8B:7F:13:3A:B9:C6:90:D2:5F:17:AD:EB:AC:78:26:3C:45:FB
VSPHERE_USERNAME: administrator@vsphere.local
VSPHERE_WORKER_DISK_GIB: "20"
VSPHERE_WORKER_MEM_MIB: "4096"
VSPHERE_WORKER_NUM_CPUS: "2"
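If you ever want to recreate this management cluster, or stand up another one with the same settings, you can feed that saved file straight back to the CLI instead of walking through the UI again:

tanzu management-cluster create --file /home/ubuntu/.tanzu/tkg/clusterconfigs/6ezabehzyd.yaml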
You’ll be able to follow the installation process at a high level in the UI. A welcome addition to the installer UI in 1.4 is that each step is clearly called out on the left side of the page:

What’s happening at this point is that the system from which you launched the installer has downloaded a kind image, which should now be running as a container.
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
76f29df864d7 projects.registry.vmware.com/tkg/kind/node:v1.21.2_vmware.1 "/usr/local/bin/entr…" 20 seconds ago Up 7 seconds 127.0.0.1:37779->6443/tcp tkg-kind-c5i647s09c6oq8590i10-control-plane
We can get access to this cluster via a temporary kubeconfig file that is created under .kube-tkg/tmp.
kubectl --kubeconfig=.kube-tkg/tmp/config_XXXYlv2Y get nodes
NAME STATUS ROLES AGE VERSION
tkg-kind-c5i647s09c6oq8590i10-control-plane Ready control-plane,master 39s v1.21.2+vmware.1-360497810732255795
You can see that it’s still coming up at this point.
kubectl --kubeconfig=.kube-tkg/tmp/config_XXXYlv2Y get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
capi-system capi-controller-manager-778bd4dfb9-rtf2g 0/2 ContainerCreating 0 0s
capi-webhook-system capi-controller-manager-9995bdc94-tbtp9 0/2 ContainerCreating 0 3s
cert-manager cert-manager-77f6fb8fd5-t9g6x 1/1 Running 0 28s
cert-manager cert-manager-cainjector-6bd4cff7bb-62k2s 1/1 Running 0 28s
cert-manager cert-manager-webhook-fbfcb9d6c-bfpwz 1/1 Running 0 27s
kube-system coredns-8dcb5c56b-5xqht 1/1 Running 0 51s
kube-system coredns-8dcb5c56b-kzvdg 1/1 Running 0 51s
kube-system etcd-tkg-kind-c5i647s09c6oq8590i10-control-plane 1/1 Running 0 66s
kube-system kindnet-fk6lg 1/1 Running 0 51s
kube-system kube-apiserver-tkg-kind-c5i647s09c6oq8590i10-control-plane 1/1 Running 0 57s
kube-system kube-controller-manager-tkg-kind-c5i647s09c6oq8590i10-control-plane 1/1 Running 0 57s
kube-system kube-proxy-k6m7h 1/1 Running 0 51s
kube-system kube-scheduler-tkg-kind-c5i647s09c6oq8590i10-control-plane 1/1 Running 0 57s
local-path-storage local-path-provisioner-8b46957d4-s7v28 1/1 Running 0 51s
And once the pods in the bootstrap cluster are fully running, we can examine the logs in the capv-controller-manager pod to get a more detailed view of what’s happening. I like to stream these logs during installation to make sure nothing looks out of the ordinary.
kubectl --kubeconfig=.kube-tkg/tmp/config_XXXYlv2Y -n capv-system logs capv-controller-manager-587fbf697f-2fcg8 manager -f
I1011 16:14:20.351370 1 vspheremachine_controller.go:329] capv-controller-manager/vspheremachine-controller/tkg-system/tkg-mgmt-worker-x4sh5 "msg"="Waiting for the control plane to be initialized"
I1011 16:14:20.356408 1 controller.go:281] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="vspheremachine" "name"="tkg-mgmt-worker-x4sh5" "namespace"="tkg-system"
I1011 16:14:20.415855 1 vspheremachine_controller.go:329] capv-controller-manager/vspheremachine-controller/tkg-system/tkg-mgmt-worker-x4sh5 "msg"="Waiting for the control plane to be initialized"
I1011 16:14:20.416776 1 controller.go:281] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="vspheremachine" "name"="tkg-mgmt-worker-x4sh5" "namespace"="tkg-system"
I1011 16:14:20.665596 1 vspheremachine_controller.go:329] capv-controller-manager/vspheremachine-controller/tkg-system/tkg-mgmt-worker-x4sh5 "msg"="Waiting for the control plane to be initialized"
I1011 16:14:20.667597 1 controller.go:281] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="vspheremachine" "name"="tkg-mgmt-worker-x4sh5" "namespace"="tkg-system"
I1011 16:14:24.178944 1 vspherecluster_controller.go:285] capv-controller-manager/vspherecluster-controller/tkg-system/tkg-mgmt "msg"="Reconciling VSphereCluster"
I1011 16:14:24.321884 1 vspherecluster_controller.go:445] capv-controller-manager/vspherecluster-controller/tkg-system/tkg-mgmt "msg"="skipping load balancer reconciliation" "reason"="VSphereCluster.Spec.LoadBalancerRef is nil"
I1011 16:14:24.321926 1 vspherecluster_controller.go:644] capv-controller-manager/vspherecluster-controller/tkg-system/tkg-mgmt "msg"="skipping reconcile when API server is online" "reason"="alreadyPolling"
I1011 16:14:24.321982 1 vspherecluster_controller.go:334] capv-controller-manager/vspherecluster-controller/tkg-system/tkg-mgmt "msg"="control plane endpoint is not reconciled"
I1011 16:14:24.323216 1 controller.go:281] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="vspherecluster" "name"="tkg-mgmt" "namespace"="tkg-system"
I1011 16:14:27.515259 1 vspherecluster_controller.go:285] capv-controller-manager/vspherecluster-controller/tkg-system/tkg-mgmt "msg"="Reconciling VSphereCluster"
I1011 16:14:27.545140 1 vspherecluster_controller.go:445] capv-controller-manager/vspherecluster-controller/tkg-system/tkg-mgmt "msg"="skipping load balancer reconciliation" "reason"="VSphereCluster.Spec.LoadBalancerRef is nil"
I1011 16:14:27.548850 1 vspherecluster_controller.go:644] capv-controller-manager/vspherecluster-controller/tkg-system/tkg-mgmt "msg"="skipping reconcile when API server is online" "reason"="alreadyPolling"
I1011 16:14:27.548992 1 vspherecluster_controller.go:334] capv-controller-manager/vspherecluster-controller/tkg-system/tkg-mgmt "msg"="control plane endpoint is not reconciled"
I1011 16:14:27.550613 1 controller.go:281] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="vspherecluster" "name"="tkg-mgmt" "namespace"="tkg-system"
Back in the UI we can see that the process has moved on to actually creating the management cluster.

In the vSphere UI, you should see the Ubuntu template getting cloned:

And a VM getting created whose name starts with the management cluster name (tkg-mgmt in this example):

In the NSX ALB UI, there should be a virtual service configured that will be the load balanced endpoint for the control plane.

Unless you already had service engines deployed, it’s normal for this virtual service to look unhealthy since the service engines are not running yet. Also, with the control plane VM not yet powered on, it does not have an IP address, so the virtual service is not even fully configured.
You can navigate to the Infrastructure page to see that two service engines are being provisioned.

You’ll also see these being provisioned in the vSphere Client at this point.

After a short time, the first control plane VM should be powered on and have an IP address (192.168.130.155 in this example).

The virtual service in NSX ALB is a little bit further along as well. Both service engines are powered on, an IP has been assigned and we can see that the service will point to the first control plane node on port 6443:


Just a few minutes later, after kubeadm has run in the control plane node and the requisite Kubernetes processes are available, the service is in a running state:

The installer UI shows that the process has moved on to the next step.

We can see in the vSphere UI that a worker node has been configured and is powered on (with an IP address of 192.168.130.157 in this example).

You should now have a valid kubeconfig file at ~/.kube/config and can start to inspect the cluster.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
tkg-mgmt-control-plane-8qjmd Ready control-plane,master 10m v1.21.2+vmware.1
tkg-mgmt-md-0-559c48d65d-d7wrr Ready <none> 40s v1.21.2+vmware.1
kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-77f6fb8fd5-g744d 0/1 Pending 0 12m
cert-manager cert-manager-cainjector-6bd4cff7bb-96j9r 0/1 Pending 0 12m
cert-manager cert-manager-webhook-fbfcb9d6c-qrczs 0/1 Pending 0 12m
kube-system antrea-agent-9km2d 2/2 Running 0 4m32s
kube-system antrea-agent-hvwb7 2/2 Running 6 14m
kube-system antrea-controller-86f8988c5f-6wxxl 0/1 Running 0 16m
kube-system coredns-8dcb5c56b-mhpn4 0/1 Pending 0 14m
kube-system coredns-8dcb5c56b-qmrng 0/1 Pending 0 14m
kube-system etcd-tkg-mgmt-control-plane-8qjmd 1/1 Running 0 14m
kube-system kube-apiserver-tkg-mgmt-control-plane-8qjmd 1/1 Running 0 14m
kube-system kube-controller-manager-tkg-mgmt-control-plane-8qjmd 1/1 Running 0 14m
kube-system kube-proxy-rhbxq 1/1 Running 0 14m
kube-system kube-proxy-t2j6x 1/1 Running 0 4m32s
kube-system kube-scheduler-tkg-mgmt-control-plane-8qjmd 1/1 Running 0 14m
kube-system vsphere-cloud-controller-manager-sps2q 1/1 Running 7 14m
tkg-system kapp-controller-6499b8866-m5g62 1/1 Running 6 16m
tkg-system tanzu-addons-controller-manager-7b4c4b6957-l5mrr 0/1 ContainerCreating 0 73s
tkg-system tanzu-capabilities-controller-manager-6ff97656b8-r2wzn 0/1 ContainerCreating 0 17m
tkr-system tkr-controller-manager-6bc455b5d4-hrlnq 0/1 Pending 0 17m
The system is still coming up but after just a few more minutes, the installer UI will show that the process is completed.

From the command line, we can start to inspect our new management cluster:
tanzu management-cluster get
NAME NAMESPACE STATUS CONTROLPLANE WORKERS KUBERNETES ROLES
tkg-mgmt tkg-system running 1/1 1/1 v1.21.2+vmware.1 management
Details:
NAME READY SEVERITY REASON SINCE MESSAGE
/tkg-mgmt True 9m23s
├─ClusterInfrastructure - VSphereCluster/tkg-mgmt True 9m34s
├─ControlPlane - KubeadmControlPlane/tkg-mgmt-control-plane True 9m23s
│ └─Machine/tkg-mgmt-control-plane-8qjmd True 9m26s
└─Workers
  └─MachineDeployment/tkg-mgmt-md-0
    └─Machine/tkg-mgmt-md-0-559c48d65d-d7wrr True 9m32s
Providers:
NAMESPACE NAME TYPE PROVIDERNAME VERSION WATCHNAMESPACE
capi-kubeadm-bootstrap-system bootstrap-kubeadm BootstrapProvider kubeadm v0.3.23
capi-kubeadm-control-plane-system control-plane-kubeadm ControlPlaneProvider kubeadm v0.3.23
capi-system cluster-api CoreProvider cluster-api v0.3.23
capv-system infrastructure-vsphere InfrastructureProvider vsphere v0.7.10
In my previous post, Upgrading from TKG 1.3 to 1.4 (including extensions) on vSphere, I discussed the new package framework and you can see here as well that much of the core functionality is moved into packages. This will make the lifecycle management of these pieces much easier going forward.
tanzu package installed list -A
- Retrieving installed packages...
NAME PACKAGE-NAME PACKAGE-VERSION STATUS NAMESPACE
ako-operator ako-operator.tanzu.vmware.com Reconcile succeeded tkg-system
antrea antrea.tanzu.vmware.com Reconcile succeeded tkg-system
load-balancer-and-ingress-service load-balancer-and-ingress-service.tanzu.vmware.com Reconcile succeeded tkg-system
metrics-server metrics-server.tanzu.vmware.com Reconcile succeeded tkg-system
pinniped pinniped.tanzu.vmware.com Reconcile succeeded tkg-system
tanzu-addons-manager addons-manager.tanzu.vmware.com Reconcile succeeded tkg-system
vsphere-cpi vsphere-cpi.tanzu.vmware.com Reconcile succeeded tkg-system
vsphere-csi vsphere-csi.tanzu.vmware.com Reconcile succeeded tkg-system
kubectl get packageinstalls -A
NAMESPACE NAME PACKAGE NAME PACKAGE VERSION DESCRIPTION AGE
tkg-system ako-operator ako-operator.tanzu.vmware.com 1.4.0+vmware.1-tkg.1 Reconcile succeeded 24h
tkg-system antrea antrea.tanzu.vmware.com 0.13.3+vmware.1-tkg.1 Reconcile succeeded 24h
tkg-system load-balancer-and-ingress-service load-balancer-and-ingress-service.tanzu.vmware.com 1.4.3+vmware.1-tkg.1 Reconcile succeeded 24h
tkg-system metrics-server metrics-server.tanzu.vmware.com 0.4.0+vmware.1-tkg.1 Reconcile succeeded 24h
tkg-system pinniped pinniped.tanzu.vmware.com 0.4.4+vmware.1-tkg.1 Reconcile succeeded 24h
tkg-system tanzu-addons-manager addons-manager.tanzu.vmware.com 1.4.0+vmware.1-tkg.1 Reconcile succeeded 24h
tkg-system vsphere-cpi vsphere-cpi.tanzu.vmware.com 1.21.0+vmware.1-tkg.1 Reconcile succeeded 24h
tkg-system vsphere-csi vsphere-csi.tanzu.vmware.com 2.3.0+vmware.1-tkg.2 Reconcile succeeded 24h
Get a kubeconfig file for an LDAP user
You can see that we have a new kubectl context created and set as current.
kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
* tkg-mgmt-admin@tkg-mgmt tkg-mgmt tkg-mgmt-admin
That context is what is known as an “admin” context and does not use LDAP for authentication. In order to provide a kubeconfig file to an LDAP user, we run the following:
tanzu management-cluster kubeconfig get --export-file /tmp/ldaps-tkg-mgmt-kubeconfig
You can now access the cluster by specifying '--kubeconfig /tmp/ldaps-tkg-mgmt-kubeconfig' flag when using `kubectl` command
From here, we can deliver that kubeconfig file (/tmp/ldaps-tkg-mgmt-kubeconfig) to any user and they can use it to work with the cluster.
kubectl --kubeconfig=/tmp/ldaps-tkg-mgmt-kubeconfig get nodes
Error: could not complete Pinniped login: could not perform OIDC discovery for "https://192.168.220.128:31234": Get "https://192.168.220.128:31234/.well-known/openid-configuration": dial tcp 192.168.220.128:31234: connect: connection refused
Error: pinniped-auth login failed: exit status 1
Error: exit status 1
Unable to connect to the server: getting credentials: exec: executable tanzu failed with exit code 1
Uh oh. That doesn’t look good. It turns out that this is a known issue when you’re using NSX ALB to provide your control plane endpoint addresses, as noted in Add a Load Balancer for an Identity Provider on vSphere. The problem is that pinniped is trying to use the cluster’s control plane endpoint address (192.168.220.128 in this example) but the load balancer service is only listening on ports 6443 and 443 (not 31234 as noted in the error). We’ll be creating two new load balancer services to fix this.
The first step is to create an overlay file that will modify the pinniped-supervisor and dex services to be of type LoadBalancer:
#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind": "Service", "metadata": {"name": "pinniped-supervisor", "namespace": "pinniped-supervisor"}})
---
#@overlay/replace
spec:
  type: LoadBalancer
  selector:
    app: pinniped-supervisor
  ports:
    - name: https
      protocol: TCP
      port: 443
      targetPort: 8443

#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind": "Service", "metadata": {"name": "dexsvc", "namespace": "tanzu-system-auth"}}), missing_ok=True
---
#@overlay/replace
spec:
  type: LoadBalancer
  selector:
    app: dex
  ports:
    - name: dex
      protocol: TCP
      port: 443
      targetPort: https
You can see that the current services are of type NodePort:
kubectl get svc -A |egrep "pinniped-supervisor|dex"
pinniped-supervisor pinniped-supervisor NodePort 100.69.34.18 <none> 443:31234/TCP 23m
tanzu-system-auth dexsvc NodePort 100.68.1.150 <none> 443:30133/TCP 23m
Next, the contents of the overlay file need to be base64 encoded for inclusion in a secret:
cat pinniped-supervisor-svc-overlay.yaml | base64 -w 0
I0AgbG9hZCgiQHl0dDpvdmVybGF5IiwgIm92ZXJsYXkiKQojQG92ZXJsYXkvbWF0Y2ggYnk9b3ZlcmxheS5zdWJzZXQoeyJraW5kIjogIlNlcnZpY2UiLCAibWV0YWRhdGEiOiB7Im5hbWUiOiAicGlubmlwZWQtc3VwZXJ2aXNvciIsICJuYW1lc3BhY2UiOiAicGlubmlwZWQtc3VwZXJ2aXNvciJ9fSkKLS0tCiNAb3ZlcmxheS9yZXBsYWNlCnNwZWM6CiAgdHlwZTogTG9hZEJhbGFuY2VyCiAgc2VsZWN0b3I6CiAgICBhcHA6IHBpbm5pcGVkLXN1cGVydmlzb3IKICBwb3J0czoKICAgIC0gbmFtZTogaHR0cHMKICAgICAgcHJvdG9jb2w6IFRDUAogICAgICBwb3J0OiA0NDMKICAgICAgdGFyZ2V0UG9ydDogODQ0MwoKI0AgbG9hZCgiQHl0dDpvdmVybGF5IiwgIm92ZXJsYXkiKQojQG92ZXJsYXkvbWF0Y2ggYnk9b3ZlcmxheS5zdWJzZXQoeyJraW5kIjogIlNlcnZpY2UiLCAibWV0YWRhdGEiOiB7Im5hbWUiOiAiZGV4c3ZjIiwgIm5hbWVzcGFjZSI6ICJ0YW56dS1zeXN0ZW0tYXV0aCJ9fSksIG1pc3Npbmdfb2s9VHJ1ZQotLS0KI0BvdmVybGF5L3JlcGxhY2UKc3BlYzoKICB0eXBlOiBMb2FkQmFsYW5jZXIKICBzZWxlY3RvcjoKICAgIGFwcDogZGV4CiAgcG9ydHM6CiAgICAtIG5hbWU6IGRleAogICAgICBwcm90b2NvbDogVENQCiAgICAgIHBvcnQ6IDQ0MwogICAgICB0YXJnZXRQb3J0OiBodHRwcwo=
Issue a command similar to the following to update the tkg-mgmt-pinniped-addon secret:
kubectl -n tkg-system patch secret tkg-mgmt-pinniped-addon -p '{"data": {"overlays.yaml": "I0AgbG9hZCgiQHl0dDpvdmVybGF5IiwgIm92ZXJsYXkiKQojQG92ZXJsYXkvbWF0Y2ggYnk9b3ZlcmxheS5zdWJzZXQoeyJraW5kIjogIlNlcnZpY2UiLCAibWV0YWRhdGEiOiB7Im5hbWUiOiAicGlubmlwZWQtc3VwZXJ2aXNvciIsICJuYW1lc3BhY2UiOiAicGlubmlwZWQtc3VwZXJ2aXNvciJ9fSkKLS0tCiNAb3ZlcmxheS9yZXBsYWNlCnNwZWM6CiAgdHlwZTogTG9hZEJhbGFuY2VyCiAgc2VsZWN0b3I6CiAgICBhcHA6IHBpbm5pcGVkLXN1cGVydmlzb3IKICBwb3J0czoKICAgIC0gbmFtZTogaHR0cHMKICAgICAgcHJvdG9jb2w6IFRDUAogICAgICBwb3J0OiA0NDMKICAgICAgdGFyZ2V0UG9ydDogODQ0MwoKI0AgbG9hZCgiQHl0dDpvdmVybGF5IiwgIm92ZXJsYXkiKQojQG92ZXJsYXkvbWF0Y2ggYnk9b3ZlcmxheS5zdWJzZXQoeyJraW5kIjogIlNlcnZpY2UiLCAibWV0YWRhdGEiOiB7Im5hbWUiOiAiZGV4c3ZjIiwgIm5hbWVzcGFjZSI6ICJ0YW56dS1zeXN0ZW0tYXV0aCJ9fSksIG1pc3Npbmdfb2s9VHJ1ZQotLS0KI0BvdmVybGF5L3JlcGxhY2UKc3BlYzoKICB0eXBlOiBMb2FkQmFsYW5jZXIKICBzZWxlY3RvcjoKICAgIGFwcDogZGV4CiAgcG9ydHM6CiAgICAtIG5hbWU6IGRleAogICAgICBwcm90b2NvbDogVENQCiAgICAgIHBvcnQ6IDQ0MwogICAgICB0YXJnZXRQb3J0OiBodHRwcwo="}}'
secret/tkg-mgmt-pinniped-addon patched
Since pinniped (with dex included) is now a package, the reconciliation process should kick off within five minutes and you should see the service updated to type LoadBalancer:
kubectl get svc -A |egrep "pinniped-supervisor|dex"
pinniped-supervisor pinniped-supervisor LoadBalancer 100.69.34.18 192.168.220.4 443:31728/TCP 24m
tanzu-system-auth dexsvc LoadBalancer 100.68.1.150 192.168.220.3 443:30133/TCP 24m
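If you want to check on the pinniped package reconciliation directly rather than just watching the services, something like the following should work (the package install is named pinniped and lives in the tkg-system namespace on this cluster):

tanzu package installed get pinniped --namespace tkg-system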
In the NSX ALB UI, you will see two new services created:


To complete the process, the pinniped-post-deploy-job job needs to be deleted so that it will automatically be recreated and re-run, finishing this process:
kubectl delete jobs pinniped-post-deploy-job -n pinniped-supervisor
job.batch "pinniped-post-deploy-job" deleted
kubectl get job pinniped-post-deploy-job -n pinniped-supervisor
Error from server (NotFound): jobs.batch "pinniped-post-deploy-job" not found
After a few minutes…
kubectl get job pinniped-post-deploy-job -n pinniped-supervisor
NAME COMPLETIONS DURATION AGE
pinniped-post-deploy-job 1/1 11s 39s
And you can now try running that same kubectl command using the LDAP user’s kubeconfig file:
kubectl --kubeconfig=/tmp/ldaps-tkg-mgmt-kubeconfig get nodes
Error: could not complete Pinniped login: could not perform OIDC discovery for "https://192.168.220.128:31234": Get "https://192.168.220.128:31234/.well-known/openid-configuration": dial tcp 192.168.220.128:31234: connect: connection refused
Error: pinniped-auth login failed: exit status 1
Error: exit status 1
Unable to connect to the server: getting credentials: exec: executable tanzu failed with exit code 1
Uh oh again! This was easy to resolve, though, as the problem was simply that I needed to re-download that kubeconfig file after making the updates to pinniped:
tanzu management-cluster kubeconfig get --export-file /tmp/ldaps-tkg-mgmt-kubeconfig
You can now access the cluster by specifying '--kubeconfig /tmp/ldaps-tkg-mgmt-kubeconfig' flag when using `kubectl` command
Hopefully, only one more time…
kubectl --kubeconfig=/tmp/ldaps-tkg-mgmt-kubeconfig get nodes
Success!

Click Advanced and then the Proceed to 192.168.220.4 (unsafe) link.

Click Advanced and then the Proceed to 192.168.220.3 (unsafe) link. If you were paying close attention to the last two pages, you’ll note that the first (192.168.220.4) was the load balancer address for the pinniped-supervisor service and the second (192.168.220.3) was the load balancer address for the dex service. Dex is only called when you’re performing LDAP authentication…if this were OIDC, only one page would have been presented.
You can enter your LDAP credentials here (I’m using the tkgadmin account I mentioned when testing the LDAP configuration).

And if it succeeds, you’ll see a message similar to the following:

kubectl --kubeconfig=/tmp/ldaps-tkg-mgmt-kubeconfig get nodes
Error from server (Forbidden): nodes is forbidden: User "tkgadmin@corp.tanzu" cannot list resource "nodes" in API group "" at the cluster scope
Still having problems, but this one is familiar as I saw it in TKG 1.3 (and 1.2) but forgot about it (again). Since the tanzuadmins group has no privileges in the cluster, we need to create a ClusterRoleBinding.
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: tanzuadmins
subjects:
  - kind: Group
    name: tanzuadmins
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole # this must be Role or ClusterRole
  name: cluster-admin # this must match the name of the Role or ClusterRole you wish to bind to
  apiGroup: rbac.authorization.k8s.io
kubectl apply -f tanzuadmin-clusterrolebinding.yaml
clusterrolebinding.rbac.authorization.k8s.io/tanzuadmins created
Now the user is able to log in and work with cluster-admin privileges (or lower if you use a different ClusterRoleBinding):
kubectl --kubeconfig=/tmp/ldaps-tkg-mgmt-kubeconfig get nodes
NAME STATUS ROLES AGE VERSION
tkg-mgmt-control-plane-8qjmd Ready control-plane,master 64m v1.21.2+vmware.1
tkg-mgmt-md-0-559c48d65d-d7wrr Ready <none> 54m v1.21.2+vmware.1
Be sure to check out my post, Using multiple availability zones for a workload cluster in TKG 1.4 on vSphere, which builds upon this recently deployed management cluster.
I’m pretty new to this whole NSX ALB/TKG thing and I’ve found this post to be very helpful. When deploying TKG via the UI it failed, but a subsequent run using the command published from the UI appears to have worked. I’m having an issue trying to patch the pinniped stuff for LDAP. When running the command:
kubectl -n tkg-system patch secret tkg-mgmt-pinniped-addon -p '{"data": {"overlays.yaml": "I0AgbG9hZRwcwo="}}'
I’m getting:
Error from server (NotFound): secrets "tkg-mgmt-pinniped-addon" not found.
Hello David. I can think of a few things that might be going on here…
1. If your context is not set to the management cluster the noted secret will not exist.
2. If you named your management cluster something other than “tkg-mgmt” you will need to update the name of the secret you’re patching accordingly (also be sure to update the base64-encoded overlay value).
3. The secret doesn’t exist in the management cluster since LDAP/OIDC authentication was skipped during deployment. You’ll need to redeploy the management cluster and configure LDAP/OIDC authentication.
4. LDAP/OIDC authentication was configured during deployment but something went wrong and it’s not correctly configured. In this scenario, you could examine the logs and events to see what might have gone wrong, but you’re likely looking at a redeployment. You should also open a support request with VMware to get more eyes on this.
I haven’t been working with this again until today. I got past the issue with pinniped (I needed a value in the LDAP_USER_SEARCH_USERNAME field in the yaml I was using). Once past that, I was able to patch correctly and deleted the job and it was recreated, but after re-exporting the kubeconfig and trying to log in (it took the credentials), I got this in the browser when trying to authenticate:
Unprocessable Entity: no username claim in upstream ID token
Is there a way to pass the credentials in the environment?
I did some quick searching and found an internal bug that might match the behavior you’re seeing. If this is your issue, there does not appear to be a resolution for it just yet (or even a workaround noted). From what I can see, if the LDAP/OIDC parameters are not correct at the time of cluster creation, there may be roadblocks to getting them changed in such a way that the cluster will use the new values properly. With that in mind, you might try a new deployment with the credentials configured the same as in your patch to see if this produces the same error or not. I can also keep an eye on the bug and if there are any updates (fix, workaround or other) I can post some next steps here.
I think the LDAP parameters are correct in the YAML, as I looked at the logs from the dex container and see this, which seems to indicate the login to AD did work:
{“level”:”info”,”msg”:”login successful: connector \”ldap\”, username=\”\”, preferred_username=\”\”, email=\”CN=David Quattlebaum,OU=User Accounts,OU=XXXXXXX\”, groups=[]”,”time”:”2022-01-10T15:42:17Z”}
But still got the error. This happens whether using Firefox or Chrome. I’m at a stand-still at this point and thank you for your replies. I’ll continue to investigate as best I can.
Hi Chris,
I wanted to move to v1.4 from 1.3.1 and use the AVI LB. The install is not that different from 1.3.1, however I keep coming across the same issue:
tanzu-addons-controller-manager is stuck in CrashLoopBackoff, and the same behavior can be replicated across different installs
Events: │
│ Type Reason Age From Message │
│ —- —— —- —- ——- │
│ Normal Scheduled 7m53s default-scheduler Successfully assigned tkg-system/tanzu-addons-controller-manager-6fdc9c558d-r7tg9 to mgmt-cluster- │
│ control-plane-9mjff │
│ Normal Pulling 7m50s kubelet Pulling image “projects.registry.vmware.com/tkg/tanzu_core/addons/tanzu-addons-manager@sha256:284f │
│ c8048f9760ce35ed6267cfeaedada84c2e90b016ebc52e73ba8197b9ca2c” │
│ Normal Pulled 7m47s kubelet Successfully pulled image “projects.registry.vmware.com/tkg/tanzu_core/addons/tanzu-addons-manager │
│ @sha256:284fc8048f9760ce35ed6267cfeaedada84c2e90b016ebc52e73ba8197b9ca2c” in 3.457878476s │
│ Normal Created 6m43s (x4 over 7m47s) kubelet Created container tanzu-addons-controller │
│ Normal Pulled 6m43s (x3 over 7m41s) kubelet Container image “projects.registry.vmware.com/tkg/tanzu_core/addons/tanzu-addons-manager@sha256:28 │
│ 4fc8048f9760ce35ed6267cfeaedada84c2e90b016ebc52e73ba8197b9ca2c” already present on machine │
│ Normal Started 6m42s (x4 over 7m46s) kubelet Started container tanzu-addons-controller │
│ Warning BackOff 2m45s (x30 over 7m35s) kubelet Back-off restarting failed container
I1220 16:38:26.536647 1 request.go:645] Throttling request took 1.035299858s, request: GET:https://100.64.0.1:443/apis/coordination.k8s.io/v1beta1?timeout=32 │
│ s │
│ I1220 16:38:27.246627 1 main.go:134] setup “msg”=”starting manager” │
│ I1220 16:38:27.247117 1 controller.go:158] controller-runtime/manager/controller/cluster “msg”=”Starting EventSource” “reconciler group”=”cluster.x-k8s.io” ” │
│ reconciler kind”=”Cluster” “source”={“Type”:{“metadata”:{“creationTimestamp”:null},”spec”:{“controlPlaneEndpoint”:{“host”:””,”port”:0}},”status”:{“infrastructureRe │
│ ady”:false,”controlPlaneInitialized”:false}}} │
│ E1220 16:38:30.488283 1 source.go:117] controller-runtime/source “msg”=”if kind is a CRD, it should be installed before calling Start” “error”=”no matches fo │
│ r kind \”Cluster\” in version \”cluster.x-k8s.io/v1alpha3\”” “kind”={“Group”:”cluster.x-k8s.io”,”Kind”:”Cluster”} │
│ E1220 16:38:30.488458 1 main.go:136] setup “msg”=”problem running manager” “error”=”no matches for kind \”Cluster\” in version \”cluster.x-k8s.io/v1alpha3\”” │
│ │
│ E1220 16:38:30.488474 1 internal.go:547] controller-runtime/manager “msg”=”error received after stop sequence was engaged” “error”=”context canceled”
I am lost on what might be the issue here.
This is not something I’ve seen before but one of the errors is odd…no matches for kind \”Cluster\”. This makes me think that something is running against a workload cluster (where there is no expectation of the ClusterAPI CRDs to be created) instead of against the management cluster. Is it possible that the wrong context was in play during one of the upgrade steps? If nothing easy pans out I would definitely recommend opening a support request with VMware.
Thanks for coming back to me Chris, I am using K9s to monitor the local kind cluster and the tkg management cluster while it’s being created, you think that might confuse the installation?
I like k9s for the visibility it provides, maybe I should purge the install and the local kind cluster and try going for a blind install…?
thanks
Mel
this seems random, using the same yaml file, the pod is now running
Events: │
│ Type Reason Age From Message │
│ —- —— —- —- ——- │
│ Normal Scheduled 6m57s default-scheduler Successfully assigned tkg-system/tanzu-addons-controller-manager-7bf55896-2fgsn to mgmt-cluster-md-0-6f496574bd-hzpx5 │
│ Normal Pulling 6m55s kubelet Pulling image “projects.registry.vmware.com/tkg/tanzu_core/addons/tanzu-addons-manager@sha256:284fc8048f9760ce35ed6267 │
│ cfeaedada84c2e90b016ebc52e73ba8197b9ca2c” │
│ Normal Pulled 6m51s kubelet Successfully pulled image “projects.registry.vmware.com/tkg/tanzu_core/addons/tanzu-addons-manager@sha256:284fc8048f97 │
│ 60ce35ed6267cfeaedada84c2e90b016ebc52e73ba8197b9ca2c” in 3.427730447s │
│ Normal Created 5m48s (x4 over 6m51s) kubelet Created container tanzu-addons-controller │
│ Normal Pulled 5m48s (x3 over 6m46s) kubelet Container image “projects.registry.vmware.com/tkg/tanzu_core/addons/tanzu-addons-manager@sha256:284fc8048f9760ce35ed62 │
│ 67cfeaedada84c2e90b016ebc52e73ba8197b9ca2c” already present on machine │
│ Normal Started 5m47s (x4 over 6m51s) kubelet Started container tanzu-addons-controller │
│ Warning BackOff 5m12s (x12 over 6m41s) kubelet Back-off restarting failed container
I1220 20:01:46.554674 1 request.go:645] Throttling request took 1.03854514s, request: GET:https://100.64.0.1:443/apis/infrastructure.cluster.x-k8s.io/v1alpha3?timeout=32s │
│ I1220 20:01:47.714470 1 main.go:134] setup “msg”=”starting manager” │
│ I1220 20:01:47.715094 1 controller.go:158] controller-runtime/manager/controller/cluster “msg”=”Starting EventSource” “reconciler group”=”cluster.x-k8s.io” “reconciler kind”=”Cl │
│ uster” “source”={“Type”:{“metadata”:{“creationTimestamp”:null},”spec”:{“controlPlaneEndpoint”:{“host”:””,”port”:0}},”status”:{“infrastructureReady”:false,”controlPlaneInitialized”:fal │
│ se}}} │
│ I1220 20:01:47.815573 1 controller.go:158] controller-runtime/manager/controller/cluster “msg”=”Starting EventSource” “reconciler group”=”cluster.x-k8s.io” “reconciler kind”=”Cl │
│ uster” “source”={“Type”:{“metadata”:{“creationTimestamp”:null}}} │
│ I1220 20:01:48.003893 1 controller.go:158] controller-runtime/manager/controller/cluster “msg”=”Starting EventSource” “reconciler group”=”cluster.x-k8s.io” “reconciler kind”=”Cl │
│ uster” “source”={“Type”:{“metadata”:{“creationTimestamp”:null},”spec”:{“version”:””},”status”:{}}} │
│ I1220 20:01:48.105141 1 controller.go:158] controller-runtime/manager/controller/cluster “msg”=”Starting EventSource” “reconciler group”=”cluster.x-k8s.io” “reconciler kind”=”Cl │
│ uster” “source”={“Type”:{“metadata”:{“creationTimestamp”:null}}} │
│ I1220 20:01:48.207077 1 controller.go:158] controller-runtime/manager/controller/cluster “msg”=”Starting EventSource” “reconciler group”=”cluster.x-k8s.io” “reconciler kind”=”Cl │
│ uster” “source”={“Type”:{“metadata”:{“creationTimestamp”:null},”spec”:{“version”:””,”infrastructureTemplate”:{},”kubeadmConfigSpec”:{}},”status”:{“initialized”:false,”ready”:false}}} │
│ I1220 20:01:48.308136 1 controller.go:165] controller-runtime/manager/controller/cluster “msg”=”Starting Controller” “reconciler group”=”cluster.x-k8s.io” “reconciler kind”=”Clu │
│ ster” │
│ I1220 20:01:48.308195 1 controller.go:192] controller-runtime/manager/controller/cluster “msg”=”Starting workers” “reconciler group”=”cluster.x-k8s.io” “reconciler kind”=”Cluste │
│ r” “worker count”=10 │
│ I1220 20:01:48.308294 1 addon_controller.go:111] controllers/Addon “msg”=”Reconciling cluster” “cluster-name”=”mgmt-cluster” “cluster-ns”=”tkg-system” │
│ I1220 20:01:48.308486 1 addon_controller.go:316] controllers/Addon “msg”=”Bom not found” “cluster-name”=”mgmt-cluster” “cluster-ns”=”tkg-system”
I am using k9s to monitor both the local kind cluster and the tkg management cluster, do you think that might cause some issues while the install is running…?
That does seem odd and I have no experience with k9s so I can’t really comment on if it’s part of the problem or not. It might be worth trying the install without k9s in the mix though to rule it out.
k9s has nothing to do with the issue I am seeing, so does the internet access, I have moved the connection from PIA to direct WAN, changed from my caching dns to quad9, instead of k9s I just used kubectl and pointed to the config file for both the kind and the tkg cluster.
The error is still there.
1.4 compared to 1.3.1 is very poor all around.
Events:
Type Reason Age From Message
—- —— —- —- ——-
Normal Scheduled 10m default-scheduler Successfully assigned tkg-system/tanzu-addons-controller-manager-5f46c75b9d-k5rbd to mgmt-cluster-md-0-6f496574bd-qwz8d
Normal Pulling 10m kubelet Pulling image “projects.registry.vmware.com/tkg/tanzu_core/addons/tanzu-addons-manager@sha256:284fc8048f9760ce35ed6267cfeaedada84c2e90b016ebc52e73ba8197b9ca2c”
Normal Pulled 10m kubelet Successfully pulled image “projects.registry.vmware.com/tkg/tanzu_core/addons/tanzu-addons-manager@sha256:284fc8048f9760ce35ed6267cfeaedada84c2e90b016ebc52e73ba8197b9ca2c” in 3.447108498s
Normal Created 9m12s (x4 over 10m) kubelet Created container tanzu-addons-controller
Normal Started 9m12s (x4 over 10m) kubelet Started container tanzu-addons-controller
Normal Pulled 9m12s (x3 over 10m) kubelet Container image “projects.registry.vmware.com/tkg/tanzu_core/addons/tanzu-addons-manager@sha256:284fc8048f9760ce35ed6267cfeaedada84c2e90b016ebc52e73ba8197b9ca2c” already present on machine
Warning BackOff 5m9s (x31 over 9m57s) kubelet Back-off restarting failed container
mel@bootstrap:~$ kubectl --kubeconfig /home/mel/.kube/config logs -n tkg-system tanzu-addons-controller-manager-5f46c75b9d-k5rbd
I1220 23:00:44.220107 1 request.go:645] Throttling request took 1.006869282s, request: GET:https://100.64.0.1:443/apis/kappctrl.k14s.io/v1alpha1?timeout=32s
I1220 23:00:44.843036 1 main.go:134] setup “msg”=”starting manager”
I1220 23:00:44.843491 1 controller.go:158] controller-runtime/manager/controller/cluster “msg”=”Starting EventSource” “reconciler group”=”cluster.x-k8s.io” “reconciler kind”=”Cluster” “source”={“Type”:{“metadata”:{“creationTimestamp”:null},”spec”:{“controlPlaneEndpoint”:{“host”:””,”port”:0}},”status”:{“infrastructureReady”:false,”controlPlaneInitialized”:false}}}
E1220 23:00:48.072346 1 source.go:117] controller-runtime/source “msg”=”if kind is a CRD, it should be installed before calling Start” “error”=”no matches for kind \”Cluster\” in version \”cluster.x-k8s.io/v1alpha3\”” “kind”={“Group”:”cluster.x-k8s.io”,”Kind”:”Cluster”}
E1220 23:00:48.072674 1 internal.go:547] controller-runtime/manager “msg”=”error received after stop sequence was engaged” “error”=”context canceled”
E1220 23:00:48.072677 1 main.go:136] setup “msg”=”problem running manager” “error”=”no matches for kind \”Cluster\” in version \”cluster.x-k8s.io/v1alpha3\””
After spending a considerable amount of time on this, I have concluded that when using Photon OS you hit all kinds of problems (at least with an Ubuntu VM as a bootstrap; I have not tried a CentOS/RHEL/Fedora bootstrap yet). When you use the Ubuntu image for the cluster, the management cluster spawns fine with no issues, even on the bare minimum of 2 CPUs for each master and worker.
OS_NAME: ubuntu
OS_VERSION: "20.04"
That is interesting and I would highly recommend getting your results (with logs and such) to VMware via a support request. My work on 1.4 was only with Ubuntu OS images from an Ubuntu bootstrap VM. For all prior versions of TKG, I used Photon OS images, also with an Ubuntu bootstrap VM, with no issues.
and it finally finished installing but with errors
Failure while waiting for package ‘antrea’
packageinstalls.packaging.carvel.dev “vsphere-csi” not found, retrying
Failure while waiting for package ‘vsphere-csi’
packageinstalls.packaging.carvel.dev “ako-operator” not found, retrying
Failure while waiting for package ‘ako-operator’
packageinstalls.packaging.carvel.dev “metrics-server” not found, retrying
Failure while waiting for package ‘metrics-server’
packageinstalls.packaging.carvel.dev “vsphere-cpi” not found, retrying
Failure while waiting for package ‘vsphere-cpi’
Hi Chris,
sadly this is just my homelab, my licenses are via VMUG, for production it’s out of my hands and has telco premier support anyway. It would be interesting to see when production moves from 1.3.1 to 1.4 if there will be upgrades of the existing 1.3.1 clusters to 1.4 or a new cluster and CNFs re-instantiated via TCA to the new cluster.
I am installing TKG on vSphere 7 with Windows AD as the LDAP provider. During the management cluster installation the Verify LDAP Configuration step succeeds, but when I try to log in with my LDAP user credentials it shows invalid username and password (although the user creds are correct).
The dex logs also give the same error. Nothing much help from the logs as well.
Any help would be much appreciated!!!
Thanks in advance
Off the top of my head, I would make sure that the format used to enter the username (UPN vs. DOMAIN/username) is entered properly. I recall that I was only able to get it working with UPN. If that doesn’t pan out, I would recommend opening a support request with VMware to help diagnose it.
Thanks for the reply but I was able to get through the above error after tweaking the dex config map w.r.t Windows AD parameters. But now after passing the user credentials in tkg web page the curl to callback URL is failing with Connection refused.
Is there a way to debug this issue?
Logs of dex pod:
===========
{“level”:”info”,”msg”:”performing ldap search cn=Users,dc=xxxxxx,dc=xxxx sub (\u0026(objectClass=person)(userPrincipalName=xxxxxxxxxx))”,”time”:”2022-02-07T15:16:04Z”}
{“level”:”info”,”msg”:”username \”xxxxxxxxxxxxx\” mapped to entry CN=xxxxxxxxxxx,CN=Users,DC=xxxxxxxxxx,DC=xxx”,”time”:”2022-02-07T15:16:04Z”}
{“level”:”info”,”msg”:”performing ldap search OU=xxxxxx,DC=xxxxxxxxx,DC=xxx sub (\u0026(objectClass=group)(member=CN=xxxxxxxxx,CN=Users,DC=xxxxxxxx,DC=xxx))”,”time”:”2022-02-07T15:16:04Z”}
{“level”:”info”,”msg”:”login successful: connector \”ldap\”, username=\”CN=xxxxxxx,CN=Users,DC=xxxxx,DC=xxxxx\”, preferred_username=\”\”, email=\”CN=xxxxxxx,CN=Users,DC=xxxxxxx,DC=xxxx\”, groups=[\”CN=xxxxxxx,OU=xxxxxxx,DC=xxxxxxx,DC=xxxxx\”]”,”time”:”2022-02-07T15:16:04Z”}
Curl output:
=========
curl -L 'http://127.0.0.1:37911/callback?code=Jt1exfXLUNf2quOWsoN5OEWAdj7_zL1CJYp9k42wrz0.WNIUJV3ZxPjJ8ZXpH0wWSulBr5sMO0F4NTBASDmAi7c&scope=openid+offline_access+pinniped%3Arequest-audience&state=e7b79c3b14f355964fbc886385aa0a63'
curl: (7) Failed to connect to 127.0.0.1 port 37911: Connection refused