Before TKG 1.4, when you provisioned workload clusters on vSphere, the nodes mostly ended up randomly spread across the available ESXi hosts. This isn’t ideal when you’re planning for DR scenarios and want to ensure some redundancy. There is now the ability to spread nodes across multiple compute clusters within a single datacenter or across multiple host groups within a single cluster. Since my nested lab environment is incredibly small, I’m going through the option of placing nodes in multiple host groups. The configuration for this is not trivial, as you’ll see, but I will try to lay it out thoroughly and explain what’s being done along the way.
Note: This is an experimental feature in TKG 1.4, so you should not configure a production workload cluster in this fashion.
VMware has the process for both methods documented at Spread Nodes Across Multiple Compute Clusters in a Datacenter and Spread Nodes Across Multiple Hosts in a Single Compute Cluster.
Create VM/Host Groups and Affinity Rules
The very first thing to do is to create vm groups (for the nodes), host groups (where the nodes will run) and affinity rules (to keep the nodes in a group on the hosts in a group). This is easily done in the vSphere UI from the cluster’s Configure > VM/Host Groups page.

Click the Add button to create a new group.
Give the group a name (HG1 in this example) and specify the group type (Host Group in this example).

Click the Add button to add ESXi hosts to the host group. I’m picking the first two hosts in my cluster for this group.

Click the OK button and you should see a summary of the new host group.

Click the OK button again. You can now see the host group and its members on the VM/Host Groups page.
Create any additional host groups that you will need (I only created one additional host group with the other two ESXi hosts in my cluster). The number of host groups should match the number of availability zones you plan to use.
Click the Add button again to start the process of creating your first VM group.
Give the VM group a name (VMG1 in this example) and set the Type to VM Group.

This is the point where I hit a small snag. I didn’t have any VMs that I wanted in this VM group as they had not been created yet. I tried clicking OK but was prevented from proceeding by the following error: A group must have members. Add a group member. I quickly created a dummy vm (dummy1) and then clicked the Add button here to add it to the group.

Click the OK button and you should see a summary of the new VM group.

Click the OK button again. You can now see the VM group and its member on the VM/Host Groups page.
Repeat this process for however many VM groups you need (I only created one additional VM group with the same dummy1 VM in it). The number of VM groups should match the number of host groups.
I did find out later that you can use govc to create empty VM groups. The syntax for the command would look similar to the following:
govc cluster.group.create -cluster=RegionA01-MGMT -name=VMG1 -vm
This could also be used to create the host groups by replacing -vm with -host and specifying the hosts to add, similar to the following:
govc cluster.group.create -cluster=RegionA01-MGMT -name=HG1 -host esx-01a.corp.tanzu esx-02a.corp.tanzu
I also tried this via PowerCLI but found the same restriction of needing to specify a VM during creation. You could use PowerCLI to create the host groups since there are hosts to be added…the command would look similar to the following:
New-DrsClusterGroup -Name "HG1" -Cluster "RegionA01-MGMT" -VMHost "esx-01a.corp.tanzu", "esx-02a.corp.tanzu"
Moving on, affinity rules (VM/Host rules) were needed to pair up the VM groups with their appropriate host groups. You can start this process on the cluster’s Configure > VM/Host Rules page.
Click the Add button to create a new rule.
Give the rule a name (AZ1 in this example) and set the Type to Virtual Machines to Hosts. Set the VM group to the first VM group you created (VMG1 in this example), and the rule to Must run on hosts in group. Set the Host group to the first host group you created (HG1 in this example).

Click the OK button and you should now see the rule on the VM/Host Rules page.
Repeat this process for any additional rules you need to create (I only needed one more).
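As with the groups, the VM/Host rules can also be created from the command line. The following is a sketch with govc, assuming the group and rule names used above; treat it as a starting point rather than a definitive recipe:
# create a mandatory VM-to-host affinity rule tying VMG1 to HG1
govc cluster.rule.create -cluster=RegionA01-MGMT -name=AZ1 -enable -mandatory -vm-host -vm-group VMG1 -host-affine-group HG1
# list the rules on the cluster to confirm
govc cluster.rule.ls -cluster=RegionA01-MGMT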


If your rules and groups are all okay, you can move on to creating and assigning tags.
Note: If you go the route of creating everything via the UI with a dummy VM in your VM groups, you can delete this VM (or just remove it from the VM groups) once your cluster is up and running.
Create and assign tags to your cluster and hosts.
vSphere tags are going to be used to identify the cluster and hosts where the Kubernetes nodes are placed. You can see that there are no tags assigned to my cluster currently (in the Tags pane on the cluster’s Summary page).
Click the Assign button in the Tags pane.

I’ve only got the one tag present (k8s-storage) that I’m using to identify my NFS datastore for inclusion in a particular storage policy. You can see this same tag from the Tags & Custom Attributes page:

Click the New link to create a new tag.
The first tag to create will go on the selected cluster and correlate to the concept of a region, so give it an appropriate name (lab in this example). You will likely need to create a new tag category as well. You can see in this screenshot that I already had one named k8s but this was also used for storage purposes.

Click the Create New Category link.
Give the category an appropriate name (k8s-region in this example). You can leave all other items as-is or leave only the appropriate Associable Object Type selected (Cluster in this example).

Click the Create button.
You should end up back on the Create Tag page where the newly created tag category is populated in the Category field.

Click the Create button here and you should be sent back to the Assign Tag page.

Select the newly created tag and click the Assign button.
You’ll now see the tag (lab in this example) in the Tags pane for the cluster.
This same process now needs to be repeated to create the availability zone tags which will be placed on the ESXi hosts.
Select the first ESXi host in the cluster that will be part of an availability zone (esx-01a.corp.tanzu in this example). You can see from the Tags pane on the Summary tab for this host that no tags are currently present.

Click on the Assign link to start the process of creating a new tag.
You can see the new lab tag that was just created as well as the original k8s-storage tag.

Click the Add Tag link.
This tag will be placed on the selected host and correlate to the concept of a zone, so give it an appropriate name (AZ1 in this example). As with the region tag, you will likely need to create a new tag category for your zone tags.

Click the Create New Category link.
Give the category an appropriate name (k8s-zone in this example). You can leave all other items as-is or leave only the appropriate Associable Object Type selected (Host in this example).

You should end up back on the Create Tag page where the newly created tag category is populated in the Category field.

Click the Create button here and you should be sent back to the Assign Tag page.

Select the newly created tag and click the Assign button.
You’ll now see the tag (AZ1 in this example) in the Tags pane for the host.

You’ll need to repeat this process in a limited fashion…add the same tag to other hosts in the same availability zone but create new tags for hosts in other availability zones. You do not need to create any more zone-based tag categories though…they will all fall under the first one you created (k8s-zone in this example).
You can validate what you’ve configured from the Tags and Custom Attributes page:

Clicking on any of the tags will let you drill down and see what objects have that tag on them:



As you might have suspected, you can complete these tasks from the command line as well…but there is a small caveat. Using the PowerCLI cmdlet New-TagCategory or the govc tags.category.create command will create the tag categories, but you have to manually specify the object types to which they apply. When you create tag categories in the UI, they apply to all object types by default. This is arguably overkill but does make it much harder to make a mistake with your tag category definition. When assigning a tag to a resource using govc, you also need to know the full path to that resource (/RegionA01/host/RegionA01-MGMT/esx-01a.corp.tanzu for one of my hosts, as an example).
Create a region-based tag category, a tag under that category, and assign the tag to a cluster with govc:
govc tags.category.create -t ClusterComputeResource k8s-region
govc tags.create -c k8s-region lab
govc tags.attach lab /RegionA01/host/RegionA01-MGMT
Create a zone-based tag category, a tag under that category, and assign the tag to a host with govc:
govc tags.category.create -t HostSystem k8s-zone
govc tags.create -c k8s-zone AZ1
govc tags.attach AZ1 /RegionA01/host/RegionA01-MGMT/esx-01a.corp.tanzu
Create a region-based tag category, a tag under that category, and assign the tag to a cluster with PowerCLI:
New-TagCategory -Name "k8s-region" -EntityType "ClusterComputeResource"
Name Cardinality Description
---- ----------- -----------
k8s-region Single
New-Tag -Category "k8s-region" -Name "lab"
Name Category Description
---- -------- -----------
lab k8s-region
Get-Cluster RegionA01-MGMT | New-TagAssignment -Tag lab
Tag Entity
--- ------
k8s-region/lab RegionA01-MGMT
Create a zone-based tag category, a tag under that category, and assign the tag to a host with PowerCLI:
New-TagCategory -Name k8s-zone -EntityType "HostSystem"
Name Cardinality Description
---- ----------- -----------
k8s-zone Single
New-Tag -Category "k8s-zone" -Name "AZ1"
Name Category Description
---- -------- -----------
AZ1 k8s-zone
Get-VMHost esx-01a.corp.tanzu | New-TagAssignment -Tag AZ1
Tag Entity
--- ------
k8s-zone/AZ1 esx-01a.corp.tanzu
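Whichever method you use, it can be worth double-checking what got attached where before moving on. A quick sketch with govc, assuming the tag names and inventory paths from my lab:
# list the objects carrying a given tag
govc tags.attached.ls lab
govc tags.attached.ls AZ1
# list the tags attached to a given object (-r reverses the lookup)
govc tags.attached.ls -r /RegionA01/host/RegionA01-MGMT/esx-01a.corp.tanzu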
Define your failure domains and deployment domains
The next step is to define your VSphereFailureDomain and VSphereDeploymentZone objects. These are custom resource definitions (CRDs) that exist as part of Cluster API on vSphere (CAPV). You can read a bit more about these objects at CAPV ControlPlane Failure Domain.
The following is a sample specification that defines two VSphereFailureDomain objects and two VSphereDeploymentZone objects. These will directly correspond to the two availability zones (AZ1 and AZ2) I defined via the tags and VM/Host groups earlier.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereFailureDomain
metadata:
name: az1
spec:
region:
name: lab
type: ComputeCluster
tagCategory: k8s-region
zone:
name: AZ1
type: HostGroup
tagCategory: k8s-zone
topology:
datacenter: RegionA01
computeCluster: RegionA01-MGMT
hosts:
vmGroupName: VMG1
hostGroupName: HG1
datastore: map-vol
networks:
- K8s-Workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereFailureDomain
metadata:
name: az2
spec:
region:
name: lab
type: ComputeCluster
tagCategory: k8s-region
zone:
name: AZ2
type: HostGroup
tagCategory: k8s-zone
topology:
datacenter: RegionA01
computeCluster: RegionA01-MGMT
hosts:
vmGroupName: VMG2
hostGroupName: HG2
datastore: map-vol
networks:
- K8s-Workload
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereDeploymentZone
metadata:
name: az1
spec:
server: vcsa-01a.corp.tanzu
failureDomain: az1
placementConstraint:
resourcePool:
folder:
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereDeploymentZone
metadata:
name: az2
spec:
server: vcsa-01a.corp.tanzu
failureDomain: az2
placementConstraint:
resourcePool:
folder:
Some very important things to point out here:
- For VSphereFailureDomain objects, everything under spec.region, spec.zone and spec.topology must match what you have configured in vCenter. In this case:
  - spec.region.name = lab, which is the name of the region-based tag assigned to my cluster
  - spec.region.type = ComputeCluster, which is saying that the region corresponds to a vSphere cluster
  - spec.region.tagCategory = k8s-region, which is the tag category that the “lab” region-based tag falls under
  - spec.zone.name = AZ1, which is the tag name for the hosts in the first host group created, encompassing the esx-01a and esx-02a hosts in this example (the second VSphereFailureDomain uses AZ2 for this value)
  - spec.zone.type = HostGroup, which is saying that the zone corresponds to a vSphere host group
  - spec.zone.tagCategory = k8s-zone, which is the tag category that the “AZ1” and “AZ2” zone-based tags fall under
  - spec.topology.datacenter = RegionA01, which is the vSphere datacenter object
  - spec.topology.computeCluster = RegionA01-MGMT, which is the vSphere cluster object
  - spec.topology.hosts.vmGroupName = VMG1, which is the first VM group created and aligns with the HG1 host group (the second VSphereFailureDomain uses VMG2 for this value)
  - spec.topology.hosts.hostGroupName = HG1, which is the first host group created and aligns with the VMG1 VM group (the second VSphereFailureDomain uses HG2 for this value)
  - spec.topology.datastore = map-vol, which just needs to map to a datastore where VMs on the specified hosts can reside
  - spec.topology.networks = K8s-Workload, which just needs to map to a network that VMs on the specified hosts can use

  Both the datastore and networks values should also align with the values you use when creating your workload clusters.
- For VSphereDeploymentZone objects, the spec.failureDomain value must match one of the metadata.name values of the VSphereFailureDomain definitions…i.e. the first VSphereDeploymentZone has a spec.failureDomain value of az1, which corresponds to the metadata.name value of the first VSphereFailureDomain. The same holds true for the second VSphereDeploymentZone, az2.
- The spec.server value in the VSphereDeploymentZone objects must exactly match the vCenter Server address (IP or FQDN) as it was entered for the VCENTER SERVER value on the IaaS Provider page of the installer UI, or the VSPHERE_SERVER parameter if you used a configuration file. If these do not match, the control plane nodes in your workload cluster will not be placed into availability zones.
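Before applying the file, it can be worth cross-checking the names it references against what actually exists in vCenter. A rough sketch with govc, assuming the lab names used throughout this post:
# tag categories and tags should include k8s-region/k8s-zone and lab/AZ1/AZ2
govc tags.category.ls
govc tags.ls
# VM and host groups should include VMG1/VMG2 and HG1/HG2
govc cluster.group.ls -cluster=RegionA01-MGMT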
When you’re happy with your VSphereFailureDomain and VSphereDeploymentZone specification file, you can apply it to create the objects:
kubectl apply -f vsphere-zones.yaml
vspherefailuredomain.infrastructure.cluster.x-k8s.io/az1 created
vspherefailuredomain.infrastructure.cluster.x-k8s.io/az2 created
vspheredeploymentzone.infrastructure.cluster.x-k8s.io/az1 created
vspheredeploymentzone.infrastructure.cluster.x-k8s.io/az2 created
You can inspect the objects to ensure that they are configured as desired:
kubectl describe vspherefailuredomain az1
Name: az1
Namespace:
Labels: <none>
Annotations: <none>
API Version: infrastructure.cluster.x-k8s.io/v1alpha3
Kind: VSphereFailureDomain
Metadata:
Creation Timestamp: 2021-10-11T20:15:37Z
...
Resource Version: 55786
UID: 5eeffe54-81fa-4a67-bd44-f796ba891826
Spec:
Region:
Auto Configure: false
Name: lab
Tag Category: k8s-region
Type: ComputeCluster
Topology:
Compute Cluster: RegionA01-MGMT
Datacenter: RegionA01
Datastore: map-vol
Hosts:
Host Group Name: HG1
Vm Group Name: VMG1
Networks:
K8s-Workload
Zone:
Auto Configure: false
Name: AZ1
Tag Category: k8s-zone
Type: HostGroup
Events: <none>
kubectl describe vspheredeploymentzone az1
Name: az1
Namespace:
Labels: <none>
Annotations: <none>
API Version: infrastructure.cluster.x-k8s.io/v1alpha3
Kind: VSphereDeploymentZone
Metadata:
Creation Timestamp: 2021-10-11T20:10:22Z
...
Resource Version: 53932
UID: a072f2f2-057a-4ae2-b374-b9321517cf2d
Spec:
Control Plane: true
Failure Domain: AZ1
Placement Constraint:
Folder:
Resource Pool:
Server: vcsa-01a.corp.tanzu
Events: <none>
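If you just want a quick summary rather than the full describe output, listing both resource types in one command should work as well:
kubectl get vspherefailuredomains,vspheredeploymentzones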
Add an overlay to modify the MachineDeployment, VSphereMachineTemplate and KubeadmConfigTemplate objects.
By default, when deploying a TKG 1.4 workload cluster, you’ll have a MachineDeployment, which defines the configuration for the Machines (VMs, since this is vSphere); a VSphereMachineTemplate, which defines the VSphereMachine objects and is referenced in the spec.template.spec.infrastructureRef section of the MachineDeployment object; and a KubeadmConfigTemplate, which defines how kubeadm should build out the worker nodes and join them to the cluster. There are many other objects, of course, but these are the ones we’re concerned with right now. They will all need to be modified so that they work with our newly defined VSphereFailureDomain and VSphereDeploymentZone objects.
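If you already have a workload cluster deployed and want to see what these default objects look like before writing the overlay, something like the following (run against the management cluster context) should surface them; the names will follow the <cluster-name>-md-0 and <cluster-name>-worker patterns described below:
kubectl get machinedeployments,vspheremachinetemplates,kubeadmconfigtemplates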
You will need to add content similar to the following to the ~/.config/tanzu/tkg/providers/infrastructure-vsphere/ytt/vsphere-overlay.yaml file.
#! Please add any overlays specific to vSphere provider under this file.
#@ load("@ytt:overlay", "overlay")
#@ load("@ytt:data", "data")
#@overlay/match by=overlay.subset({"kind":"MachineDeployment", "metadata":{"name": "{}-md-0".format(data.values.CLUSTER_NAME)}})
---
spec:
template:
spec:
#@overlay/match missing_ok=True
failureDomain: az1
infrastructureRef:
name: #@ "{}-worker-0".format(data.values.CLUSTER_NAME)
---
#@overlay/match by=overlay.subset({"kind":"VSphereMachineTemplate", "metadata":{"name": "{}-worker".format(data.values.CLUSTER_NAME)}})
---
metadata:
name: #@ "{}-worker-0".format(data.values.CLUSTER_NAME)
spec:
template:
spec:
#@overlay/match missing_ok=True
failureDomain: az1
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereMachineTemplate
metadata:
name: #@ "{}-worker-1".format(data.values.CLUSTER_NAME)
spec:
template:
spec:
cloneMode: #@ data.values.VSPHERE_CLONE_MODE
datacenter: #@ data.values.VSPHERE_DATACENTER
datastore: #@ data.values.VSPHERE_DATASTORE
storagePolicyName: #@ data.values.VSPHERE_STORAGE_POLICY_ID
diskGiB: #@ data.values.VSPHERE_WORKER_DISK_GIB
folder: #@ data.values.VSPHERE_FOLDER
memoryMiB: #@ data.values.VSPHERE_WORKER_MEM_MIB
network:
devices:
#@ if data.values.TKG_IP_FAMILY == "ipv6":
#@overlay/match by=overlay.index(0)
#@overlay/replace
- dhcp6: true
networkName: #@ data.values.VSPHERE_NETWORK
#@ else:
#@overlay/match by=overlay.index(0)
#@overlay/replace
- dhcp4: true
networkName: #@ data.values.VSPHERE_NETWORK
#@ end
numCPUs: #@ data.values.VSPHERE_WORKER_NUM_CPUS
resourcePool: #@ data.values.VSPHERE_RESOURCE_POOL
server: #@ data.values.VSPHERE_SERVER
template: #@ data.values.VSPHERE_TEMPLATE
failureDomain: az2
---
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfigTemplate
metadata:
name: #@ "{}-md-1".format(data.values.CLUSTER_NAME)
namespace: '${ NAMESPACE }'
spec:
template:
spec:
useExperimentalRetryJoin: true
joinConfiguration:
nodeRegistration:
criSocket: /var/run/containerd/containerd.sock
kubeletExtraArgs:
cloud-provider: external
tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
name: '{{ ds.meta_data.hostname }}'
preKubeadmCommands:
- hostname "{{ ds.meta_data.hostname }}"
- echo "::1 ipv6-localhost ipv6-loopback" >/etc/hosts
- echo "127.0.0.1 localhost" >>/etc/hosts
- echo "127.0.0.1 {{ ds.meta_data.hostname }}" >>/etc/hosts
- echo "{{ ds.meta_data.hostname }}" >/etc/hostname
files: []
users:
- name: capv
sshAuthorizedKeys:
- '${ VSPHERE_SSH_AUTHORIZED_KEY }'
sudo: ALL=(ALL) NOPASSWD:ALL
---
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
labels:
cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
name: #@ "{}-md-1".format(data.values.CLUSTER_NAME)
spec:
clusterName: #@ data.values.CLUSTER_NAME
replicas: #@ data.values.WORKER_MACHINE_COUNT
selector:
matchLabels:
cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
template:
metadata:
labels:
cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
node-pool: #@ "{}-worker-pool".format(data.values.CLUSTER_NAME)
spec:
bootstrap:
configRef:
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfigTemplate
name: #@ "{}-md-1".format(data.values.CLUSTER_NAME)
clusterName: #@ data.values.CLUSTER_NAME
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereMachineTemplate
name: #@ "{}-worker-1".format(data.values.CLUSTER_NAME)
version: #@ data.values.KUBERNETES_VERSION
failureDomain: az2
#@overlay/match by=overlay.subset({"kind":"KubeadmConfigTemplate"}), expects="1+"
---
spec:
template:
spec:
users:
#@overlay/match by=overlay.index(0)
#@overlay/replace
- name: capv
sshAuthorizedKeys:
- #@ data.values.VSPHERE_SSH_AUTHORIZED_KEY
sudo: ALL=(ALL) NOPASSWD:ALL
To explain what’s going on here…
#@overlay/match by=overlay.subset({"kind":"MachineDeployment", "metadata":{"name": "{}-md-0".format(data.values.CLUSTER_NAME)}})
---
spec:
template:
spec:
#@overlay/match missing_ok=True
failureDomain: az1
infrastructureRef:
name: #@ "{}-worker-0".format(data.values.CLUSTER_NAME)
This stanza takes the default MachineDeployment, which is named <cluster-name>-md-0, adds the spec.template.spec.failureDomain: az1 value, and updates its infrastructureRef to point at <cluster-name>-worker-0. The reference change is needed to align with the new name of the VSphereMachineTemplate.
#@overlay/match by=overlay.subset({"kind":"VSphereMachineTemplate", "metadata":{"name": "{}-worker".format(data.values.CLUSTER_NAME)}})
---
metadata:
name: #@ "{}-worker-0".format(data.values.CLUSTER_NAME)
spec:
template:
spec:
#@overlay/match missing_ok=True
failureDomain: az1
This stanza takes the default VSphereMachineTemplate, which is named <cluster-name>-worker, renames it to <cluster-name>-worker-0 (since we are going to have more than one now), and adds the spec.template.spec.failureDomain: az1 value.
The next three stanzas create additional VSphereMachineTemplate, KubeadmConfigTemplate and MachineDeployment objects named <cluster-name>-worker-1, <cluster-name>-md-1 and <cluster-name>-md-1, respectively. The reason for this is that we need to assign a set of nodes to the second availability zone, az2, so we need a separate set of definitions for it. If you were to make use of a third availability zone, you would need to create another set of these three stanzas, changing the names to end in -2 and setting the availability zone to az3.
The very last stanza is configuring sudo and the ssh authorized key for the capv user on the worker nodes and does not need to be modified or duplicated.
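Before building a real cluster, it can be worth rendering the manifests to confirm the overlay behaves as intended. A sketch, assuming your version of the tanzu CLI supports the --dry-run flag for cluster create:
# render the cluster manifests without creating anything
tanzu cluster create -f tkg-wld-cluster.yaml --dry-run > tkg-wld-manifests.yaml
# confirm the failure domains and the -worker-0/-worker-1 template names made it in
grep -E 'failureDomain|worker-[01]|md-[01]' tkg-wld-manifests.yaml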
Create a workload cluster
You should be at a point where you can create a specification for a workload cluster and apply it. The following is what I have used and you can see that there isn’t a lot there.
DEPLOY_TKG_ON_VSPHERE7: true
CLUSTER_CIDR: 100.96.0.0/11
SERVICE_CIDR: 100.64.0.0/13
CLUSTER_NAME: tkg-wld
CLUSTER_PLAN: prod
IDENTITY_MANAGEMENT_TYPE: ldap
INFRASTRUCTURE_PROVIDER: vsphere
NAMESPACE: default
CNI: antrea
ENABLE_MHC: "false"
MHC_UNKNOWN_STATUS_TIMEOUT: 5m
MHC_FALSE_STATUS_TIMEOUT: 5m
OS_NAME: ubuntu
VSPHERE_CONTROL_PLANE_DISK_GIB: "20"
VSPHERE_CONTROL_PLANE_ENDPOINT: 192.168.220.129
VSPHERE_CONTROL_PLANE_MEM_MIB: "4096"
VSPHERE_CONTROL_PLANE_NUM_CPUS: "2"
VSPHERE_DATACENTER: /RegionA01
VSPHERE_DATASTORE: /RegionA01/datastore/map-vol
VSPHERE_FOLDER: /RegionA01/vm
VSPHERE_NETWORK: K8s-Workload
VSPHERE_PASSWORD: <encoded:Vk13YXJlMSE=>
VSPHERE_RESOURCE_POOL: /RegionA01/host/RegionA01-MGMT/Resources
VSPHERE_STORAGE_POLICY_ID: k8s-policy
VSPHERE_SERVER: vcsa-01a.corp.tanzu
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQC5KYNeWQgVHrDHaEhBCLF1vIR0OAtUIJwjKYkY4E/5HhEu8fPFvBOIHPFTPrtkX4vzSiMFKE5WheKGQIpW3HHlRbmRPc9oe6nNKlsUfFAaJ7OKF146Gjpb7lWs/C34mjdtxSb1D/YcHSyqK5mxhyHAXPeK8lrxG5MLOJ3X2A3iUvXcBo1NdhRdLRWQmyjs16fnPx6840x9n5NqeiukFYIVhDMFErq42AkeewsWcbZQuwViSLk2cIc09eykAjaXMojCmSbjrj0kC3sbYX+HD2OWbKohTqqO6/UABtjYgTjIS4PqsXWk63dFdcxF6ukuO6ZHaiY7h3xX2rTg9pv1oT8WBR44TYgvyRp0Bhe0u2/n/PUTRfp22cOWTA2wG955g7jOd7RVGhtMHi9gFXeUS2KodO6C4XEXC7Y2qp9p9ARlNvu11QoaDyH3l0h57Me9we+3XQNuteV69TYrJnlgWecMa/x+rcaEkgr7LD61dY9sTuufttLBP2ro4EIWoBY6F1Ozvcp8lcgi/55uUGxwiKDA6gQ+UA/xtrKk60s6MvYMzOxJiUQbWYr3MJ3NSz6PJVXMvlsAac6U+vX4U9eJP6/C1YDyBaiT96cb/B9TkvpLrhPwqMZdYVomVHsdY7YriJB93MRinKaDJor1aIE/HMsMpbgFCNA7mma9x5HS/57Imw== admin@corp.local
VSPHERE_TLS_THUMBPRINT: 01:8D:8B:7F:13:3A:B9:C6:90:D2:5F:17:AD:EB:AC:78:26:3C:45:FB
VSPHERE_USERNAME: administrator@vsphere.local
VSPHERE_WORKER_DISK_GIB: "20"
VSPHERE_WORKER_MEM_MIB: "8192"
VSPHERE_WORKER_NUM_CPUS: "8"
VSPHERE_REGION: k8s-region
VSPHERE_ZONE: k8s-zone
ENABLE_AUDIT_LOGGING: false
ENABLE_DEFAULT_STORAGE_CLASS: true
WORKER_MACHINE_COUNT: 2
AVI_CONTROL_PLANE_HA_PROVIDER: "true"
The VSPHERE_REGION value must be set to the region-based tag category (k8s-region in this example) and the VSPHERE_ZONE value must be set to the zone-based tag category (k8s-zone in this example).
It’s very important to note that the WORKER_MACHINE_COUNT value is per availability zone. By specifying a count of 2 for this setting, I will end up with four total worker nodes since I have two availability zones.
One other thing to be aware of, though not directly related to the topic of availability zones, is that I have set AVI_CONTROL_PLANE_HA_PROVIDER to true and have specified a VSPHERE_CONTROL_PLANE_ENDPOINT value (192.168.220.129). This means that an IP has to be reserved in NSX Advanced Load Balancer (NSX ALB). You can read a little more about this topic in my earlier post, Migrating a TKG cluster control-plane endpoint from kube-vip to NSX-ALB.
From the Infrastructure, Networks page in the NSX ALB UI, you’ll need to click the Edit icon for the network where you want your control plane endpoint IP address to live (K8s-Frontend in my case).
Click the Edit button next to the appropriate network (192.168.220.0/23 in this case).
Click the Add Static IP Address Pool button and enter the desired control plane endpoint IP address as a range (192.168.220.129-192.168.220.129 in this example).
Click the Save button.
Click the Save button again. You should see a summary of the networks and can see that there is now an additional Static IP Pool configured on the desired network (K8s-Frontend has three now).

Before you can issue the command to create the workload cluster you’ll need to know what available Kubernetes versions you have:
kubectl get tkr
NAME VERSION COMPATIBLE CREATED
v1.17.16---vmware.2-tkg.1 v1.17.16+vmware.2-tkg.1 False 19h
v1.17.16---vmware.2-tkg.2 v1.17.16+vmware.2-tkg.2 False 19h
v1.17.16---vmware.3-tkg.1 v1.17.16+vmware.3-tkg.1 False 19h
v1.18.16---vmware.1-tkg.1 v1.18.16+vmware.1-tkg.1 False 19h
v1.18.16---vmware.1-tkg.2 v1.18.16+vmware.1-tkg.2 False 19h
v1.18.16---vmware.3-tkg.1 v1.18.16+vmware.3-tkg.1 False 19h
v1.18.17---vmware.1-tkg.1 v1.18.17+vmware.1-tkg.1 False 19h
v1.18.17---vmware.2-tkg.1 v1.18.17+vmware.2-tkg.1 False 19h
v1.19.12---vmware.1-tkg.1 v1.19.12+vmware.1-tkg.1 True 19h
v1.19.8---vmware.1-tkg.1 v1.19.8+vmware.1-tkg.1 False 19h
v1.19.8---vmware.1-tkg.2 v1.19.8+vmware.1-tkg.2 False 19h
v1.19.8---vmware.3-tkg.1 v1.19.8+vmware.3-tkg.1 False 19h
v1.19.9---vmware.1-tkg.1 v1.19.9+vmware.1-tkg.1 False 19h
v1.19.9---vmware.2-tkg.1 v1.19.9+vmware.2-tkg.1 False 19h
v1.20.4---vmware.1-tkg.1 v1.20.4+vmware.1-tkg.1 False 19h
v1.20.4---vmware.1-tkg.2 v1.20.4+vmware.1-tkg.2 False 19h
v1.20.4---vmware.3-tkg.1 v1.20.4+vmware.3-tkg.1 False 19h
v1.20.5---vmware.1-tkg.1 v1.20.5+vmware.1-tkg.1 False 19h
v1.20.5---vmware.2-fips.1-tkg.1 v1.20.5+vmware.2-fips.1-tkg.1 False 19h
v1.20.5---vmware.2-tkg.1 v1.20.5+vmware.2-tkg.1 False 19h
v1.20.8---vmware.1-tkg.2 v1.20.8+vmware.1-tkg.2 True 19h
v1.21.2---vmware.1-tkg.1 v1.21.2+vmware.1-tkg.1 True 19h
The node image I deployed is v1.21.2 so I’ll be using v1.21.2---vmware.1-tkg.1.
tanzu cluster create -f tkg-wld-cluster.yaml --tkr v1.21.2---vmware.1-tkg.1 -v 6
compatibility file (/home/ubuntu/.config/tanzu/tkg/compatibility/tkg-compatibility.yaml) already exists, skipping download
BOM files inside /home/ubuntu/.config/tanzu/tkg/bom already exists, skipping download
Using namespace from config:
Validating configuration...
Waiting for resource pinniped-info of type *v1.ConfigMap to be up and running
Creating workload cluster 'tkg-wld'...
patch cluster object with operation status:
{
"metadata": {
"annotations": {
"TKGOperationInfo" : "{\"Operation\":\"Create\",\"OperationStartTimestamp\":\"2021-10-12 13:06:28.927801225 +0000 UTC\",\"OperationTimeout\":1800}",
"TKGOperationLastObservedTimestamp" : "2021-10-12 13:06:28.927801225 +0000 UTC"
}
}
}
Waiting for cluster to be initialized...
zero or multiple KCP objects found for the given cluster, 0 tkg-wld default, retrying
[cluster control plane is still being initialized, cluster infrastructure is still being provisioned], retrying
Right away, you’ll see a new virtual service created in NSX ALB for the workload cluster’s control plane endpoint (192.168.220.129):

It’s not in a functional state as there is really nothing backing it yet since the control plane node is not up and running.
And you can see this as a Kubernetes service from the management cluster context:
kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default-tkg-wld-control-plane LoadBalancer 100.64.12.145 192.168.220.129 6443:30498/TCP 3h21m
kubernetes ClusterIP 100.64.0.1 <none> 443/TCP 24h
The first control plane VM should be deployed and powered on fairly quickly.

Once kubeadm has had a chance to run on this node and configure the requisite Kubernetes processes, the new virtual service in NSX ALB should move to a healthy state.

Once the first control plane node is up and running you will see a lot more activity in the vSphere Client as additional control plane and worker nodes are created:
Once the second control plane node is functional you should see the virtual service in NSX ALB updated to show both control plane nodes:


Per the earlier statement around having a total of four worker nodes, you can see that in the vSphere Client now:

You might also notice that they are spread across the hosts as desired…the first two worker nodes are on host esx-01a.corp.tanzu, which is in AZ1, and the second two nodes are on esx-03a.corp.tanzu, which is in AZ2.
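If you'd rather confirm the placement from the command line than from the vSphere Client, govc can report which ESXi host each worker VM landed on. A sketch, assuming govc is pointed at the same vCenter and the VM names match the Machine names shown earlier:
# the Host field in the output shows where each worker VM is running
govc vm.info 'tkg-wld-md-0-*'
govc vm.info 'tkg-wld-md-1-*'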
If you really want to keep a close eye on things you can ssh to one of the control plane nodes (as the capv user with the ssh key specified in the cluster configuration):
kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes
NAME STATUS ROLES AGE VERSION
tkg-wld-control-plane-cxh9l Ready control-plane,master 29m v1.21.2+vmware.1
tkg-wld-control-plane-rb2nj Ready control-plane,master 13m v1.21.2+vmware.1
tkg-wld-md-0-847b7d8cf9-26qff Ready <none> 16m v1.21.2+vmware.1
tkg-wld-md-0-847b7d8cf9-pksnq Ready <none> 14m v1.21.2+vmware.1
tkg-wld-md-1-85cc664768-llqkt Ready <none> 13m v1.21.2+vmware.1
tkg-wld-md-1-85cc664768-zpzjr Ready <none> 22m v1.21.2+vmware.1
At this point, two of the control plane nodes were functional, as were all four worker nodes.
The third control plane node was online shortly afterwards:

And the virtual service in NSX ALB finally showed all three control plane nodes:

The cluster recognized all of the control plane nodes as well:
kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes
NAME STATUS ROLES AGE VERSION
tkg-wld-control-plane-cxh9l Ready control-plane,master 43m v1.21.2+vmware.1
tkg-wld-control-plane-lhw6j Ready control-plane,master 2m7s v1.21.2+vmware.1
tkg-wld-control-plane-rb2nj Ready control-plane,master 27m v1.21.2+vmware.1
tkg-wld-md-0-847b7d8cf9-26qff Ready <none> 30m v1.21.2+vmware.1
tkg-wld-md-0-847b7d8cf9-pksnq Ready <none> 28m v1.21.2+vmware.1
tkg-wld-md-1-85cc664768-llqkt Ready <none> 27m v1.21.2+vmware.1
tkg-wld-md-1-85cc664768-zpzjr Ready <none> 36m v1.21.2+vmware.1
And you can see all three of the VMs in the vSphere Client:

And again, you can see that the nodes are spread out appropriately…the first node is on esx-01a.corp.tanzu, which is in AZ1, and the second two nodes are on esx-03a.corp.tanzu and esx-04a.corp.tanzu, which are in AZ2.
Back at the command line where the tanzu cluster create command was run, the following was the remainder of the output:
cluster control plane is still being initialized, retrying
Getting secret for cluster
Waiting for resource tkg-wld-kubeconfig of type *v1.Secret to be up and running
Waiting for cluster nodes to be available...
Waiting for resource tkg-wld of type *v1alpha3.Cluster to be up and running
Waiting for resources type *v1alpha3.MachineDeploymentList to be up and running
Waiting for resources type *v1alpha3.MachineList to be up and running
Waiting for addons installation...
Waiting for resources type *v1alpha3.ClusterResourceSetList to be up and running
Waiting for resource antrea-controller of type *v1.Deployment to be up and running
Waiting for packages to be up and running...
Waiting for package: antrea
Waiting for package: load-balancer-and-ingress-service
Waiting for package: metrics-server
Waiting for package: pinniped
Waiting for package: vsphere-cpi
Waiting for package: vsphere-csi
Waiting for resource vsphere-csi of type *v1alpha1.PackageInstall to be up and running
Waiting for resource vsphere-cpi of type *v1alpha1.PackageInstall to be up and running
Waiting for resource metrics-server of type *v1alpha1.PackageInstall to be up and running
Waiting for resource load-balancer-and-ingress-service of type *v1alpha1.PackageInstall to be up and running
Waiting for resource pinniped of type *v1alpha1.PackageInstall to be up and running
Waiting for resource antrea of type *v1alpha1.PackageInstall to be up and running
Successfully reconciled package: vsphere-csi
Successfully reconciled package: metrics-server
Successfully reconciled package: antrea
Successfully reconciled package: vsphere-cpi
Successfully reconciled package: pinniped
packageinstalls.packaging.carvel.dev "load-balancer-and-ingress-service" not found, retrying
waiting for 'load-balancer-and-ingress-service' Package to be installed, retrying
waiting for 'load-balancer-and-ingress-service' Package to be installed, retrying
waiting for 'load-balancer-and-ingress-service' Package to be installed, retrying
waiting for 'load-balancer-and-ingress-service' Package to be installed, retrying
Successfully reconciled package: load-balancer-and-ingress-service
Workload cluster 'tkg-wld' created
Inspect the cluster and validate node placement
You can use the tanzu cluster list
command to see a high-level view of the installed clusters:
tanzu cluster list --include-management-cluster
NAME NAMESPACE STATUS CONTROLPLANE WORKERS KUBERNETES ROLES PLAN
tkg-wld default running 3/3 4/4 v1.21.2+vmware.1 <none> prod
tkg-mgmt tkg-system running 1/1 1/1 v1.21.2+vmware.1 management dev
And the tanzu cluster get command will give more details:
tanzu cluster get tkg-wld
NAME NAMESPACE STATUS CONTROLPLANE WORKERS KUBERNETES ROLES
tkg-wld default running 3/3 4/4 v1.21.2+vmware.1 <none>
ℹ

Details:

NAME                                                        READY  SEVERITY  REASON  SINCE  MESSAGE
/tkg-wld                                                    True                     6m43s
├─ClusterInfrastructure - VSphereCluster/tkg-wld            True                     52m
├─ControlPlane - KubeadmControlPlane/tkg-wld-control-plane  True                     6m44s
│ └─3 Machines...                                           True                     48m    See tkg-wld-control-plane-cxh9l, tkg-wld-control-plane-lhw6j, ...
└─Workers
  ├─MachineDeployment/tkg-wld-md-0
  │ └─2 Machines...                                         True                     36m    See tkg-wld-md-0-847b7d8cf9-26qff, tkg-wld-md-0-847b7d8cf9-pksnq
  └─MachineDeployment/tkg-wld-md-1
    └─2 Machines...                                         True                     41m    See tkg-wld-md-1-85cc664768-llqkt, tkg-wld-md-1-85cc664768-zpzjr
Inspecting the configuration of the objects that were configured to be in specific availability zones is not terribly difficult:
kubectl get machinedeployment -o=custom-columns=NAME:.metadata.name,FAILUREDOMAIN:.spec.template.spec.failureDomain
NAME FAILUREDOMAIN
tkg-wld-md-0 az1
tkg-wld-md-1 az2
kubectl get machine -o=custom-columns=NAME:.metadata.name,FAILUREDOMAIN:.spec.failureDomain
NAME FAILUREDOMAIN
tkg-wld-control-plane-cxh9l az1
tkg-wld-control-plane-lhw6j az2
tkg-wld-control-plane-rb2nj az2
tkg-wld-md-0-847b7d8cf9-26qff az1
tkg-wld-md-0-847b7d8cf9-pksnq az1
tkg-wld-md-1-85cc664768-llqkt az2
tkg-wld-md-1-85cc664768-zpzjr az2
The worker machines are owned by a MachineSet, which is in turn owned by the MachineDeployment that was configured earlier (via the overlay file). The control plane VMs were not explicitly configured but still ended up in a failureDomain. There is a controller running that checks whether the spec.controlPlane value in the deployment zone definition is set to true and updates the control plane machines to set their spec.failureDomain value appropriately. You can also check at the cluster level to validate that the different availability zones are configured for the control plane nodes:
kubectl get cluster tkg-wld -o json | jq .status.failureDomains
{
"az1": {
"controlPlane": true
},
"az2": {
"controlPlane": true
}
}
After tracing the ownership back to the cluster itself, you can see that both failure domains are enabled for the control plane nodes at the cluster level.
kubectl get vspheremachinetemplate -o=custom-columns=NAME:.metadata.name,FAILUREDOMAIN:.spec.template.spec.failureDomain
NAME FAILUREDOMAIN
tkg-wld-control-plane <none>
tkg-wld-worker-0 az1
tkg-wld-worker-1 az2
kubectl get vspheremachine -o=custom-columns=NAME:.metadata.name,FAILUREDOMAIN:.spec.failureDomain
NAME FAILUREDOMAIN
tkg-wld-control-plane-ck4rm <none>
tkg-wld-control-plane-fthps <none>
tkg-wld-control-plane-xv9dh <none>
tkg-wld-worker-0-265bz az1
tkg-wld-worker-0-rp2mw az1
tkg-wld-worker-1-7dmbv az2
tkg-wld-worker-1-8g66l az2
It is expected that the VSphereMachine and VSphereMachineTemplate objects only show failure domain values for worker objects since we only configured worker objects in the overlay file. As noted earlier, the control plane machine objects are managed by the KubeadmControlPlane (and subsequently the cluster) object.
You can also see that the nodes have been spread across appropriate VM groups in the vSphere Client:
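If you'd rather check this from the command line, govc can list the members of each VM group; a sketch assuming the group names used earlier:
# the members of each VM group should match the nodes placed in the corresponding AZ
govc cluster.group.ls -cluster=RegionA01-MGMT -name=VMG1
govc cluster.group.ls -cluster=RegionA01-MGMT -name=VMG2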

