How to deploy a TKG 1.2 cluster on Microsoft Azure

It was a bit of a learning curve dealing with Azure since I’d never laid eyes on it before, but the end result was relatively easy to achieve…a fully functional TKG cluster. We’ll walk through both installation methods, the UI and the CLI. There are a few things that need to be done or collected on the Azure side first, and these are common to both methods.

Configure/Document Azure resources

When you log in to Azure, you can navigate to Azure Active Directory and you will be presented with a page similar to the following:

Make a note of the Tenant ID value as it will be used later.

While on the Active Directory page, click on the App registrations link on the right and then click on the New registration button at the top.

Give the new registration a meaningful name and set the access permissions appropriately.

Click the Register button when you’re ready to proceed. You should be presented with a page similar to the following:

Make a note of the Application (client) ID value as it will be used later.

Navigate to Subscriptions. If you have more than one, choose which one you’ll use for deploying TKG clusters and make a note of its Subscription ID value as it will be used later.

Click on the chosen Subscription and then click on the Access control (IAM) link on the left.

Click the Add button under Add a role assignment.

A new pane will appear on the right where you can configure the new role assignment. Set the Role to Owner, set the Assign access to value to Azure AD user, group, or service principal and then type in the name of your new application (clittle-tkg in my example) in the Select field. Your application should show up in the search results and clicking on it should move it into the Selected members list.

Click the Save button.

You can click on the Role assignments tab to validate that your application is listed as an owner.

Head back to Azure Active Directory > App registrations and then click on your application.

Click on the Certificates & Secrets link on the left.

Click on the New client secret button.

Enter a descriptive name and set the expiration as appropriate…don’t follow this example in a production environment, as you would never want your client secret to have an indefinite expiration date.

Click the Add button.

Make a note of the value for your new secret as this is the only chance you’ll get to do so. Once you log out and back in or refresh your browser, the secret value will be obfuscated.

With all of this done, we now have the four parameters that will be needed during cluster creation:

Tenant ID: b39138ca-3cee-4b4a-a4d6-cd83d9dd62f0
Client ID: a48db493-6b9f-4709-a73d-b04efa9cb05a
Client Secret: xC2Ggr214o5e~mVcUX1OUW9WLZap_-0N
Subscription ID: 477a3190-70ed-47b1-a714-4ac99deb3f32
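
If you happen to have the Azure CLI installed, an optional sanity check is to log in as the new service principal with these values before handing them to TKG (the values below are just the examples from above; az itself is not required by the installer):

# Optional: confirm the service principal and subscription are usable
az login --service-principal \
  --username a48db493-6b9f-4709-a73d-b04efa9cb05a \
  --password "xC2Ggr214o5e~mVcUX1OUW9WLZap_-0N" \
  --tenant b39138ca-3cee-4b4a-a4d6-cd83d9dd62f0
az account set --subscription 477a3190-70ed-47b1-a714-4ac99deb3f32
az account show --output table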

Prepare for cluster creation

You’ll need to download the tkg binary. You can find the latest tkg CLI download at <link>. There are several other utilities included with the CLI tgz file and you should go ahead and extract them as well.
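
As a rough sketch of getting the CLI onto a Linux system (the archive and binary names below are placeholders…adjust them to match the file you actually downloaded):

tar -xzf tkg-linux-amd64-v1.2.0-vmware.1.tar.gz
# the extracted binary name will vary; copy it to somewhere on your PATH as "tkg"
sudo install tkg-linux-amd64-v1.2.0+vmware.1 /usr/local/bin/tkg
tkg version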

You’ll also want to create an ssh key pair if you don’t already have one. This will be used for accessing the Azure virtual machines that are created. You can do this via a simple ssh-keygen command, similar to the following:

ssh-keygen -t rsa -b 4096 -C "admin@corp.tanzu"

Enter a location to save the key pair or accept the default when prompted. You can enter a passphrase to protect the key pair or leave it blank.

Generating public/private rsa key pair.
Enter file in which to save the key (/home/clittle/.ssh/id_rsa): /tmp/id_rsa
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /tmp/id_rsa.
Your public key has been saved in /tmp/id_rsa.pub.
The key fingerprint is:
SHA256:bhBWzRxqaa4Z6Dm3WwjW906e78GIak/yryI7GfVSLxI admin@corp.tanzu
The key's randomart image is:
+---[RSA 4096]----+
|        .+..     |
|       . o+      |
|      o =        |
|     +E=.        |
|    +.++S.       |
|   o.ooOoo.o     |
|    +o*o*.+ o    |
|    +ooO + . .   |
|    .=+++o=oo    |
+----[SHA256]-----+

Be sure to copy the contents of the .pub file created as it will be used later. In this example, it is /tmp/id_rsa.pub.
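
If you want the public key handy now, you can simply print it (and optionally copy it to the clipboard, assuming a tool like xclip is installed):

cat /tmp/id_rsa.pub
# optional: copy straight to the clipboard (assumes xclip is installed)
cat /tmp/id_rsa.pub | xclip -selection clipboard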

Provision your management cluster via the UI

Now we’re ready to kick things off with the tkg init --ui command, which will create the management cluster. You should see output similar to the following as well as a browser window being opened.

Logs of the command execution can also be found at: /tmp/tkg-20200914T115109677021347.log

Validating the pre-requisites...
Serving kickstart UI at http://127.0.0.1:8080

It’s worth noting that you can also use the --bind switch to specify a different address and port on which to serve the UI and the --browser switch to specify which browser to use. You can get more information on these switches by running tkg init --help.
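
For example, if you’re running tkg on a remote jump host, something like the following should let you reach the UI from another machine’s browser (the bind address here is illustrative, and I believe --browser none suppresses the automatic browser launch…check tkg init --help to confirm for your version):

tkg init --ui --bind 0.0.0.0:8080 --browser none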

In your browser, you should have a new tab open that looks similar to the following:

Click the Deploy button under Microsoft Azure and then fill in the details for the deployment per the information that was collected previously. If you have an existing resource group on Azure that you’d like to use it can be selected from the dropdown or you can let the installer create a new one for you. The choice of region is up to you.

Click the Next button.

If you have an existing VNET you’d like to use, select the appropriate radio button and then choose it from the dropdown. You’ll also need to choose appropriate control plane and worker node subnets. For this example, we’re letting the installer create a new VNET. Select the appropriate Resource Group from the dropdown and set the VNET CIDR Block as appropriate (or leave it at the default value like I did).

Click the Next button.

You have a number of choices to make on the Management Cluster Settings page.

  • Development vs. Production: This will dictate the number of control plane nodes deployed (1 vs 3).
  • Instance Type: This will dictate the compute and storage characteristics of the control plane virtual machines deployed to Azure. You can read more about the different size options at Sizes for virtual machines in Azure.
  • Worker Node Instance Type: This will dictate the compute and storage characteristics of the worker virtual machines deployed to Azure.
  • Machine Health Checks: This will dictate whether ClusterAPI will monitor the health of the deployed virtual machines and recreate them if they are deemed unhealthy. You can read more about machine health checks at Configure a MachineHealthCheck.
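
Since these MachineHealthCheck objects are regular Cluster API resources, you can also inspect them later with kubectl once the management cluster is up, for example:

kubectl get machinehealthchecks -A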

You can see in this example that I’ve selected a Development deployment with the smallest node size available and have named the management cluster mgmt-azure (if you leave this blank it will auto-generate a cluster name). I’ve also left the Machine Health Checks option enabled.

Click the Next button.

The Metadata page is entirely optional so complete it as you feel is appropriate.

This is a great new feature in the 1.2 version…the ability to choose your CNI. Prior to this version, we were locked into Calico, but you can now choose Antrea, Calico or None. If you choose None you would need to manually deploy a CNI to your cluster later. You can alter the default CIDR values or leave them at the defaults (as I did).

Click the Next button.

Choose to participate in the CEIP or not.

Click the Next button.

Click the Review Configuration button.

If everything looks good, click the Deploy Management Cluster button.

You’ll be able to follow the high-level progress in the UI.

And if you head back to Azure and navigate to the Resource groups page, you’ll see that the Resource Group that you specified has already been created (assuming you didn’t choose to use an existing one).

As with deploying a TKG cluster to other IaaS platforms, you should see a bootstrap cluster created locally where you can follow the progress in more detail. Finding it should be as simple as running docker ps.

CONTAINER ID        IMAGE                                                             COMMAND                  CREATED             STATUS              PORTS                       NAMES
43979bd1e7c1        projects-stg.registry.vmware.com/tkg/kind/node:v1.19.0_vmware.1   "/usr/local/bin/entr…"   30 seconds ago      Up 15 seconds       127.0.0.1:62095->6443/tcp   tkg-kind-btbu80jnov8m8h72efvg-control-plane

You could docker exec into this container and look around but there is a better way of getting at what is happening in the bootstrap cluster. A kubeconfig file is created under your home directory at .kube-tkg\tmp and should be named similar to config_5nh1LwxL. You can use this to see what has been deployed to the bootstrap cluster and check the logs for various components. 

kubectl --kubeconfig=config_5nh1LwxL get nodes

NAME                                          STATUS   ROLES    AGE   VERSION
tkg-kind-btbu80jnov8m8h72efvg-control-plane   Ready    master   83s   v1.19.0+vmware.1
kubectl --kubeconfig=config_5nh1LwxL get po -A

NAMESPACE                           NAME                                                                  READY   STATUS    RESTARTS   AGE
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-54cf965957-rrlpz            2/2     Running   0          2m39s
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-5dc895c778-f7d95        2/2     Running   0          2m38s
capi-system                         capi-controller-manager-858d56cc8f-6s5tq                              2/2     Running   0          2m40s
capi-webhook-system                 capi-controller-manager-659679fd44-l9wx6                              2/2     Running   0          2m40s
capi-webhook-system                 capi-kubeadm-bootstrap-controller-manager-c96f4b9d-zbgjt              2/2     Running   0          2m39s
capi-webhook-system                 capi-kubeadm-control-plane-controller-manager-759c99fd9f-cx7zh        2/2     Running   0          2m38s
capi-webhook-system                 capz-controller-manager-575c4d8b5b-tqnsk                              2/2     Running   0          2m36s
capz-system                         capz-controller-manager-664b574684-hztn9                              2/2     Running   0          2m35s
cert-manager                        cert-manager-b98b948d8-xgfdn                                          1/1     Running   0          3m44s
cert-manager                        cert-manager-cainjector-577b45fb7c-gn5kl                              1/1     Running   0          3m45s
cert-manager                        cert-manager-webhook-55c5cd4dcb-zwgmb                                 1/1     Running   0          3m44s
kube-system                         coredns-774fbc4754-8kv5g                                              1/1     Running   0          4m19s
kube-system                         coredns-774fbc4754-gx7jp                                              1/1     Running   0          4m19s
kube-system                         etcd-tkg-kind-btbu80jnov8m8h72efvg-control-plane                      1/1     Running   0          4m34s
kube-system                         kindnet-mzt2n                                                         1/1     Running   0          4m19s
kube-system                         kube-apiserver-tkg-kind-btbu80jnov8m8h72efvg-control-plane            1/1     Running   0          4m34s
kube-system                         kube-controller-manager-tkg-kind-btbu80jnov8m8h72efvg-control-plane   1/1     Running   0          4m34s
kube-system                         kube-proxy-qs6lq                                                      1/1     Running   0          4m19s
kube-system                         kube-scheduler-tkg-kind-btbu80jnov8m8h72efvg-control-plane            1/1     Running   0          4m34s
local-path-storage                  local-path-provisioner-8b46957d4-c97ch                                1/1     Running   0          4m19s

You should see more activity in the UI as the deployment progresses, especially as the bootstrap cluster is being instantiated.

If you want to follow along as the bootstrap cluster starts to provision resources on Azure, you can tail the logs from the capi-controller-manager-<#######> pod in the capi-system namespace and the capz-controller-manager-<#######> pod in the capz-system namespace.

kubectl --kubeconfig=config_5nh1LwxL -n capi-system logs capi-controller-manager-858d56cc8f-6s5tq manager -f

Name":"","bootstrap":{},"infrastructureRef":{}},"status":{"bootstrapReady":false,"infrastructureReady":false}}}
I0914 19:18:30.629218       1 controller.go:159] controller-runtime/controller "msg"="Starting Controller" "controller"="machinehealthcheck"
I0914 19:18:30.629276       1 controller.go:152] controller-runtime/controller "msg"="Starting EventSource" "controller"="machinedeployment" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{"clusterName":"","selector":{},"template":{"metadata":{},"spec":{"clusterName":"","bootstrap":{},"infrastructureRef":{}}}},"status":{}}}
I0914 19:18:30.629220       1 controller.go:152] controller-runtime/controller "msg"="Starting EventSource" "controller"="machineset" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{"controlPlaneEndpoint":{"host":"","port":0}},"status":{"infrastructureReady":false,"controlPlaneInitialized":false}}}
I0914 19:18:30.629302       1 controller.go:152] controller-runtime/controller "msg"="Starting EventSource" "controller"="machinedeployment" "source"={"Type":{"metadata":{"creationTimestamp":null},"spec":{"controlPlaneEndpoint":{"host":"","port":0}},"status":{"infrastructureReady":false,"controlPlaneInitialized":false}}}
I0914 19:18:30.629325       1 controller.go:159] controller-runtime/controller "msg"="Starting Controller" "controller"="machineset"
I0914 19:18:30.629348       1 controller.go:159] controller-runtime/controller "msg"="Starting Controller" "controller"="machinedeployment"
I0914 19:18:30.729478       1 controller.go:152] controller-runtime/controller "msg"="Starting EventSource" "controller"="clusterresourceset" "source"={"Type":{"metadata":{"creationTimestamp":null}}}
I0914 19:18:30.729502       1 controller.go:180] controller-runtime/controller "msg"="Starting workers" "controller"="machineset" "worker count"=10
I0914 19:18:30.729532       1 controller.go:180] controller-runtime/controller "msg"="Starting workers" "controller"="cluster" "worker count"=10
I0914 19:18:30.729540       1 controller.go:180] controller-runtime/controller "msg"="Starting workers" "controller"="machinehealthcheck" "worker count"=10
I0914 19:18:30.729557       1 controller.go:180] controller-runtime/controller "msg"="Starting workers" "controller"="machinedeployment" "worker count"=10
I0914 19:18:30.830069       1 controller.go:159] controller-runtime/controller "msg"="Starting Controller" "controller"="clusterresourceset"
I0914 19:18:30.830139       1 controller.go:180] controller-runtime/controller "msg"="Starting workers" "controller"="clusterresourceset" "worker count"=10
I0914 19:19:03.289495       1 clusterresourceset_controller.go:247] controllers/ClusterResourceSet "msg"="Applying ClusterResourceSet to cluster" "cluster-name"="tkg-mgmt-azure-20200914130906" "clusterresourceset"="tkg-mgmt-azure-20200914130906-cni-antrea" "namespace"="tkg-system"
kubectl --kubeconfig=config_hetJb3TB -n capz-system logs capz-controller-manager-664b574684-mbntx manager

I0914 23:03:11.020616       1 azuremachine_controller.go:219] controllers/AzureMachine "msg"="Reconciling AzureMachine" "AzureCluster"="tkg-mgmt-azure-20200914165745" "azureMachine"="tkg-mgmt-azure-20200914165745-control-plane-rmdzf" "cluster"="tkg-mgmt-azure-20200914165745" "machine"="tkg-mgmt-azure-20200914165745-control-plane-zkv5h" "namespace"="tkg-system"
I0914 23:03:16.551570       1 azurecluster_controller.go:154] controllers/AzureCluster "msg"="Reconciling AzureCluster" "AzureCluster"="tkg-mgmt-azure-20200914165745" "cluster"="tkg-mgmt-azure-20200914165745" "namespace"="tkg-system"
I0914 23:03:16.552852       1 azuremachine_controller.go:219] controllers/AzureMachine "msg"="Reconciling AzureMachine" "AzureCluster"="tkg-mgmt-azure-20200914165745" "azureMachine"="tkg-mgmt-azure-20200914165745-md-0-h2qws" "cluster"="tkg-mgmt-azure-20200914165745" "machine"="tkg-mgmt-azure-20200914165745-md-0-586b965c45-67n7g" "namespace"="tkg-system"
I0914 23:03:16.553459       1 azuremachine_controller.go:241] controllers/AzureMachine "msg"="Bootstrap data secret reference is not yet available" "AzureCluster"="tkg-mgmt-azure-20200914165745" "azureMachine"="tkg-mgmt-azure-20200914165745-md-0-h2qws" "cluster"="tkg-mgmt-azure-20200914165745" "machine"="tkg-mgmt-azure-20200914165745-md-0-586b965c45-67n7g" "namespace"="tkg-system"
I0914 23:03:24.096700       1 azurecluster_controller.go:154] controllers/AzureCluster "msg"="Reconciling AzureCluster" "AzureCluster"="tkg-mgmt-azure-20200914165745" "cluster"="tkg-mgmt-azure-20200914165745" "namespace"="tkg-system"
I0914 23:05:02.536764       1 helpers.go:298] controllers/AzureJSONMachine "msg"="returning early from json reconcile, no update needed" "AzureMachine"="tkg-mgmt-azure-20200914165745-control-plane-rmdzf" "cluster"="tkg-mgmt-azure-20200914165745" "namespace"="tkg-system"
I0914 23:05:02.561623       1 azuremachine_controller.go:219] controllers/AzureMachine "msg"="Reconciling AzureMachine" "AzureCluster"="tkg-mgmt-azure-20200914165745" "azureMachine"="tkg-mgmt-azure-20200914165745-control-plane-rmdzf" "cluster"="tkg-mgmt-azure-20200914165745" "machine"="tkg-mgmt-azure-20200914165745-control-plane-zkv5h" "namespace"="tkg-system"
I0914 23:05:02.561754       1 helpers.go:298] controllers/AzureJSONMachine "msg"="returning early from json reconcile, no update needed" "AzureMachine"="tkg-mgmt-azure-20200914165745-control-plane-rmdzf" "cluster"="tkg-mgmt-azure-20200914165745" "namespace"="tkg-system"
I0914 23:05:04.177117       1 helpers.go:298] controllers/AzureJSONMachine "msg"="returning early from json reconcile, no update needed" "AzureMachine"="tkg-mgmt-azure-20200914165745-control-plane-rmdzf" "cluster"="tkg-mgmt-azure-20200914165745" "namespace"="tkg-system"
E0914 23:05:04.207322       1 controller.go:248] controller-runtime/controller "msg"="Reconciler error" "error"="error patching conditions: The condition \"Ready\" was modified by a different process and this caused a merge/ChangeCondition conflict:   \u0026v1alpha3.Condition{\n  \tType:               \"Ready\",\n- \tStatus:             \"Unknown\",\n+ \tStatus:             \"True\",\n  \tSeverity:           \"\",\n- \tLastTransitionTime: v1.Time{Time: s\"2020-09-14 23:05:02 +0000 UTC\"},\n+ \tLastTransitionTime: v1.Time{Time: s\"2020-09-14 23:05:04 +0000 UTC\"},\n  \tReason:             \"\",\n  \tMessage:            \"\",\n  }\n" "controller"="azuremachine" "name"="tkg-mgmt-azure-20200914165745-control-plane-rmdzf" "namespace"="tkg-system"
I0914 23:05:04.207539       1 helpers.go:298] controllers/AzureJSONMachine "msg"="returning early from json reconcile, no update needed" "AzureMachine"="tkg-mgmt-azure-20200914165745-control-plane-rmdzf" "cluster"="tkg-mgmt-azure-20200914165745" "namespace"="tkg-system"
I0914 23:05:04.208107       1 azuremachine_controller.go:219] controllers/AzureMachine "msg"="Reconciling AzureMachine" "AzureCluster"="tkg-mgmt-azure-20200914165745" "azureMachine"="tkg-mgmt-azure-20200914165745-control-plane-rmdzf" "cluster"="tkg-mgmt-azure-20200914165745" "machine"="tkg-mgmt-azure-20200914165745-control-plane-zkv5h" "namespace"="tkg-system"
I0914 23:05:05.623932       1 azuremachine_controller.go:219] controllers/AzureMachine "msg"="Reconciling AzureMachine" "AzureCluster"="tkg-mgmt-azure-20200914165745" "azureMachine"="tkg-mgmt-azure-20200914165745-control-plane-rmdzf" "cluster"="tkg-mgmt-azure-20200914165745" "machine"="tkg-mgmt-azure-20200914165745-control-plane-zkv5h" "namespace"="tkg-system"
I0914 23:05:05.626170       1 helpers.go:298] controllers/AzureJSONMachine "msg"="returning early from json reconcile, no update needed" "AzureMachine"="tkg-mgmt-azure-20200914165745-control-plane-rmdzf" "cluster"="tkg-mgmt-azure-20200914165745" "namespace"="tkg-system"

When the deployment is finished, you should be presented with a screen similar to the following in the UI:

And back in Azure you should see numerous objects created and running now.

If you go back to the command line where you issued the tkg init --ui command, you should see output similar to the following:

Logs of the command execution can also be found at: C:\Users\clittle\AppData\Local\Temp\tkg-20200908T135007539199435.log

Validating the pre-requisites...
Serving kickstart UI at http://127.0.0.1:8080
Validating configuration...
web socket connection established
sending pending 2 logs to UI
Using infrastructure provider azure:v0.4.8
Generating cluster configuration...
Setting up bootstrapper...
Bootstrapper created. Kubeconfig: C:\Users\clittle\.kube-tkg\tmp\config_5nh1LwxL
Installing providers on bootstrapper...
Fetching providers
Installing cert-manager Version="v0.16.1"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.9" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.9" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.9" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-azure" Version="v0.4.8" TargetNamespace="capz-system"
Start creating management cluster...
Saving management cluster kuebconfig into C:\Users\clittle/.kube/config
Installing providers on management cluster...
Fetching providers
Installing cert-manager Version="v0.16.1"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.9" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.9" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.9" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-azure" Version="v0.4.8" TargetNamespace="capz-system"
Waiting for the management cluster to get ready for move...
Waiting for addons installation...
Moving all Cluster API objects from bootstrap cluster to management cluster...
Performing move...
Discovering Cluster API objects
Moving Cluster API objects Clusters=1
Creating objects in the target cluster
Deleting objects from the source cluster
Context set for management cluster tkg-mgmt-azure-20200908140522 as 'tkg-mgmt-azure-20200908140522-admin@tkg-mgmt-azure-20200908140522'.

Management cluster created!


You can now create your first workload cluster by running the following:

  tkg create cluster [name] --kubernetes-version=[version] --plan=[plan]

At this point you can jump down to the section on creating a workload cluster if you have no intention of deploying a management cluster via the CLI.

Provision your management cluster via the CLI

The first thing to do when creating the management cluster via the CLI is to run the tkg get mc command. This builds out the .tkg folder structure that is required for all subsequent steps.
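
A minimal sketch of that first step on Linux (the ls is just to confirm the folder structure was generated; per the config shown below, it should include a config.yaml file and a providers directory, among other things):

tkg get mc
ls ~/.tkg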

With this done, open the .tkg/config.yaml file in a text editor. You should see something similar to the following:

cert-manager-timeout: 30m0s
overridesFolder: /home/clittle/.tkg/overrides
NODE_STARTUP_TIMEOUT: 20m
BASTION_HOST_ENABLED: "true"
providers:
  - name: cluster-api
    url: /home/clittle/.tkg/providers/cluster-api/v0.3.9/core-components.yaml
    type: CoreProvider
  - name: aws
    url: /home/clittle/.tkg/providers/infrastructure-aws/v0.5.5/infrastructure-components.yaml
    type: InfrastructureProvider
  - name: vsphere
    url: /home/clittle/.tkg/providers/infrastructure-vsphere/v0.7.1/infrastructure-components.yaml
    type: InfrastructureProvider
  - name: azure
    url: /home/clittle/.tkg/providers/infrastructure-azure/v0.4.8/infrastructure-components.yaml
    type: InfrastructureProvider
  - name: tkg-service-vsphere
    url: /home/clittle/.tkg/providers/infrastructure-tkg-service-vsphere/v1.0.0/unused.yaml
    type: InfrastructureProvider
  - name: kubeadm
    url: /home/clittle/.tkg/providers/bootstrap-kubeadm/v0.3.9/bootstrap-components.yaml
    type: BootstrapProvider
  - name: kubeadm
    url: /home/clittle/.tkg/providers/control-plane-kubeadm/v0.3.9/control-plane-components.yaml
    type: ControlPlaneProvider
  - name: docker
    url: /home/clittle/.tkg/providers/infrastructure-docker/v0.3.6/infrastructure-components.yaml
    type: InfrastructureProvider
images:
    all:
        repository: projects-stg.registry.vmware.com/tkg/cluster-api
    cert-manager:
        repository: projects-stg.registry.vmware.com/tkg/cert-manager
        tag: v0.16.1_vmware.1
release:
    version: v1.2.0-pre-alpha-292-gd80115e

There are a number of parameters that we’ll need to add in here to get it into the configuration we desire:

  • AZURE_TENANT_ID: The tenant ID that was noted earlier.
  • AZURE_CLIENT_ID: The Client ID that was noted earlier.
  • AZURE_CLIENT_SECRET: The Client Secret value that was noted earlier.
  • AZURE_SUBSCRIPTION_ID: The Subscription ID value that was noted earlier.
  • AZURE_LOCATION: The Availability Zone where you would like the management cluster deployed.
  • AZURE_SSH_PUBLIC_KEY_B64: The base64-encoded public key from the SSH key pair that was created earlier. You can run a command similar to base64 /tmp/id_rsa.pub to get this value (see the sketch just after this list).
  • AZURE_RESOURCE_GROUP: A new or existing resource group.
  • AZURE_VNET_NAME: A new or existing VNET. If existing, you also need to specify the AZURE_CONTROL_PLANE_SUBNET_NAME, AZURE_CONTROL_PLANE_SUBNET_CIDR, AZURE_NODE_SUBNET_NAME and AZURE_NODE_SUBNET_CIDR values.
  • AZURE_VNET_CIDR: VNET CIDR
  • AZURE_CONTROL_PLANE_MACHINE_TYPE: Control Plane VM type
  • AZURE_NODE_MACHINE_TYPE: Worker VM type
  • MACHINE_HEALTH_CHECK_ENABLED: Whether to monitor node health
  • CLUSTER_CIDR: Cluster POD CIDR
  • SERVICE_CIDR: Cluster Service CIDR
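
As referenced in the AZURE_SSH_PUBLIC_KEY_B64 item above, a quick sketch for producing that value on Linux…the -w0 flag keeps GNU base64 from wrapping the output onto multiple lines (macOS base64 doesn’t wrap by default):

base64 -w0 /tmp/id_rsa.pub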

When the .tkg/config.yaml file is finished, it should look similar to the following:

cert-manager-timeout: 30m0s
overridesFolder: /home/clittle/.tkg/overrides
NODE_STARTUP_TIMEOUT: 20m
BASTION_HOST_ENABLED: "true"
providers:
  - name: cluster-api
    url: /home/clittle/.tkg/providers/cluster-api/v0.3.9/core-components.yaml
    type: CoreProvider
  - name: aws
    url: /home/clittle/.tkg/providers/infrastructure-aws/v0.5.5/infrastructure-components.yaml
    type: InfrastructureProvider
  - name: vsphere
    url: /home/clittle/.tkg/providers/infrastructure-vsphere/v0.7.1/infrastructure-components.yaml
    type: InfrastructureProvider
  - name: azure
    url: /home/clittle/.tkg/providers/infrastructure-azure/v0.4.8/infrastructure-components.yaml
    type: InfrastructureProvider
  - name: tkg-service-vsphere
    url: /home/clittle/.tkg/providers/infrastructure-tkg-service-vsphere/v1.0.0/unused.yaml
    type: InfrastructureProvider
  - name: kubeadm
    url: /home/clittle/.tkg/providers/bootstrap-kubeadm/v0.3.9/bootstrap-components.yaml
    type: BootstrapProvider
  - name: kubeadm
    url: /home/clittle/.tkg/providers/control-plane-kubeadm/v0.3.9/control-plane-components.yaml
    type: ControlPlaneProvider
  - name: docker
    url: /home/clittle/.tkg/providers/infrastructure-docker/v0.3.6/infrastructure-components.yaml
    type: InfrastructureProvider
images:
    all:
        repository: projects-stg.registry.vmware.com/tkg/cluster-api
    cert-manager:
        repository: projects-stg.registry.vmware.com/tkg/cert-manager
        tag: v0.16.1_vmware.1
release:
    version: v1.2.0-pre-alpha-292-gd80115e
AZURE_TENANT_ID: b39138ca-3cee-4b4a-a4d6-cd83d9dd62f0
AZURE_CLIENT_ID: a48db493-6b9f-4709-a73d-b04efa9cb05a
AZURE_CLIENT_SECRET: xC2Ggr214o5e~mVcUX1OUW9WLZap_-0N
AZURE_SUBSCRIPTION_ID: 477a3190-70ed-47b1-a714-4ac99deb3f32
AZURE_LOCATION: eastus
AZURE_SSH_PUBLIC_KEY_B64: c3NoLXJzYSBBQUFBQjNOemFDMXljMkVBQUFBREFRQUJBQUFDQVFEYnV2YnRlWWMvVlJVR1dIdFdIckdHbWdKWnBrUGdMbEhqbmg0ZXZRdkFWdDU5Q0NFTWNqdC84UDU2U1dhVEhWY2ZtUVBlSGlDTVNIbE5aRmt1bWR1VXpteUx0LzlDZ2ZxS1ViL3hVZDhFM0c1b091VkhWUi9QMy9EODdZL2ovNzg3V0x4MUhXbWxxOFlMN1RBbzE2SmdiQi9URjRFYkFYRWpORjVhaURVZHBGRzFHZ05oY1hmR1pkRU8zdFlUTERJSVloTWpIeFJndUc4YldpRWdvVmIwNkRWL0NvWUtqdU1XMzNTUFI4WmozOXRxNEdJV3hWV0VpdmFmNFZ5azRqZ2YrYW5sNkR1ZW1GT09FUUxoVTI0T3lYUTNjNHJ2c2xVaEhPL1V3VisxSTViK0s1K1ltRzJHZDI5Si8xMXNVSXB0bDlnMDdRUG95OUFxb0JWTFRSdVQ0N0NPRnp0QVFMeHBxWkVuUUJoNnFDTUxRQU9icCtHNWlxMk5XTXEvczBFWDZFUEEyNUZYT2UvTk9wRzRSMGRKZXE1TDRMcTN2N091SGlZV3ZNNTdxWFh0N21yVHlRSmxjaGNPL21VcWNjL3FzOXRnUUtxbFBUVHhuOE1GTHpNdHNxMWVPdmdBemNzZEtySTVaa1dVR1AvNHJxa1J6RXNyeFF5NHBQSmVQRHpPWFBJRlJWdDJlVzQrenI1YTI1SkRZa0Zoalp6Y3Byc1VaZmZQREtrdEpsczBZcDRzdXFVc05YRTJDUWtOaU4veEo0dGExUTVOQWN0YnpuRHYzS0J3MlIrR2VWQU9uMi83anV3ako5R2dWdWdYYXFDVWkxVnRBTDI5L0w5a1Q5N3lCTEE5QkdTY3ZvU0taY29tbTU5ZVAyUHhkWXlkNmIvUjZuUGc2aEhvalE9PSBhZG1pbkBjb3JwLnRhbnp1
AZURE_RESOURCE_GROUP: cjl-tkg
AZURE_VNET_NAME: cjl-tkg
AZURE_VNET_CIDR: 10.0.0.0/8
AZURE_CONTROL_PLANE_MACHINE_TYPE: Standard_D2s_v3
AZURE_NODE_MACHINE_TYPE: Standard_D2s_v3
MACHINE_HEALTH_CHECK_ENABLED: "true"
CLUSTER_CIDR: 100.96.0.0/11
SERVICE_CIDR: 100.64.0.0/13

You may want to increase the cert-manager-timeout and/or the NODE_STARTUP_TIMEOUT values if you suspect that the deployment may run long.

Now we’re ready to run our tkg init command. There are a few parameters that we’ll be passing in to get this management cluster configured similar to the one that was deployed via the UI:

  • -i azure : This will tell the installer that we’re deploying to Azure.
  • -p dev : This selects the dev plan, which is a single control plane node and a single worker node.
  • --ceip-participation true : This enrolls our cluster in the Customer Experience Improvement Program.
  • --name mgmt-azure : This sets the name of the cluster to mgmt-azure.
  • --cni antrea : This deploys Antrea as the CNI.
  • -v 6 : This sets a high verbosity for the output of the tkg command.

Our final command looks like the following:

tkg init -i azure -p dev --ceip-participation true --name mgmt-azure --cni antrea -v 6

And you’ll see a lot of output at the command line as the bootstrap cluster is created (this output is heavily truncated):

Logs of the command execution can also be found at: /tmp/tkg-20200915T124357718805799.log
Using configuration file: /home/clittle/.tkg/config.yaml

Validating the pre-requisites...

Setting up management cluster...
Validating configuration...
Using infrastructure provider azure:v0.4.8
Generating cluster configuration...
Fetching File="cluster-template-definition-dev.yaml" Provider="infrastructure-azure" Version="v0.4.8"
Setting up bootstrapper...
Fetching configuration for kind node image...
Creating kind cluster: tkg-kind-btggmrs80laenk092gng
Ensuring node image (projects-stg.registry.vmware.com/tkg/kind/node:v1.19.1_vmware.1) ...
Image: projects-stg.registry.vmware.com/tkg/kind/node:v1.19.1_vmware.1 present locally
Preparing nodes ...
Writing configuration ...
Starting control-plane ...
Installing CNI ...
Installing StorageClass ...
Waiting 2m0s for control-plane = Ready ...
Ready after 26s
Bootstrapper created. Kubeconfig: /home/clittle/.kube-tkg/tmp/config_UHMrZXOd
Checking cluster reachability...
Installing providers on bootstrapper...
Installing the clusterctl inventory CRD
Creating CustomResourceDefinition="providers.clusterctl.cluster.x-k8s.io"
Fetching providers
Fetching File="core-components.yaml" Provider="cluster-api" Version="v0.3.9"
Fetching File="bootstrap-components.yaml" Provider="bootstrap-kubeadm" Version="v0.3.9"
Fetching File="control-plane-components.yaml" Provider="control-plane-kubeadm" Version="v0.3.9"
Fetching File="infrastructure-components.yaml" Provider="infrastructure-azure" Version="v0.4.8"
Fetching File="metadata.yaml" Provider="cluster-api" Version="v0.3.9"
Fetching File="metadata.yaml" Provider="bootstrap-kubeadm" Version="v0.3.9"
Fetching File="metadata.yaml" Provider="control-plane-kubeadm" Version="v0.3.9"
Fetching File="metadata.yaml" Provider="infrastructure-azure" Version="v0.4.8"
Creating Namespace="cert-manager-test"
Installing cert-manager Version="v0.16.1"
Creating Namespace="cert-manager"
Creating CustomResourceDefinition="certificaterequests.cert-manager.io"
Creating CustomResourceDefinition="certificates.cert-manager.io"

You’ll see the same kind of activity in Azure as was noted for the UI-based install, and you’ll also be able to follow the logs in the bootstrap cluster by using the kubeconfig file under .kube-tkg/tmp and tailing the logs of the pods in the capi and capz namespaces.

When the process is finished, you should see output similar to the following:

Resuming the target cluster
Set Cluster.Spec.Paused Paused=false Cluster="mgmt-azure" Namespace="tkg-system"
Context set for management cluster mgmt-azure as 'mgmt-azure-admin@mgmt-azure'.
Deleting kind cluster: tkg-kind-btggmrs80laenk092gng

Management cluster created!


You can now create your first workload cluster by running the following:

  tkg create cluster [name] --kubernetes-version=[version] --plan=[plan]

Deploy a workload cluster

Per the message at the end of the tkg init output, you can now use the tkg create cluster command to create your first workload cluster on Azure. But first, we’ll want to run a few other commands to make sure we’re really ready.

The first thing we’ll do is validate that our management cluster is really up and accessible.

tkg get mc

MANAGEMENT-CLUSTER-NAME          CONTEXT-NAME                                                       STATUS
tkg-mgmt-azure-20200908140522 *  tkg-mgmt-azure-20200908140522-admin@tkg-mgmt-azure-20200908140522  Success

The next thing is to make sure that our context is created and set correctly. Part of the process of creating the management cluster is to create a context for it as well, but we will still need to set it as the current one.

kubectl config get-contexts

CURRENT   NAME                                                                CLUSTER                         AUTHINFO                              NAMESPACE
          tkg-mgmt-azure-20200908140522-admin@tkg-mgmt-azure-20200908140522   tkg-mgmt-azure-20200908140522   tkg-mgmt-azure-20200908140522-admin
kubectl config use-context tkg-mgmt-azure-20200908140522-admin@tkg-mgmt-azure-20200908140522

Switched to context "tkg-mgmt-azure-20200908140522-admin@tkg-mgmt-azure-20200908140522".

And we can now validate the configuration of our management cluster.

kubectl get nodes -o wide

NAME                                                STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
tkg-mgmt-azure-20200908140522-control-plane-6lntl   Ready    master   21m   v1.19.0   10.0.0.4      <none>        Ubuntu 18.04.5 LTS   5.4.0-1023-azure   containerd://1.3.4
tkg-mgmt-azure-20200908140522-md-0-l4rkz            Ready    <none>   18m   v1.19.0   10.1.0.4      <none>        Ubuntu 18.04.5 LTS   5.4.0-1023-azure   containerd://1.3.4
kubectl get po -A

NAMESPACE                           NAME                                                                        READY   STATUS    RESTARTS   AGE
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-54cf965957-nnrbf                  2/2     Running   0          12m
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-5dc895c778-kr84f              2/2     Running   0          12m
capi-system                         capi-controller-manager-858d56cc8f-gfkmp                                    2/2     Running   0          12m
capi-webhook-system                 capi-controller-manager-659679fd44-lqnpv                                    2/2     Running   0          12m
capi-webhook-system                 capi-kubeadm-bootstrap-controller-manager-c96f4b9d-9zlvs                    2/2     Running   0          12m
capi-webhook-system                 capi-kubeadm-control-plane-controller-manager-759c99fd9f-9fjnq              2/2     Running   0          12m
capi-webhook-system                 capz-controller-manager-575c4d8b5b-tj9t4                                    2/2     Running   0          12m
capz-system                         capz-controller-manager-664b574684-kx4fs                                    2/2     Running   0          12m
cert-manager                        cert-manager-b98b948d8-8l4kx                                                1/1     Running   0          20m
cert-manager                        cert-manager-cainjector-577b45fb7c-vmncm                                    1/1     Running   0          20m
cert-manager                        cert-manager-webhook-55c5cd4dcb-pmxdh                                       1/1     Running   0          20m
kube-system                         antrea-agent-pdcfx                                                          2/2     Running   0          21m
kube-system                         antrea-agent-zmskh                                                          2/2     Running   2          19m
kube-system                         antrea-controller-d5b5cd9f8-4s9kv                                           1/1     Running   0          21m
kube-system                         coredns-85fc8659b-mf6tn                                                     1/1     Running   0          21m
kube-system                         coredns-85fc8659b-rwgcf                                                     1/1     Running   0          21m
kube-system                         etcd-tkg-mgmt-azure-20200908140522-control-plane-6lntl                      1/1     Running   0          21m
kube-system                         kube-apiserver-tkg-mgmt-azure-20200908140522-control-plane-6lntl            1/1     Running   0          21m
kube-system                         kube-controller-manager-tkg-mgmt-azure-20200908140522-control-plane-6lntl   1/1     Running   0          21m
kube-system                         kube-proxy-hwd7b                                                            1/1     Running   0          21m
kube-system                         kube-proxy-lnfdf                                                            1/1     Running   0          19m
kube-system                         kube-scheduler-tkg-mgmt-azure-20200908140522-control-plane-6lntl            1/1     Running   0          21m

Everything is looking good so we can proceed with creating a workload cluster.

If you want your workload cluster nodes to end up in the same Azure Resource Group and use the same sizing as your management cluster, you can proceed straight to running your tkg create cluster command. Otherwise, you’ll want to open the .tkg/config.yaml file and make some changes. Specific parameters to pay attention to are:

  • AZURE_RESOURCE_GROUP
  • AZURE_VNET_NAME
  • AZURE_VNET_CIDR
  • AZURE_CONTROL_PLANE_MACHINE_TYPE
  • AZURE_NODE_MACHINE_TYPE
  • MACHINE_HEALTH_CHECK_ENABLED

If the management cluster was deployed via the UI, you won’t see the AZURE_RESOURCE_GROUP, AZURE_VNET_NAME and AZURE_VNET_CIDR parameters, and the default will be to create new ones with the same name as the workload cluster.

When you’re ready, you can run a command similar to the following to kick off the deployment:

tkg create cluster azure-wld -p dev

As you can see, I’m taking a very simple approach to how I want this cluster created. I’m naming it azure-wld and using an unaltered dev plan (one control plane node and one worker node). You can use lots of other parameters to fine-tune the cluster, similar to what was done with the tkg init command. You can run tkg create cluster --help to see all of the available options.

The output from this command is very sparse by default.

Logs of the command execution can also be found at: C:\Users\clittle\AppData\Local\Temp\tkg-20200908T144203017991719.log
Validating configuration...
Creating workload cluster 'azure-wld'...
Waiting for cluster to be initialized...
Waiting for cluster nodes to be available...
Waiting for addons installation...

Workload cluster 'azure-wld' created

You can pass the -v # switch to the tkg create cluster command to increase the output verbosity, where # is a number…higher is more verbose.
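
Putting a few of these options together, a more tailored invocation might look something like this (azure-wld2 is just a hypothetical name, and the machine-count flags are what I’d expect to find in tkg create cluster --help…verify them against your CLI version):

tkg create cluster azure-wld2 --plan prod \
  --controlplane-machine-count 3 \
  --worker-machine-count 5 \
  -v 6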

When the deployment is done, you can run the tkg get clusters command to see that your cluster is created successfully.

tkg get clusters

NAME       NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES
azure-wld  default    running  1/1           1/1      v1.19.0+vmware.1  workload

And back in Azure you should see a new Resource Group created whose name matches the name of your workload cluster. There should be several items in it (very similar to how the Resource Group for the management cluster looks).

We can now use the tkg get credentials command to populate our kubeconfig file with the context information for the new workload cluster.


tkg get credentials azure-wld

Credentials of workload cluster 'azure-wld' have been saved
You can now access the cluster by running 'kubectl config use-context azure-wld-admin@azure-wld'
kubectl config get-contexts
CURRENT   NAME                                                                CLUSTER                         AUTHINFO                              NAMESPACE
          azure-wld-admin@azure-wld                                           azure-wld                       azure-wld-admin
*         tkg-mgmt-azure-20200908140522-admin@tkg-mgmt-azure-20200908140522   tkg-mgmt-azure-20200908140522   tkg-mgmt-azure-20200908140522-admin

The context can be changed to the workload cluster context and we can then start using kubectl to manage the cluster.


kubectl config use-context azure-wld-admin@azure-wld

Switched to context "azure-wld-admin@azure-wld".

kubectl get nodes

NAME                            STATUS   ROLES    AGE     VERSION
azure-wld-control-plane-6j9jd   Ready    master   7m19s   v1.19.0
azure-wld-md-0-fbpls            Ready    <none>   5m8s    v1.19.0

If you find that you need direct access to these nodes (or the nodes in the management cluster), you can use the ssh key pair to get to them. Unfortunately, the kubectl get nodes -o wide command does not show the external IP address:

NAME                            STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
azure-wld-control-plane-mjl28   Ready    master   10m     v1.19.1   10.0.0.4      <none>        Ubuntu 18.04.5 LTS   5.4.0-1025-azure   containerd://1.3.4
azure-wld-md-0-pgqp5            Ready    <none>   8m24s   v1.19.1   10.1.0.4      <none>        Ubuntu 18.04.5 LTS   5.4.0-1025-azure   containerd://1.3.4

However, you can get the external IP address of each node in the Azure UI if you navigate to Resource Groups, click on the resource group that maps to your cluster, and then click on the appropriate virtual machine.
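
Alternatively, if you have the Azure CLI installed, it can list the public IPs for every VM in the cluster’s resource group in one shot (azure-wld here matches the resource group created for my workload cluster):

az vm list-ip-addresses --resource-group azure-wld --output table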

Then you can issue a command similar to the following to access the node:

ssh -i /tmp/id_rsa capi@52.226.47.63

The authenticity of host '52.226.47.63 (52.226.47.63)' can't be established.
ECDSA key fingerprint is SHA256:8kVqR/u8VhdT/YQPGT4HWllUIeDeXnt153MAhuBBcks.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '52.226.47.63' (ECDSA) to the list of known hosts.
Enter passphrase for key '/tmp/id_rsa':
Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-1025-azure x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Tue Sep 15 21:15:53 UTC 2020

  System load:  0.13               Processes:                 168
  Usage of /:   3.8% of 123.88GB   Users logged in:           0
  Memory usage: 15%                IP address for eth0:       10.0.0.4
  Swap usage:   0%                 IP address for antrea-gw0: 100.96.0.1

 * Kubernetes 1.19 is out! Get it in one command with:

     sudo snap install microk8s --channel=1.19 --classic

   https://microk8s.io/ has docs and details.

0 packages can be updated.
0 updates are security updates.



The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

capi@azure-wld-control-plane-qct2g:~$
