I had been so focused on TKG on vSphere lately that I had been overlooking the fact that we’ve automated the process for standing up a cluster on AWS as well. While it’s far easier to do this via the TKG UI, I wanted to take a stab at it entirely from the command line. The process wasn’t bad at all but did require a fair amount of prep work up front to get things ready on AWS. I will say that after going through this process once, I’ve been able to spin up and tear down numerous clusters on AWS in a matter of minutes.
Note: These instructions assume you already have some level of access to AWS and to the
Create an IAM User on AWS
You will need to have an IAM account/password on AWS that you use for creating resources needed by TKG. If you already have an IAM account with administrative access that you’d like to use you can skip this first part.
When you are logged in to AWS, you will find the IAM section under Security, Identity & Compliance:
Once you’re on the IAM page, you can click on the Users link and then on the Add user button to create a new IAM user. Enter a meaningful name and set the Access type to Programmatic access.
Click the Next: Permissions button.
You’ll need to assign permissions to your new user via group membership or by attaching policies directly. You might already have a group set up which you could use for this step. I’m choosing to grant my new user the AdministratorAccess policy.
Click Next: Tags and then click Next: Review. Validate that the account will be created the way you want and then click Create user.
On the last page, it’s very important that you click the Show link under Secret access key as it is the only time you will be able to retrieve this value. Save it somewhere secure.
Click the Close button.
To allow this user to log in to the AWS console, you will need to configure its credentials. Select your new user and then click on the Security credentials tab. Click the Manage link next to Console password.
Set Console access to Enable. You can either use an auto-generated password or specify your own. Click the Apply button.
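If you’d rather avoid the console entirely, the same IAM setup can also be done from the command line. A minimal sketch using the aws CLI, where the user name tkg-user is just an example:

```shell
# Create the IAM user ("tkg-user" is an example name; substitute your own)
aws iam create-user --user-name tkg-user

# Grant it the AdministratorAccess managed policy
aws iam attach-user-policy \
  --user-name tkg-user \
  --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

# Generate the programmatic access key; save the SecretAccessKey from
# the output, as this is the only time it is shown
aws iam create-access-key --user-name tkg-user --output json
```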
Navigate to the main AWS page and then to EC2 > Network & Security > Key Pairs. Click the Create key pair button at the top right. Note: You can reuse an existing key pair if you have one and can skip this step.
Specify a name for the new key pair. The choice of file format is dependent on how you might want to access the bastion host.
Click the Create key pair button.
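The key pair can likewise be created from the command line; a sketch, using the key name cjlittle-tkg that appears later in this post:

```shell
# Create the key pair and save the private key material locally
aws ec2 create-key-pair \
  --key-name cjlittle-tkg \
  --query 'KeyMaterial' \
  --output text > cjlittle-tkg.pem

# ssh will refuse a key file with loose permissions
chmod 400 cjlittle-tkg.pem
```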
Prepare for cluster creation
You’ll need to download the tkg and clusterawsadm CLIs as well as the jq and aws utilities. You can find the latest tkg CLI and clusterawsadm downloads at Download VMware Tanzu Kubernetes Grid. There are numerous methods for installing jq based on where you’re going to run your CLI commands, and the instructions for installing the aws CLI can be found at Installing the AWS CLI version 2.
Once your command line utilities are downloaded and in place, you can start by setting some environment variables:
export AWS_ACCESS_KEY_ID=<aws_access_key created earlier>
export AWS_SECRET_ACCESS_KEY=<aws_access_key_secret created earlier>
export AWS_SESSION_TOKEN=<aws_session_token> (only needed if you use multifactor authentication)
export AWS_REGION=us-east-1 (set this as appropriate)
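Before moving on, it’s worth a quick sanity check that the credentials actually work; aws sts get-caller-identity should echo back the account ID and user ARN for the keys you just exported:

```shell
# Returns the Account, UserId, and Arn for the active credentials;
# an error here usually means a key was mis-copied
aws sts get-caller-identity --output json
```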
Once these are set, you can use the clusterawsadm command to create a CloudFormation stack on AWS.
clusterawsadm alpha bootstrap create-stack
Attempting to create CloudFormation stack cluster-api-provider-aws-sigs-k8s-io
While this runs, you should see a new CloudFormation stack being created on AWS.
When the stack is created, you’ll see output similar to the following on the command line:
Following resources are in the stack:

Resource                   |Type                                                                                 |Status
AWS::IAM::Group            |bootstrapper.cluster-api-provider-aws.sigs.k8s.io                                    |CREATE_COMPLETE
AWS::IAM::InstanceProfile  |control-plane.cluster-api-provider-aws.sigs.k8s.io                                   |CREATE_COMPLETE
AWS::IAM::InstanceProfile  |controllers.cluster-api-provider-aws.sigs.k8s.io                                     |CREATE_COMPLETE
AWS::IAM::InstanceProfile  |nodes.cluster-api-provider-aws.sigs.k8s.io                                           |CREATE_COMPLETE
AWS::IAM::ManagedPolicy    |arn:aws:iam::537043370288:policy/control-plane.cluster-api-provider-aws.sigs.k8s.io |CREATE_COMPLETE
AWS::IAM::ManagedPolicy    |arn:aws:iam::537043370288:policy/nodes.cluster-api-provider-aws.sigs.k8s.io         |CREATE_COMPLETE
AWS::IAM::ManagedPolicy    |arn:aws:iam::537043370288:policy/controllers.cluster-api-provider-aws.sigs.k8s.io   |CREATE_COMPLETE
AWS::IAM::Role             |control-plane.cluster-api-provider-aws.sigs.k8s.io                                   |CREATE_COMPLETE
AWS::IAM::Role             |controllers.cluster-api-provider-aws.sigs.k8s.io                                     |CREATE_COMPLETE
AWS::IAM::Role             |nodes.cluster-api-provider-aws.sigs.k8s.io                                           |CREATE_COMPLETE
AWS::IAM::User             |bootstrapper.cluster-api-provider-aws.sigs.k8s.io                                    |CREATE_COMPLETE
And you’ll see the Status of your new stack as CREATE_COMPLETE at AWS.
Next up is setting more environment variables related to AWS, which will be used by the tkg CLI as it utilizes Cluster API to create the cluster.
export AWS_CREDENTIALS=$(aws iam create-access-key --user-name bootstrapper.cluster-api-provider-aws.sigs.k8s.io --output json)
export AWS_ACCESS_KEY_ID=$(echo $AWS_CREDENTIALS | jq .AccessKey.AccessKeyId -r)
export AWS_SECRET_ACCESS_KEY=$(echo $AWS_CREDENTIALS | jq .AccessKey.SecretAccessKey -r)
export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm alpha bootstrap encode-aws-credentials)
Provision the management cluster
If you’ve never used TKG, you’ll first need to run the tkg get mc command, which creates the .tkg folder and the .tkg/config.yaml file. Open .tkg/config.yaml in a text editor. Unless you’ve created other TKG clusters, it should look like the following:
cert-manager-timeout: 30m0s
overridesFolder: /home/ubuntu/.tkg/overrides
NODE_STARTUP_TIMEOUT: 20m
BASTION_HOST_ENABLED: "true"
providers:
  - name: cluster-api
    url: /home/ubuntu/.tkg/providers/cluster-api/v0.3.6/core-components.yaml
    type: CoreProvider
  - name: aws
    url: /home/ubuntu/.tkg/providers/infrastructure-aws/v0.5.4/infrastructure-components.yaml
    type: InfrastructureProvider
  - name: vsphere
    url: /home/ubuntu/.tkg/providers/infrastructure-vsphere/v0.6.6/infrastructure-components.yaml
    type: InfrastructureProvider
  - name: tkg-service-vsphere
    url: /home/ubuntu/.tkg/providers/infrastructure-tkg-service-vsphere/v1.0.0/unused.yaml
    type: InfrastructureProvider
  - name: kubeadm
    url: /home/ubuntu/.tkg/providers/bootstrap-kubeadm/v0.3.6/bootstrap-components.yaml
    type: BootstrapProvider
  - name: kubeadm
    url: /home/ubuntu/.tkg/providers/control-plane-kubeadm/v0.3.6/control-plane-components.yaml
    type: ControlPlaneProvider
images:
  all:
    repository: gcr.io/kubernetes-development-244305/cluster-api
  cert-manager:
    repository: gcr.io/kubernetes-development-244305/cert-manager
    tag: v0.11.0_vmware.1
release:
  version: v1.1.3
You’ll need to add a series of variables to this file to allow a TKG cluster to be created on AWS.
AWS_REGION:
AWS_NODE_AZ:
AWS_PRIVATE_NODE_CIDR:
AWS_PUBLIC_NODE_CIDR:
AWS_PUBLIC_SUBNET_ID:
AWS_PRIVATE_SUBNET_ID:
AWS_SSH_KEY_NAME:
AWS_VPC_ID:
AWS_VPC_CIDR:
BASTION_HOST_ENABLED:
CLUSTER_CIDR:
CONTROL_PLANE_MACHINE_TYPE:
NODE_MACHINE_TYPE:
You can find detailed information about each parameter at Deploy Management Clusters to Amazon EC2 with the CLI. The following is what mine looked like:
AWS_REGION: us-east-1
AWS_NODE_AZ: us-east-1a
AWS_PRIVATE_NODE_CIDR: 10.0.0.0/24
AWS_PUBLIC_NODE_CIDR: 10.0.1.0/24
AWS_PUBLIC_SUBNET_ID:
AWS_PRIVATE_SUBNET_ID:
AWS_SSH_KEY_NAME: cjlittle-tkg
AWS_VPC_ID:
AWS_VPC_CIDR: 10.0.0.0/16
#BASTION_HOST_ENABLED:
CLUSTER_CIDR: 100.96.0.0/11
CONTROL_PLANE_MACHINE_TYPE: m5.large
NODE_MACHINE_TYPE: m5.large
You’ll notice that several items are blank as they were optional or not relevant to my install (AWS_VPC_ID is only set if you’re re-using an existing VPC; see Requirements for Using an Existing VPC to Provision a Cluster for details). Also, you should review Amazon EC2 Instance Types for details on the different machine types. I commented out the BASTION_HOST_ENABLED line as it already existed earlier in the file and was set to "true".
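If you do re-use an existing VPC, the IDs the config file asks for can be looked up from the command line rather than the console; a sketch, where the Name tag value my-vpc and the VPC ID in the second command are placeholders for your own:

```shell
# Find the VPC ID by its Name tag ("my-vpc" is a placeholder)
aws ec2 describe-vpcs \
  --filters "Name=tag:Name,Values=my-vpc" \
  --query 'Vpcs[0].VpcId' --output text

# List the subnet IDs and CIDRs in that VPC to fill in
# AWS_PUBLIC_SUBNET_ID and AWS_PRIVATE_SUBNET_ID
# (substitute the VPC ID returned above)
aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=vpc-0123456789abcdef0" \
  --query 'Subnets[].{Id:SubnetId,Cidr:CidrBlock}' --output table
```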
Now we’re ready to kick things off with the
tkg init command, which will create the management cluster. There are loads of options you can use to fine-tune your management cluster. In this example, we’ll specify a name for the cluster being created and use the dev plan, which creates a single control plane node and a single worker node.
tkg init --infrastructure aws --name cjlittle-mgmt-aws --plan dev
You should be able to see a kind (Kubernetes in Docker) container running in Docker. This is the bootstrap Kubernetes cluster and will kick off the process of using Cluster API to provision the TKG cluster on AWS.
docker ps
CONTAINER ID   IMAGE                                                             COMMAND                  CREATED         STATUS              PORTS                       NAMES
28648ceb690a   gcr.io/kubernetes-development-244305/kind/node:v1.18.6_vmware.1   "/usr/local/bin/entr…"   4 minutes ago   Up About a minute   127.0.0.1:39773->6443/tcp   tkg-kind-bsma4uc09c6r9bpmtsrg-control-plane
If you need to troubleshoot the early bootstrap stages, you can
docker exec into this container to review logs. You can also use the temporary kubeconfig file to check on the status of resources in the bootstrap cluster.
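For example, the container name from docker ps can be captured and used directly; a sketch:

```shell
# Capture the name of the kind bootstrap container
# (tkg prefixes it with "tkg-kind")
KIND_CONTAINER=$(docker ps --filter "name=tkg-kind" --format '{{.Names}}')

# Open a shell inside it to poke around, e.g. at logs under /var/log
docker exec -it "$KIND_CONTAINER" /bin/bash
```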
kubectl --kubeconfig=.kube-tkg/tmp/config_oDL6huqf get po -A
NAMESPACE                       NAME                                                                  READY   STATUS              RESTARTS   AGE
capi-kubeadm-bootstrap-system   capi-kubeadm-bootstrap-controller-manager-696c55fc88-g4bnv            0/2     Pending             0          5s
capi-system                     capi-controller-manager-64f89c966c-4l8g4                              0/2     ContainerCreating   0          32s
capi-webhook-system             capi-controller-manager-c776dccfb-2srvm                               0/2     ContainerCreating   0          43s
capi-webhook-system             capi-kubeadm-bootstrap-controller-manager-6dd4c8f4f9-c4fqg            0/2     Pending             0          19s
capi-webhook-system             capi-kubeadm-control-plane-controller-manager-d978674c-4bt5s          0/2     Pending             0          2s
cert-manager                    cert-manager-5cf6d4bbd8-l7drj                                         1/1     Running             0          5m32s
cert-manager                    cert-manager-cainjector-56c57c56f-ld4vg                               1/1     Running             0          5m32s
cert-manager                    cert-manager-webhook-59c765ccdf-nlppd                                 1/1     Running             0          5m31s
kube-system                     coredns-5cf78cdcc-6jxgh                                               1/1     Running             0          7m56s
kube-system                     coredns-5cf78cdcc-tcmf4                                               1/1     Running             0          7m55s
kube-system                     etcd-tkg-kind-bsma4uc09c6r9bpmtsrg-control-plane                      1/1     Running             0          8m4s
kube-system                     kindnet-r7rsz                                                         1/1     Running             0          7m56s
kube-system                     kube-apiserver-tkg-kind-bsma4uc09c6r9bpmtsrg-control-plane            1/1     Running             0          8m4s
kube-system                     kube-controller-manager-tkg-kind-bsma4uc09c6r9bpmtsrg-control-plane   1/1     Running             2          8m4s
kube-system                     kube-proxy-bl4m6                                                      1/1     Running             0          7m56s
kube-system                     kube-scheduler-tkg-kind-bsma4uc09c6r9bpmtsrg-control-plane            0/1     Running             2          8m4s
local-path-storage              local-path-provisioner-bd4bb6b75-cmdfd                                1/1     Running             1          7m53s
You should see output similar to the following from the
tkg init command after a few minutes:
Logs of the command execution can also be found at: /tmp/tkg-20200806T183810851437735.log

Validating the pre-requisites...
Setting up management cluster...
Validating configuration...
Using infrastructure provider aws:v0.5.4
Generating cluster configuration...
Setting up bootstrapper...
Bootstrapper created. Kubeconfig: /home/ubuntu/.kube-tkg/tmp/config_oDL6huqf
Installing providers on bootstrapper...
Fetching providers
Installing cert-manager
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.6" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.6" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.6" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-aws" Version="v0.5.4" TargetNamespace="capa-system"
Start creating management cluster...
At this point, you can review EC2 objects and VPCs created in AWS. Ultimately, you’ll see new Running instances, Elastic IPs, Volumes, Security Groups and Load Balancers, as well as a new VPC.
You can drill down into each to see more details about what has been created.
When the installation of the management cluster is completed, you should see output similar to the following on the command line:
Saving management cluster kubeconfig into /home/ubuntu/.kube/config
Installing providers on management cluster...
Fetching providers
Installing cert-manager
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.6" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.6" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.6" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-aws" Version="v0.5.4" TargetNamespace="capa-system"
Waiting for the management cluster to get ready for move...
Moving all Cluster API objects from bootstrap cluster to management cluster...
Performing move...
Discovering Cluster API objects
Moving Cluster API objects Clusters=1
Creating objects in the target cluster
Deleting objects from the source cluster
Context set for management cluster cjlittle-mgmt-aws as 'cjlittle-mgmt-aws-admin@cjlittle-mgmt-aws'.

Management cluster created!

You can now create your first workload cluster by running the following:

  tkg create cluster [name] --kubernetes-version=[version] --plan=[plan]
If you run the
kubectl config get-contexts command, you’ll see that you have a new context present.
CURRENT   NAME                                        CLUSTER             AUTHINFO                  NAMESPACE
          cjlittle-mgmt-aws-admin@cjlittle-mgmt-aws   cjlittle-mgmt-aws   cjlittle-mgmt-aws-admin
You can switch to this context and then examine the objects created.
kubectl config use-context cjlittle-mgmt-aws-admin@cjlittle-mgmt-aws
Switched to context "cjlittle-mgmt-aws-admin@cjlittle-mgmt-aws".
kubectl get nodes
NAME                         STATUS   ROLES    AGE   VERSION
ip-10-0-0-138.ec2.internal   Ready    <none>   24m   v1.18.6+vmware.1
ip-10-0-0-154.ec2.internal   Ready    master   25m   v1.18.6+vmware.1
kubectl get po -A
NAMESPACE                           NAME                                                             READY   STATUS    RESTARTS   AGE
capa-system                         capa-controller-manager-5c4ff75f77-w2w8v                         2/2     Running   0          24m
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-696c55fc88-bx9cd       2/2     Running   0          25m
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-779fc7b675-7jknm   2/2     Running   0          25m
capi-system                         capi-controller-manager-64f89c966c-lnzcl                         2/2     Running   0          25m
capi-webhook-system                 capa-controller-manager-b6f7487c-gtpkm                           2/2     Running   1          24m
capi-webhook-system                 capi-controller-manager-c776dccfb-mmhbm                          2/2     Running   0          25m
capi-webhook-system                 capi-kubeadm-bootstrap-controller-manager-6dd4c8f4f9-crv9x       2/2     Running   0          25m
capi-webhook-system                 capi-kubeadm-control-plane-controller-manager-d978674c-qcstg     2/2     Running   0          25m
cert-manager                        cert-manager-5cf6d4bbd8-chqqq                                    1/1     Running   0          27m
cert-manager                        cert-manager-cainjector-56c57c56f-zlbvn                          1/1     Running   0          27m
cert-manager                        cert-manager-webhook-59c765ccdf-76lss                            1/1     Running   0          27m
kube-system                         calico-kube-controllers-7d598d6b58-m4b8k                         1/1     Running   0          27m
kube-system                         calico-node-q2bzw                                                1/1     Running   0          26m
kube-system                         calico-node-t9xnq                                                1/1     Running   0          27m
kube-system                         coredns-5cf78cdcc-5pdcq                                          1/1     Running   0          27m
kube-system                         coredns-5cf78cdcc-pdzws                                          1/1     Running   0          27m
kube-system                         etcd-ip-10-0-0-154.ec2.internal                                  1/1     Running   0          28m
kube-system                         kube-apiserver-ip-10-0-0-154.ec2.internal                        1/1     Running   0          28m
kube-system                         kube-controller-manager-ip-10-0-0-154.ec2.internal               1/1     Running   0          28m
kube-system                         kube-proxy-gd7c2                                                 1/1     Running   0          27m
kube-system                         kube-proxy-lkrc6                                                 1/1     Running   0          26m
kube-system                         kube-scheduler-ip-10-0-0-154.ec2.internal                        1/1     Running   0          28m
You may find it useful to review the logs from some of these pods if you run into issues with any subsequent operations in your new TKG environment.
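For example, when AWS resources fail to come up, the Cluster API AWS provider is usually the first place to look; a sketch, assuming the manager container name used by the capa-controller-manager deployment:

```shell
# Tail the AWS infrastructure provider logs; IAM or quota
# problems on the AWS side typically surface here
kubectl logs -n capa-system deployment/capa-controller-manager \
  -c manager --tail=50
```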
The tkg get mc command should now display some information about the management cluster.
MANAGEMENT-CLUSTER-NAME   CONTEXT-NAME
cjlittle-mgmt-aws *       cjlittle-mgmt-aws-admin@cjlittle-mgmt-aws
Provision your workload cluster
Now that the management cluster is created, it’s relatively simple to provision workload clusters. We’ll create a small cluster with the same configuration as the management cluster. You can pass different parameters to the
tkg create cluster command or edit the
.tkg/config.yaml file if you want to create a workload cluster with a different configuration from the management cluster.
tkg create cluster cjlittle-test-aws --plan dev
Again, you will see more EC2 resources being configured in AWS as well as a new VPC being created.
When the installation of the workload cluster is completed, you should see output similar to the following on the command line:
Logs of the command execution can also be found at: /tmp/tkg-20200806T230411350608441.log

Validating configuration...
Creating workload cluster 'cjlittle-test-aws'...
Waiting for cluster to be initialized...
Waiting for cluster nodes to be available...

Workload cluster 'cjlittle-test-aws' created
You’ll need to run the
tkg get credentials command to have a new context created for the workload cluster.
tkg get credentials cjlittle-test-aws
Credentials of workload cluster 'cjlittle-test-aws' have been saved
You can now access the cluster by running 'kubectl config use-context cjlittle-test-aws-admin@cjlittle-test-aws'
Per the output of the previous command, you can now switch contexts to get access to the workload cluster.
kubectl config use-context cjlittle-test-aws-admin@cjlittle-test-aws
Switched to context "cjlittle-test-aws-admin@cjlittle-test-aws".
And you should be able to investigate the objects that were created.
kubectl get nodes
NAME                        STATUS   ROLES    AGE     VERSION
ip-10-0-0-78.ec2.internal   Ready    <none>   6m31s   v1.18.6+vmware.1
ip-10-0-0-91.ec2.internal   Ready    master   8m37s   v1.18.6+vmware.1
kubectl get po -A
NAMESPACE     NAME                                                READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-7d598d6b58-5mfdn            1/1     Running   0          8m33s
kube-system   calico-node-42c8t                                   1/1     Running   0          7m40s
kube-system   calico-node-5g26g                                   1/1     Running   0          8m34s
kube-system   coredns-5cf78cdcc-7r47b                             1/1     Running   0          9m31s
kube-system   coredns-5cf78cdcc-w6zbp                             1/1     Running   0          9m31s
kube-system   etcd-ip-10-0-0-91.ec2.internal                      1/1     Running   0          9m44s
kube-system   kube-apiserver-ip-10-0-0-91.ec2.internal            1/1     Running   0          9m44s
kube-system   kube-controller-manager-ip-10-0-0-91.ec2.internal   1/1     Running   0          9m44s
kube-system   kube-proxy-mn47t                                    1/1     Running   0          7m40s
kube-system   kube-proxy-znrwg                                    1/1     Running   0          9m31s
kube-system   kube-scheduler-ip-10-0-0-91.ec2.internal            1/1     Running   0          9m44s
The tkg get cluster command should now return information about your workload cluster.
NAME                NAMESPACE   STATUS    CONTROLPLANE   WORKERS   KUBERNETES
cjlittle-test-aws   default     running   1/1            1/1       v1.18.6+vmware.1
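As mentioned at the start, tearing all of this back down is just as quick as standing it up; a sketch of the cleanup path:

```shell
# Delete the workload cluster first (removes its EC2 instances,
# load balancers, and VPC)
tkg delete cluster cjlittle-test-aws

# Once all workload clusters are gone, the management cluster
# itself can be deleted
tkg delete management-cluster cjlittle-mgmt-aws
```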
Access your TKG nodes
If you need to access the TKG nodes directly, you’ll find that you have to go through the bastion host that was created to get to them.
To access the bastion host, navigate to Running Instances in EC2 and then select the appropriate bastion host. You can identify it by its name, which starts with the cluster name and ends in -bastion. With the bastion host selected, click on the Actions dropdown and then select Connect.
The command in the Example section is about all you need to get to the bastion host as long as you have a copy of the ssh key pair you created saved locally.
ssh -i "cjlittle-tkg.pem" ubuntu@<bastion_host_public_dns>
Welcome to Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-1047-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

224 packages can be updated.
151 updates are security updates.

New release '18.04.4 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

*** System restart required ***
Last login: Fri Aug 7 05:30:43 2020 from 22.214.171.124

To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ubuntu@ip-10-0-1-9:~$
From here you can ssh to the control plane and worker nodes on their internal IP addresses (with the same ssh key pair) as the ec2-user user. The ssh private key does not exist on the bastion host, so you’ll have to copy it there manually. You can get the internal IP address of the nodes via
kubectl get nodes -o wide or via their Description page under Running Instances in EC2.
kubectl get nodes -o wide
NAME                         STATUS   ROLES    AGE   VERSION            INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-10-0-0-138.ec2.internal   Ready    <none>   51m   v1.18.6+vmware.1   10.0.0.138    <none>        Amazon Linux 2   4.14.186-146.268.amzn2.x86_64   containerd://1.3.4
ip-10-0-0-154.ec2.internal   Ready    master   53m   v1.18.6+vmware.1   10.0.0.154    <none>        Amazon Linux 2   4.14.186-146.268.amzn2.x86_64   containerd://1.3.4
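If you just want the internal IPs by themselves (to feed into ssh, for example), a jsonpath query can pull them out; a sketch:

```shell
# Print each node name alongside its InternalIP address
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
```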
ssh -i "cjlittle-tkg.pem" ec2-user@10.0.0.154
The authenticity of host '10.0.0.154 (10.0.0.154)' can't be established.
ECDSA key fingerprint is SHA256:yTm3+EitD6/oANtdxoqL2EwHgedRCsP1bEmL4UnpZO8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.0.154' (ECDSA) to the list of known hosts.

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
[ec2-user@ip-10-0-0-154 ~]$