How to deploy a TKG cluster on AWS using the tkg CLI

I’ve been so focused on TKG on vSphere lately that I overlooked the fact that we’ve also automated standing up a cluster on AWS. While it’s far easier to do this via the TKG UI, I wanted to take a stab at it entirely from the command line. The process wasn’t bad at all, but it did require a fair amount of prep work up front on AWS. Once I’d been through it, though, I was able to spin up and tear down numerous clusters on AWS in a matter of minutes.

Note: These instructions assume you already have some level of access to AWS and to the tkg CLI.

Create an IAM User on AWS

You will need an IAM user on AWS to create the resources TKG needs. If you already have an IAM user with administrative access that you’d like to use, you can skip this first part.

When you are logged in to AWS, you will find the IAM section under Security, Identity & Compliance.

Once you’re on the IAM page, you can click on the Users link and then on the Add user button to create a new IAM user. Enter a meaningful name and set the Access type to Programmatic access.

Click the Next: Permissions button.

You’ll need to assign permissions to your new user via group membership or by attaching policies directly. You might already have a group set up which you could use for this step. I’m choosing to grant my new user the AdministratorAccess policy.

Click Next: Tags and then click Next: Review. Validate that the account will be created the way you want and then click Create user.

On the last page, it’s very important that you click the Show link under Secret access key as it is the only time you will be able to retrieve this value. Save it somewhere secure.

Click the Close button.

To allow this user to log in to the AWS console, you will need to give it a console password. Select your new user and then click on the Security credentials tab. Click the Manage link next to Console password.

Set Console access to Enable. You can either use an auto-generated password or specify your own. Click the Apply button.

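Navigate to the main AWS page and then to EC2 > Network & Security > Key Pairs. Key pairs are regional, so make sure you are in the region you will deploy to (us-east-1 in this example). Click the Create key pair button at the top right. Note: You can reuse an existing key pair in that region if you have one and skip this step.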

Specify a name for the new key pair. The file format depends on how you plan to access the bastion host: choose pem for OpenSSH or ppk for PuTTY.

Click the Create key pair button.
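If you’d rather stay on the command line, the AWS CLI (installed in the next section) can also create the key pair: aws ec2 create-key-pair --key-name <name> --query 'KeyMaterial' --output text prints the private key. Either way, ssh refuses private keys that other users can read, so the saved file should end up with 0400 permissions. The following is a minimal sketch assuming bash; save_private_key is a hypothetical helper, not part of any TKG or AWS tooling.

```shell
# Hypothetical helper: writes a private key read from stdin to the given
# path with owner-only read permission (ssh rejects keys others can read).
save_private_key() {
  local path=$1
  if [ -e "$path" ]; then
    echo "refusing to overwrite existing file: $path" >&2
    return 1
  fi
  # umask 077 in a subshell so the file is never group/world readable.
  (umask 077 && cat > "$path") || return 1
  chmod 400 "$path"
}

# Usage (requires the AWS CLI and valid credentials):
#   aws ec2 create-key-pair --key-name cjlittle-tkg \
#     --query 'KeyMaterial' --output text | save_private_key cjlittle-tkg.pem
```

If you downloaded the key from the console instead, chmod 400 on the downloaded .pem file achieves the same result.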

Prepare for cluster creation

You’ll need to download the tkg CLI as well as the aws CLI, clusterawsadm and jq utilities. You can find the latest tkg CLI and clusterawsadm downloads at Download VMware Tanzu Kubernetes Grid. There are numerous ways to install jq, depending on where you’re going to run your CLI commands. The instructions for installing the aws CLI can be found at Installing the AWS CLI version 2.
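Before going further, it can help to confirm that every tool is actually on your PATH. This is a minimal sketch assuming bash; check_tools is a hypothetical helper name. docker and kubectl are included because the bootstrap kind cluster runs in Docker and kubectl is used throughout this post.

```shell
# Hypothetical helper: reports every command in the argument list that is
# not on PATH. Returns 0 only if all of them were found.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_tools tkg aws clusterawsadm jq docker kubectl
```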

Once your command line utilities are downloaded and in place, you can start by setting some environment variables:

export AWS_ACCESS_KEY_ID=<aws_access_key created earlier>
export AWS_SECRET_ACCESS_KEY=<aws_access_key_secret created earlier>
export AWS_SESSION_TOKEN=<aws_session_token>   # only needed if you use multi-factor authentication
export AWS_REGION=us-east-1                     # set this as appropriate
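A small guard like the following catches a missing or unexported variable before you run clusterawsadm. It is a sketch assuming bash, and require_env is a hypothetical name. It uses printenv, which only sees exported variables, so it checks what child processes such as clusterawsadm and aws will actually see.

```shell
# Hypothetical helper: fails if any named variable is unset, empty, or
# not exported (printenv only reports exported variables).
require_env() {
  local status=0 name
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "not set or not exported: $name" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION
```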

Once these are set, you can use the clusterawsadm command to create a CloudFormation stack on AWS.

clusterawsadm alpha bootstrap create-stack
Attempting to create CloudFormation stack cluster-api-provider-aws-sigs-k8s-io

While the command runs, you should see the new CloudFormation stack being created in the AWS console.

When the stack is created, you’ll see output similar to the following on the command line:

Following resources are in the stack:

Resource                  |Type                                                                                |Status
AWS::IAM::Group           |bootstrapper.cluster-api-provider-aws.sigs.k8s.io                                   |CREATE_COMPLETE
AWS::IAM::InstanceProfile |control-plane.cluster-api-provider-aws.sigs.k8s.io                                  |CREATE_COMPLETE
AWS::IAM::InstanceProfile |controllers.cluster-api-provider-aws.sigs.k8s.io                                    |CREATE_COMPLETE
AWS::IAM::InstanceProfile |nodes.cluster-api-provider-aws.sigs.k8s.io                                          |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::537043370288:policy/control-plane.cluster-api-provider-aws.sigs.k8s.io |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::537043370288:policy/nodes.cluster-api-provider-aws.sigs.k8s.io         |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::537043370288:policy/controllers.cluster-api-provider-aws.sigs.k8s.io   |CREATE_COMPLETE
AWS::IAM::Role            |control-plane.cluster-api-provider-aws.sigs.k8s.io                                  |CREATE_COMPLETE
AWS::IAM::Role            |controllers.cluster-api-provider-aws.sigs.k8s.io                                    |CREATE_COMPLETE
AWS::IAM::Role            |nodes.cluster-api-provider-aws.sigs.k8s.io                                          |CREATE_COMPLETE
AWS::IAM::User            |bootstrapper.cluster-api-provider-aws.sigs.k8s.io                                   |CREATE_COMPLETE

The stack’s status in the AWS console will also show CREATE_COMPLETE.

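Next, set a few more AWS-related environment variables, which the tkg CLI uses as it drives Cluster API to create the cluster. These commands create an access key for the bootstrapper IAM user from the stack above and replace the credentials you exported earlier with it.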

export AWS_CREDENTIALS=$(aws iam create-access-key --user-name bootstrapper.cluster-api-provider-aws.sigs.k8s.io --output json)
export AWS_ACCESS_KEY_ID=$(echo "$AWS_CREDENTIALS" | jq -r .AccessKey.AccessKeyId)
export AWS_SECRET_ACCESS_KEY=$(echo "$AWS_CREDENTIALS" | jq -r .AccessKey.SecretAccessKey)
unset AWS_SESSION_TOKEN   # a session token from earlier does not belong to these new keys
export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm alpha bootstrap encode-aws-credentials)
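If aws iam create-access-key fails (for example, because the bootstrapper user already has two access keys, the IAM maximum per user), the variables above end up empty or set to null, and the problem only surfaces later during tkg init. The following is a hedged sketch of a check to run first, assuming bash and GNU coreutils base64; check_bootstrap_credentials is a hypothetical name.

```shell
# Hypothetical helper: sanity-checks the bootstrap credentials exported above.
check_bootstrap_credentials() {
  local name value
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_B64ENCODED_CREDENTIALS; do
    value=$(printenv "$name" || true)
    # jq -r prints "null" when the expected field is missing from its input.
    if [ -z "$value" ] || [ "$value" = null ]; then
      echo "missing or null: $name" >&2
      return 1
    fi
  done
  # Assumes GNU coreutils base64, where -d exits non-zero on invalid input.
  if ! printenv AWS_B64ENCODED_CREDENTIALS | base64 -d >/dev/null 2>&1; then
    echo "AWS_B64ENCODED_CREDENTIALS does not decode as base64" >&2
    return 1
  fi
}
```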

Provision the management cluster

If you’ve never used TKG before, first run the tkg get mc command, which creates the ~/.tkg folder and the ~/.tkg/config.yaml file.

Open the ~/.tkg/config.yaml file in a text editor. Unless you’ve created other TKG clusters, it should look like the following:

cert-manager-timeout: 30m0s
overridesFolder: /home/ubuntu/.tkg/overrides
NODE_STARTUP_TIMEOUT: 20m
BASTION_HOST_ENABLED: "true"
providers:
  - name: cluster-api
    url: /home/ubuntu/.tkg/providers/cluster-api/v0.3.6/core-components.yaml
    type: CoreProvider
  - name: aws
    url: /home/ubuntu/.tkg/providers/infrastructure-aws/v0.5.4/infrastructure-components.yaml
    type: InfrastructureProvider
  - name: vsphere
    url: /home/ubuntu/.tkg/providers/infrastructure-vsphere/v0.6.6/infrastructure-components.yaml
    type: InfrastructureProvider
  - name: tkg-service-vsphere
    url: /home/ubuntu/.tkg/providers/infrastructure-tkg-service-vsphere/v1.0.0/unused.yaml
    type: InfrastructureProvider
  - name: kubeadm
    url: /home/ubuntu/.tkg/providers/bootstrap-kubeadm/v0.3.6/bootstrap-components.yaml
    type: BootstrapProvider
  - name: kubeadm
    url: /home/ubuntu/.tkg/providers/control-plane-kubeadm/v0.3.6/control-plane-components.yaml
    type: ControlPlaneProvider
images:
    all:
        repository: gcr.io/kubernetes-development-244305/cluster-api
    cert-manager:
        repository: gcr.io/kubernetes-development-244305/cert-manager
        tag: v0.11.0_vmware.1
release:
    version: v1.1.3

You’ll need to add a series of variables to this file to allow a TKG cluster to be created on AWS.

AWS_REGION:
AWS_NODE_AZ:
AWS_PRIVATE_NODE_CIDR:
AWS_PUBLIC_NODE_CIDR:
AWS_PUBLIC_SUBNET_ID:
AWS_PRIVATE_SUBNET_ID:
AWS_SSH_KEY_NAME:   
AWS_VPC_ID:
AWS_VPC_CIDR:    
BASTION_HOST_ENABLED:  
CLUSTER_CIDR:    
CONTROL_PLANE_MACHINE_TYPE:
NODE_MACHINE_TYPE:
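If you’d rather not paste these by hand, the sketch below appends each setting from an exported environment variable of the same name and skips any key the file already defines, which avoids a duplicate BASTION_HOST_ENABLED entry (the default file already sets it). append_tkg_settings is a hypothetical helper assuming bash; back up config.yaml before trying it.

```shell
# Hypothetical helper: appends "KEY: value" lines to a TKG config file,
# taking each value from the exported environment variable of that name.
append_tkg_settings() {
  local config=$1 name value
  shift
  for name in "$@"; do
    if grep -q "^${name}:" "$config" 2>/dev/null; then
      echo "skipping $name: already set in $config" >&2
      continue
    fi
    value=$(printenv "$name" || true)
    if [ -n "$value" ]; then
      printf '%s: %s\n' "$name" "$value" >> "$config"
    else
      # Leave the key blank, like the optional entries in the example config.
      printf '%s:\n' "$name" >> "$config"
    fi
  done
}

# Usage:
#   export AWS_NODE_AZ=us-east-1a AWS_SSH_KEY_NAME=cjlittle-tkg
#   append_tkg_settings ~/.tkg/config.yaml AWS_REGION AWS_NODE_AZ AWS_SSH_KEY_NAME
```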

You can find detailed information about each parameter at Deploy Management Clusters to Amazon EC2 with the CLI. The following is what mine looked like:

AWS_REGION: us-east-1
AWS_NODE_AZ: us-east-1a
AWS_PRIVATE_NODE_CIDR: 10.0.0.0/24
AWS_PUBLIC_NODE_CIDR: 10.0.1.0/24
AWS_PUBLIC_SUBNET_ID:
AWS_PRIVATE_SUBNET_ID:
AWS_SSH_KEY_NAME: cjlittle-tkg
AWS_VPC_ID:
AWS_VPC_CIDR: 10.0.0.0/16
#BASTION_HOST_ENABLED:
CLUSTER_CIDR: 100.96.0.0/11
CONTROL_PLANE_MACHINE_TYPE: m5.large
NODE_MACHINE_TYPE: m5.large

You’ll notice that several items are blank because they were optional or not relevant to my install. For example, AWS_VPC_ID is only set if you’re reusing an existing VPC (see Requirements for Using an Existing VPC to Provision a Cluster for details). You should also review Amazon EC2 Instance Types for details on the different machine types. Finally, I commented out the BASTION_HOST_ENABLED line because the file already defined it, set to "true".
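Both node CIDRs must fall inside AWS_VPC_CIDR, since an AWS subnet’s address range has to be part of its VPC’s range. A quick IPv4-only sanity check, sketched in bash under the assumption that you pass plain dotted-quad CIDRs; ip_to_int and cidr_within are hypothetical helper names.

```shell
# Hypothetical helper: converts a dotted-quad IPv4 address to an integer.
ip_to_int() {
  local rest=$1 a b c d
  a=${rest%%.*}; rest=${rest#*.}
  b=${rest%%.*}; rest=${rest#*.}
  c=${rest%%.*}; d=${rest#*.}
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Hypothetical helper: succeeds if CIDR $1 lies entirely inside CIDR $2.
cidr_within() {
  local inner_len=${1#*/} outer_len=${2#*/} mask inner outer
  # A larger block can never fit inside a smaller one.
  [ "$inner_len" -ge "$outer_len" ] || return 1
  # Network mask for the outer prefix length, kept to 32 bits.
  mask=$(( (0xFFFFFFFF << (32 - outer_len)) & 0xFFFFFFFF ))
  inner=$(ip_to_int "${1%/*}")
  outer=$(ip_to_int "${2%/*}")
  [ $(( inner & mask )) -eq $(( outer & mask )) ]
}
```

With the values above, cidr_within 10.0.0.0/24 10.0.0.0/16 and cidr_within 10.0.1.0/24 10.0.0.0/16 both succeed.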

Now we’re ready to kick things off with the tkg init command, which will create the management cluster. There are loads of options you can specify to fine-tune your management cluster. In this example, we’ll specify a name for the cluster and the dev plan, which creates a deployment with a single control plane node and a single worker node.

tkg init --infrastructure aws --name cjlittle-mgmt-aws --plan dev

Shortly after tkg init starts, you should see a kind (Kubernetes in Docker) container running in Docker. This is the bootstrap Kubernetes cluster, which uses Cluster API to provision the management cluster on AWS.

docker ps
CONTAINER ID        IMAGE                                                             COMMAND                  CREATED             STATUS              PORTS                       NAMES
28648ceb690a        gcr.io/kubernetes-development-244305/kind/node:v1.18.6_vmware.1   "/usr/local/bin/entr…"   4 minutes ago       Up About a minute   127.0.0.1:39773->6443/tcp   tkg-kind-bsma4uc09c6r9bpmtsrg-control-plane

If you need to troubleshoot the early bootstrap stages, you can docker exec into this container to review logs. You can also use the temporary kubeconfig file (its path is shown in the tkg init output after "Bootstrapper created. Kubeconfig:") to check on the status of resources in the bootstrap cluster.

kubectl --kubeconfig=/home/ubuntu/.kube-tkg/tmp/config_oDL6huqf get po -A
NAMESPACE                       NAME                                                                  READY   STATUS              RESTARTS   AGE
capi-kubeadm-bootstrap-system   capi-kubeadm-bootstrap-controller-manager-696c55fc88-g4bnv            0/2     Pending             0          5s
capi-system                     capi-controller-manager-64f89c966c-4l8g4                              0/2     ContainerCreating   0          32s
capi-webhook-system             capi-controller-manager-c776dccfb-2srvm                               0/2     ContainerCreating   0          43s
capi-webhook-system             capi-kubeadm-bootstrap-controller-manager-6dd4c8f4f9-c4fqg            0/2     Pending             0          19s
capi-webhook-system             capi-kubeadm-control-plane-controller-manager-d978674c-4bt5s          0/2     Pending             0          2s
cert-manager                    cert-manager-5cf6d4bbd8-l7drj                                         1/1     Running             0          5m32s
cert-manager                    cert-manager-cainjector-56c57c56f-ld4vg                               1/1     Running             0          5m32s
cert-manager                    cert-manager-webhook-59c765ccdf-nlppd                                 1/1     Running             0          5m31s
kube-system                     coredns-5cf78cdcc-6jxgh                                               1/1     Running             0          7m56s
kube-system                     coredns-5cf78cdcc-tcmf4                                               1/1     Running             0          7m55s
kube-system                     etcd-tkg-kind-bsma4uc09c6r9bpmtsrg-control-plane                      1/1     Running             0          8m4s
kube-system                     kindnet-r7rsz                                                         1/1     Running             0          7m56s
kube-system                     kube-apiserver-tkg-kind-bsma4uc09c6r9bpmtsrg-control-plane            1/1     Running             0          8m4s
kube-system                     kube-controller-manager-tkg-kind-bsma4uc09c6r9bpmtsrg-control-plane   1/1     Running             2          8m4s
kube-system                     kube-proxy-bl4m6                                                      1/1     Running             0          7m56s
kube-system                     kube-scheduler-tkg-kind-bsma4uc09c6r9bpmtsrg-control-plane            0/1     Running             2          8m4s
local-path-storage              local-path-provisioner-bd4bb6b75-cmdfd                                1/1     Running             1          7m53s
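The temporary kubeconfig name has a random suffix (config_oDL6huqf here), so a small helper that picks the most recently modified one can save some typing. A sketch assuming bash and the default ~/.kube-tkg/tmp location seen in the tkg init log; latest_bootstrap_kubeconfig is a hypothetical name.

```shell
# Hypothetical helper: prints the newest bootstrap kubeconfig in a directory
# (default ~/.kube-tkg/tmp) and fails if there is none.
latest_bootstrap_kubeconfig() {
  local dir=${1:-$HOME/.kube-tkg/tmp}
  # ls -t lists newest first; grep . fails when nothing matched.
  ls -t "$dir"/config_* 2>/dev/null | head -n 1 | grep .
}

# Usage:
#   kubectl --kubeconfig="$(latest_bootstrap_kubeconfig)" get po -A
```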

You should see output similar to the following from the tkg init command after a few minutes:

Logs of the command execution can also be found at: /tmp/tkg-20200806T183810851437735.log

Validating the pre-requisites...

Setting up management cluster...
Validating configuration...
Using infrastructure provider aws:v0.5.4
Generating cluster configuration...
Setting up bootstrapper...
Bootstrapper created. Kubeconfig: /home/ubuntu/.kube-tkg/tmp/config_oDL6huqf
Installing providers on bootstrapper...
Fetching providers
Installing cert-manager
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.6" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.6" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.6" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-aws" Version="v0.5.4" TargetNamespace="capa-system"
Start creating management cluster...

At this point, you can review EC2 objects and VPCs created in AWS. Ultimately, you’ll see new Running instances, Elastic IPs, Volumes, Security Groups and Load Balancers, as well as a new VPC.

You can drill down into each to see more details about what has been created.

When the installation of the management cluster is completed, you should see output similar to the following on the command line:

Saving management cluster kuebconfig into /home/ubuntu/.kube/config
Installing providers on management cluster...
Fetching providers
Installing cert-manager
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.6" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.6" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.6" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-aws" Version="v0.5.4" TargetNamespace="capa-system"
Waiting for the management cluster to get ready for move...
Moving all Cluster API objects from bootstrap cluster to management cluster...
Performing move...
Discovering Cluster API objects
Moving Cluster API objects Clusters=1
Creating objects in the target cluster
Deleting objects from the source cluster
Context set for management cluster cjlittle-mgmt-aws as 'cjlittle-mgmt-aws-admin@cjlittle-mgmt-aws'.

Management cluster created!


You can now create your first workload cluster by running the following:

  tkg create cluster [name] --kubernetes-version=[version] --plan=[plan]

If you run the kubectl config get-contexts command, you’ll see that you have a new context present.

CURRENT   NAME                                        CLUSTER             AUTHINFO                  NAMESPACE
          cjlittle-mgmt-aws-admin@cjlittle-mgmt-aws   cjlittle-mgmt-aws   cjlittle-mgmt-aws-admin

You can switch to this context and then examine the objects created.

kubectl config use-context cjlittle-mgmt-aws-admin@cjlittle-mgmt-aws
Switched to context "cjlittle-mgmt-aws-admin@cjlittle-mgmt-aws".
kubectl get nodes
NAME                         STATUS   ROLES    AGE   VERSION
ip-10-0-0-138.ec2.internal   Ready    <none>   24m   v1.18.6+vmware.1
ip-10-0-0-154.ec2.internal   Ready    master   25m   v1.18.6+vmware.1
kubectl get po -A
NAMESPACE                           NAME                                                             READY   STATUS    RESTARTS   AGE
capa-system                         capa-controller-manager-5c4ff75f77-w2w8v                         2/2     Running   0          24m
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-696c55fc88-bx9cd       2/2     Running   0          25m
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-779fc7b675-7jknm   2/2     Running   0          25m
capi-system                         capi-controller-manager-64f89c966c-lnzcl                         2/2     Running   0          25m
capi-webhook-system                 capa-controller-manager-b6f7487c-gtpkm                           2/2     Running   1          24m
capi-webhook-system                 capi-controller-manager-c776dccfb-mmhbm                          2/2     Running   0          25m
capi-webhook-system                 capi-kubeadm-bootstrap-controller-manager-6dd4c8f4f9-crv9x       2/2     Running   0          25m
capi-webhook-system                 capi-kubeadm-control-plane-controller-manager-d978674c-qcstg     2/2     Running   0          25m
cert-manager                        cert-manager-5cf6d4bbd8-chqqq                                    1/1     Running   0          27m
cert-manager                        cert-manager-cainjector-56c57c56f-zlbvn                          1/1     Running   0          27m
cert-manager                        cert-manager-webhook-59c765ccdf-76lss                            1/1     Running   0          27m
kube-system                         calico-kube-controllers-7d598d6b58-m4b8k                         1/1     Running   0          27m
kube-system                         calico-node-q2bzw                                                1/1     Running   0          26m
kube-system                         calico-node-t9xnq                                                1/1     Running   0          27m
kube-system                         coredns-5cf78cdcc-5pdcq                                          1/1     Running   0          27m
kube-system                         coredns-5cf78cdcc-pdzws                                          1/1     Running   0          27m
kube-system                         etcd-ip-10-0-0-154.ec2.internal                                  1/1     Running   0          28m
kube-system                         kube-apiserver-ip-10-0-0-154.ec2.internal                        1/1     Running   0          28m
kube-system                         kube-controller-manager-ip-10-0-0-154.ec2.internal               1/1     Running   0          28m
kube-system                         kube-proxy-gd7c2                                                 1/1     Running   0          27m
kube-system                         kube-proxy-lkrc6                                                 1/1     Running   0          26m
kube-system                         kube-scheduler-ip-10-0-0-154.ec2.internal                        1/1     Running   0          28m

You may find it useful to review the logs from some of these pods if you run into issues with any subsequent operations in your new TKG environment.
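To spot trouble quickly, you can filter kubectl get po -A output down to pods that aren’t Running or don’t have all of their containers ready, then pull their logs with kubectl logs -n <namespace> <pod> --all-containers. A sketch assuming bash and the default column layout shown above; not_ready_pods is a hypothetical name, and pods from completed Jobs would also be listed.

```shell
# Hypothetical helper: reads `kubectl get po -A` output on stdin and prints
# namespace/name for every pod that is not Running or not fully ready.
not_ready_pods() {
  awk 'NR > 1 {
    # READY is "ready/total"; STATUS is the fourth column.
    split($3, ready, "/")
    if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2
  }'
}

# Usage:
#   kubectl get po -A | not_ready_pods
```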

The tkg get mc command should now display some information about the management cluster.


 MANAGEMENT-CLUSTER-NAME  CONTEXT-NAME
 cjlittle-mgmt-aws *      cjlittle-mgmt-aws-admin@cjlittle-mgmt-aws

Provision your workload cluster

Now that the management cluster is created, it’s relatively simple to provision workload clusters. We’ll create a small cluster with the same configuration as the management cluster. If you want a workload cluster with a different configuration from the management cluster, you can pass different parameters to the tkg create cluster command or edit the ~/.tkg/config.yaml file.

tkg create cluster cjlittle-test-aws --plan dev

Again, you will see more EC2 resources being configured in AWS as well as a new VPC being created.

When the installation of the workload cluster is completed, you should see output similar to the following on the command line:

Logs of the command execution can also be found at: /tmp/tkg-20200806T230411350608441.log
Validating configuration...
Creating workload cluster 'cjlittle-test-aws'...
Waiting for cluster to be initialized...
Waiting for cluster nodes to be available...

Workload cluster 'cjlittle-test-aws' created

Next, run the tkg get credentials command to add a context for the workload cluster to your kubeconfig.

tkg get credentials cjlittle-test-aws
Credentials of workload cluster 'cjlittle-test-aws' have been saved
You can now access the cluster by running 'kubectl config use-context cjlittle-test-aws-admin@cjlittle-test-aws'

Per the output of the previous command, you can now switch contexts to get access to the workload cluster.

kubectl config use-context cjlittle-test-aws-admin@cjlittle-test-aws
Switched to context "cjlittle-test-aws-admin@cjlittle-test-aws".
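Both clusters in this walkthrough got an admin context named <cluster>-admin@<cluster>, so a tiny helper can build that name when switching back and forth. This is a sketch assuming bash and assuming that naming pattern holds for your clusters too; tkg_admin_context and use_tkg_cluster are hypothetical names.

```shell
# Hypothetical helper: builds the admin context name TKG used in this post.
tkg_admin_context() {
  printf '%s-admin@%s\n' "$1" "$1"
}

# Hypothetical helper: switches kubectl to the named cluster's admin context.
use_tkg_cluster() {
  kubectl config use-context "$(tkg_admin_context "$1")"
}

# Usage:
#   use_tkg_cluster cjlittle-mgmt-aws
#   use_tkg_cluster cjlittle-test-aws
```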

You can then investigate the objects that were created.

kubectl get nodes
NAME                        STATUS   ROLES    AGE     VERSION
ip-10-0-0-78.ec2.internal   Ready    <none>   6m31s   v1.18.6+vmware.1
ip-10-0-0-91.ec2.internal   Ready    master   8m37s   v1.18.6+vmware.1
kubectl get po -A
NAMESPACE     NAME                                                READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-7d598d6b58-5mfdn            1/1     Running   0          8m33s
kube-system   calico-node-42c8t                                   1/1     Running   0          7m40s
kube-system   calico-node-5g26g                                   1/1     Running   0          8m34s
kube-system   coredns-5cf78cdcc-7r47b                             1/1     Running   0          9m31s
kube-system   coredns-5cf78cdcc-w6zbp                             1/1     Running   0          9m31s
kube-system   etcd-ip-10-0-0-91.ec2.internal                      1/1     Running   0          9m44s
kube-system   kube-apiserver-ip-10-0-0-91.ec2.internal            1/1     Running   0          9m44s
kube-system   kube-controller-manager-ip-10-0-0-91.ec2.internal   1/1     Running   0          9m44s
kube-system   kube-proxy-mn47t                                    1/1     Running   0          7m40s
kube-system   kube-proxy-znrwg                                    1/1     Running   0          9m31s
kube-system   kube-scheduler-ip-10-0-0-91.ec2.internal            1/1     Running   0          9m44s

The tkg get cluster command should return information about your workload cluster now.

 NAME               NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES
 cjlittle-test-aws  default    running  1/1           1/1      v1.18.6+vmware.1
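If you script cluster creation, you may want to wait until this output reports the cluster as ready. A sketch assuming bash, the column layout shown above, and that "running" with all control plane and worker counts matched (for example 1/1) means ready; cluster_running is a hypothetical name.

```shell
# Hypothetical helper: reads `tkg get cluster` output on stdin and succeeds
# only if the named cluster is "running" with all nodes up.
cluster_running() {
  awk -v name="$1" '
    $1 == name {
      found = 1
      # CONTROLPLANE and WORKERS are "up/desired" counts.
      split($4, cp, "/"); split($5, wk, "/")
      ok = ($3 == "running" && cp[1] == cp[2] && wk[1] == wk[2])
    }
    END { if (found && ok) exit 0; exit 1 }'
}

# Usage: poll every 30 seconds until the workload cluster is up.
#   until tkg get cluster | cluster_running cjlittle-test-aws; do sleep 30; done
```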

Access your TKG nodes

If you need to access the TKG nodes directly, you’ll have to go through the bastion host that TKG created, since the nodes only have private IP addresses.

To access the bastion host, navigate to Running Instances in EC2 and then select the appropriate bastion host. You can identify it by its name, which starts with the cluster name and ends in -bastion. With the bastion host selected, click on the Actions dropdown and then select Connect.

The command in the Example section is all you need to reach the bastion host, as long as you have the private key from the key pair you created saved locally.

ssh -i "cjlittle-tkg.pem" ubuntu@ec2-107-23-251-116.compute-1.amazonaws.com
Welcome to Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-1047-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

224 packages can be updated.
151 updates are security updates.

New release '18.04.4 LTS' available.
Run 'do-release-upgrade' to upgrade to it.


*** System restart required ***
Last login: Fri Aug  7 05:30:43 2020 from 24.8.90.129
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ubuntu@ip-10-0-1-9:~$

From here you can ssh to the control plane and worker nodes on their internal IP addresses as the ec2-user user, using the same key pair. The private key does not exist on the bastion host, so you’ll have to copy it there yourself (for example, with scp) and restrict its permissions with chmod 400. You can get the internal IP addresses of the nodes by running kubectl get nodes -o wide from the machine where you ran tkg, or from each instance’s Description page under Running Instances in EC2.

kubectl get nodes -o wide
NAME                         STATUS   ROLES    AGE   VERSION            INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-10-0-0-138.ec2.internal   Ready    <none>   51m   v1.18.6+vmware.1   10.0.0.138    <none>        Amazon Linux 2   4.14.186-146.268.amzn2.x86_64   containerd://1.3.4
ip-10-0-0-154.ec2.internal   Ready    master   53m   v1.18.6+vmware.1   10.0.0.154    <none>        Amazon Linux 2   4.14.186-146.268.amzn2.x86_64   containerd://1.3.4
ssh -i "cjlittle-tkg.pem" ec2-user@10.0.0.154
The authenticity of host '10.0.0.154 (10.0.0.154)' can't be established.
ECDSA key fingerprint is SHA256:yTm3+EitD6/oANtdxoqL2EwHgedRCsP1bEmL4UnpZO8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.0.154' (ECDSA) to the list of known hosts.

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
[ec2-user@ip-10-0-0-154 ~]$
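As an alternative to copying the private key onto the bastion, OpenSSH 7.3 and later can tunnel through it with ProxyJump, so authentication to the nodes happens from your own machine. The sketch below prints a client config for that; it assumes bash, and tkg_ssh_config and the tkg-bastion alias are hypothetical names. The users (ubuntu on the bastion, ec2-user on the nodes) match the sessions above.

```shell
# Hypothetical helper: prints an OpenSSH client config that reaches private
# node IPs through the bastion. Args: bastion host, key file, node pattern.
tkg_ssh_config() {
  local bastion=$1 key=$2 pattern=$3
  cat <<EOF
Host tkg-bastion
    HostName $bastion
    User ubuntu
    IdentityFile $key

Host $pattern
    User ec2-user
    IdentityFile $key
    ProxyJump tkg-bastion
EOF
}

# Usage:
#   tkg_ssh_config ec2-107-23-251-116.compute-1.amazonaws.com ~/cjlittle-tkg.pem '10.0.0.*' >> ~/.ssh/config
#   ssh 10.0.0.154
```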
