I had been so focused on TKG on vSphere lately that I had been overlooking the fact that we’ve automated the process for standing up a cluster on AWS as well. While it’s far easier to do this via the TKG UI, I wanted to take a stab at it entirely from the command line. The process wasn’t at all bad but did require a fair amount of prep work up front to get things ready on AWS. I will say that after I went through this process once I’ve been able to spin up and tear down numerous clusters on AWS in a matter of minutes.
Note: These instructions assume you already have some level of access to AWS and to the
Create an IAM User on AWS
You will need to have an IAM account/password on AWS that you use for creating resources needed by TKG. If you already have an IAM account with administrative access that you’d like to use you can skip this first part.
When you are logged in to AWS, you will find the IAM section under Security, Identity & Compliance:
Once you’re on the IAM page, you can click on the Users link and then on the Add user button to create a new IAM user. Enter a meaningful name and set the Access type to Programmatic access.
Click the Next: Permissions button.
You’ll need to assign permissions to your new user via group membership or by attaching policies directly. You might already have a group set up which you could use for this step. I’m choosing to grant my new user the AdministratorAccess policy.
Click Next: Tags and then click Next: Review. Validate that the account will be created the way you want and then click Create user.
On the last page, it’s very important that you click the Show link under Secret access key as it is the only time you will be able to retrieve this value. Save it somewhere secure.
Click the Close button.
To allow this user to log in to the AWS console you will need to configure its credentials. Select your new user and then click on the Security credentials tab. Click the Manage link next to Console password.
Set Console access to Enable. You can either use an auto-generated password or specify your own. Click the Apply button.
Navigate to the main AWS page and then to EC2 > Network & Security > Key Pairs. Click the Create key pair button at the top right. Note: You can reuse an existing key pair if you have one and can skip this step.
Specify a name for the new key pair. The choice of file format is dependent on how you might want to access the bastion host.
Click the Create key pair button.
Prepare for cluster creation
You’ll need to download the tkg CLI as well as the clusterawsadm, aws and jq utilities. You can find the latest tkg CLI and clusterawsadm downloads at Download VMware Tanzu Kubernetes Grid. There are numerous methods for installing jq, depending on where you’re going to run your CLI commands. The instructions for installing the aws CLI can be found at Installing the AWS CLI version 2.
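Before going further it can help to confirm that all of the utilities are actually on your PATH. The following is a minimal sketch, not part of the original walkthrough; missing_tools is a hypothetical helper name, and it only relies on the shell's built-in command -v.

```shell
# Print the name of every listed tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example (the tools downloaded above):
# missing_tools tkg clusterawsadm aws jq
```

No output and a zero exit status means everything was found. docker and kubectl are used later in this post, so you may want to add them to the list.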
Once your command line utilities are downloaded and in place, you can start with setting up some environment variables
export AWS_ACCESS_KEY_ID=<aws_access_key created earlier>
export AWS_SECRET_ACCESS_KEY=<aws_access_key_secret created earlier>
export AWS_SESSION_TOKEN=<aws_session_token>   # only needed if you use multifactor authentication
export AWS_REGION=us-east-1                    # set this as appropriate
Once these are set you can use the clusterawsadm command to create a CloudFormation stack on AWS.
clusterawsadm alpha bootstrap create-stack
Attempting to create CloudFormation stack cluster-api-provider-aws-sigs-k8s-io
While this is creating you should see a new CloudFormation stack being created on AWS.
When the stack is created, you’ll see output similar to the following on the command line:
Following resources are in the stack:
Resource                  |Type                                                                                |Status
AWS::IAM::Group           |bootstrapper.cluster-api-provider-aws.sigs.k8s.io                                   |CREATE_COMPLETE
AWS::IAM::InstanceProfile |control-plane.cluster-api-provider-aws.sigs.k8s.io                                  |CREATE_COMPLETE
AWS::IAM::InstanceProfile |controllers.cluster-api-provider-aws.sigs.k8s.io                                    |CREATE_COMPLETE
AWS::IAM::InstanceProfile |nodes.cluster-api-provider-aws.sigs.k8s.io                                          |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::537043370288:policy/control-plane.cluster-api-provider-aws.sigs.k8s.io |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::537043370288:policy/nodes.cluster-api-provider-aws.sigs.k8s.io         |CREATE_COMPLETE
AWS::IAM::ManagedPolicy   |arn:aws:iam::537043370288:policy/controllers.cluster-api-provider-aws.sigs.k8s.io   |CREATE_COMPLETE
AWS::IAM::Role            |control-plane.cluster-api-provider-aws.sigs.k8s.io                                  |CREATE_COMPLETE
AWS::IAM::Role            |controllers.cluster-api-provider-aws.sigs.k8s.io                                    |CREATE_COMPLETE
AWS::IAM::Role            |nodes.cluster-api-provider-aws.sigs.k8s.io                                          |CREATE_COMPLETE
AWS::IAM::User            |bootstrapper.cluster-api-provider-aws.sigs.k8s.io                                   |CREATE_COMPLETE
And you’ll see the Status of your new stack as CREATE_COMPLETE at AWS.
Next up is setting more environment variables related to AWS, which will be used by the tkg CLI as it utilizes Cluster API to create the cluster.
export AWS_CREDENTIALS=$(aws iam create-access-key --user-name bootstrapper.cluster-api-provider-aws.sigs.k8s.io --output json)
export AWS_ACCESS_KEY_ID=$(echo $AWS_CREDENTIALS | jq .AccessKey.AccessKeyId -r)
export AWS_SECRET_ACCESS_KEY=$(echo $AWS_CREDENTIALS | jq .AccessKey.SecretAccessKey -r)
export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm alpha bootstrap encode-aws-credentials)
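Because these exports are chained, a failure in the first one silently poisons the rest: jq -r prints the literal string null when a field is absent and prints nothing at all on empty input. The sketch below is one way to catch that before running tkg init; require_real_value is a hypothetical helper name, not something provided by tkg or clusterawsadm.

```shell
# Succeed only if every named variable holds a usable value.
# An empty value or the literal string "null" (what jq -r prints for a
# missing field) is treated as a failure.
require_real_value() {
  local name value
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ] || [ "$value" = "null" ]; then
      echo "not set correctly: $name" >&2
      return 1
    fi
  done
}

# Example (run after the exports above):
# require_real_value AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_B64ENCODED_CREDENTIALS
```

The function takes variable names rather than values, so the secret itself never appears on the command line.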
Provision the management cluster
If you’ve never used TKG, you’ll first need to run the tkg get mc command to create a .tkg folder and configuration file. Open the .tkg/config.yaml file in a text editor. Unless you’ve created other TKG clusters, it should look like the following:
cert-manager-timeout: 30m0s
overridesFolder: /home/ubuntu/.tkg/overrides
NODE_STARTUP_TIMEOUT: 20m
BASTION_HOST_ENABLED: "true"
providers:
  - name: cluster-api
    url: /home/ubuntu/.tkg/providers/cluster-api/v0.3.6/core-components.yaml
    type: CoreProvider
  - name: aws
    url: /home/ubuntu/.tkg/providers/infrastructure-aws/v0.5.4/infrastructure-components.yaml
    type: InfrastructureProvider
  - name: vsphere
    url: /home/ubuntu/.tkg/providers/infrastructure-vsphere/v0.6.6/infrastructure-components.yaml
    type: InfrastructureProvider
  - name: tkg-service-vsphere
    url: /home/ubuntu/.tkg/providers/infrastructure-tkg-service-vsphere/v1.0.0/unused.yaml
    type: InfrastructureProvider
  - name: kubeadm
    url: /home/ubuntu/.tkg/providers/bootstrap-kubeadm/v0.3.6/bootstrap-components.yaml
    type: BootstrapProvider
  - name: kubeadm
    url: /home/ubuntu/.tkg/providers/control-plane-kubeadm/v0.3.6/control-plane-components.yaml
    type: ControlPlaneProvider
images:
  all:
    repository: gcr.io/kubernetes-development-244305/cluster-api
  cert-manager:
    repository: gcr.io/kubernetes-development-244305/cert-manager
    tag: v0.11.0_vmware.1
release:
  version: v1.1.3
You’ll need to add a series of variables to this file to allow a TKG cluster to be created on AWS.
AWS_REGION:
AWS_NODE_AZ:
AWS_PRIVATE_NODE_CIDR:
AWS_PUBLIC_NODE_CIDR:
AWS_PUBLIC_SUBNET_ID:
AWS_PRIVATE_SUBNET_ID:
AWS_SSH_KEY_NAME:
AWS_VPC_ID:
AWS_VPC_CIDR:
BASTION_HOST_ENABLED:
CLUSTER_CIDR:
CONTROL_PLANE_MACHINE_TYPE:
NODE_MACHINE_TYPE:
You can find detailed information about each parameter at Deploy Management Clusters to Amazon EC2 with the CLI. The following is what mine looked like:
AWS_REGION: us-east-1
AWS_NODE_AZ: us-east-1a
AWS_PRIVATE_NODE_CIDR: 10.0.0.0/24
AWS_PUBLIC_NODE_CIDR: 10.0.1.0/24
AWS_PUBLIC_SUBNET_ID:
AWS_PRIVATE_SUBNET_ID:
AWS_SSH_KEY_NAME: cjlittle-tkg
AWS_VPC_ID:
AWS_VPC_CIDR: 10.0.0.0/16
#BASTION_HOST_ENABLED:
CLUSTER_CIDR: 100.96.0.0/11
CONTROL_PLANE_MACHINE_TYPE: m5.large
NODE_MACHINE_TYPE: m5.large
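It is easy to leave one of these keys out when editing the file by hand. As a rough sketch, the hypothetical helper below reports any key that has no top-level "KEY:" line in a given config file. It only checks that the key is present: a blank value is accepted, since several of these are optional, and a commented-out line does not count.

```shell
# Print each listed key that has no top-level "KEY:" line in the config file.
# Returns non-zero if at least one key is missing.
missing_config_keys() {
  local config_file="$1" key status=0
  shift
  for key in "$@"; do
    if ! grep -q "^${key}:" "$config_file"; then
      echo "$key"
      status=1
    fi
  done
  return "$status"
}

# Example (path and keys taken from this post):
# missing_config_keys ~/.tkg/config.yaml AWS_REGION AWS_NODE_AZ AWS_SSH_KEY_NAME \
#   AWS_VPC_CIDR CLUSTER_CIDR CONTROL_PLANE_MACHINE_TYPE NODE_MACHINE_TYPE
```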
You’ll notice that several items are blank as they were optional or not relevant to my install (AWS_VPC_ID is only set if you’re re-using an existing VPC, see Requirements for Using an Existing VPC to Provision a Cluster for details). Also, you should review Amazon EC2 Instance Types for details on the different machine types. I also commented out the BASTION_HOST_ENABLED line as it already existed in the file and was set to "true".
Now we’re ready to kick things off with the tkg init command, which will create the management cluster. There are loads of options you can specify to fine-tune your management cluster. In this example, we’ll specify a name for the cluster being created and specify the dev plan which will create a single control-plane, single worker node deployment.
tkg init --infrastructure aws --name cjlittle-mgmt-aws --plan dev
You should be able to see a kind (Kubernetes in Docker) container running in Docker. This is the bootstrap Kubernetes cluster, which kicks off the process of using Cluster API to provision the TKG cluster on AWS.
docker ps
CONTAINER ID   IMAGE                                                              COMMAND                  CREATED         STATUS              PORTS                       NAMES
28648ceb690a   gcr.io/kubernetes-development-244305/kind/node:v1.18.6_vmware.1   "/usr/local/bin/entr…"   4 minutes ago   Up About a minute   127.0.0.1:39773->6443/tcp   tkg-kind-bsma4uc09c6r9bpmtsrg-control-plane
If you need to troubleshoot the early bootstrap stages, you can docker exec into this container to review logs. You can also use the temporary kubeconfig file to check on the status of resources in the bootstrap cluster.
kubectl --kubeconfig=.kube-tkg/tmp/config_oDL6huqf get po -A
NAMESPACE                       NAME                                                                  READY   STATUS              RESTARTS   AGE
capi-kubeadm-bootstrap-system   capi-kubeadm-bootstrap-controller-manager-696c55fc88-g4bnv            0/2     Pending             0          5s
capi-system                     capi-controller-manager-64f89c966c-4l8g4                              0/2     ContainerCreating   0          32s
capi-webhook-system             capi-controller-manager-c776dccfb-2srvm                               0/2     ContainerCreating   0          43s
capi-webhook-system             capi-kubeadm-bootstrap-controller-manager-6dd4c8f4f9-c4fqg            0/2     Pending             0          19s
capi-webhook-system             capi-kubeadm-control-plane-controller-manager-d978674c-4bt5s          0/2     Pending             0          2s
cert-manager                    cert-manager-5cf6d4bbd8-l7drj                                         1/1     Running             0          5m32s
cert-manager                    cert-manager-cainjector-56c57c56f-ld4vg                               1/1     Running             0          5m32s
cert-manager                    cert-manager-webhook-59c765ccdf-nlppd                                 1/1     Running             0          5m31s
kube-system                     coredns-5cf78cdcc-6jxgh                                               1/1     Running             0          7m56s
kube-system                     coredns-5cf78cdcc-tcmf4                                               1/1     Running             0          7m55s
kube-system                     etcd-tkg-kind-bsma4uc09c6r9bpmtsrg-control-plane                      1/1     Running             0          8m4s
kube-system                     kindnet-r7rsz                                                         1/1     Running             0          7m56s
kube-system                     kube-apiserver-tkg-kind-bsma4uc09c6r9bpmtsrg-control-plane            1/1     Running             0          8m4s
kube-system                     kube-controller-manager-tkg-kind-bsma4uc09c6r9bpmtsrg-control-plane   1/1     Running             2          8m4s
kube-system                     kube-proxy-bl4m6                                                      1/1     Running             0          7m56s
kube-system                     kube-scheduler-tkg-kind-bsma4uc09c6r9bpmtsrg-control-plane            0/1     Running             2          8m4s
local-path-storage              local-path-provisioner-bd4bb6b75-cmdfd                                1/1     Running             1          7m53s
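The temporary kubeconfig gets a random suffix on every run (config_oDL6huqf above), so you have to look the name up each time. A small sketch for picking the most recently modified one follows; latest_bootstrap_kubeconfig is a hypothetical helper, and the default directory is an assumption based on the path tkg init printed on my machine.

```shell
# Print the most recently modified config_* file in the bootstrap kubeconfig
# directory, or nothing if there is none.
latest_bootstrap_kubeconfig() {
  local dir="${1:-$HOME/.kube-tkg/tmp}"
  ls -t "$dir"/config_* 2>/dev/null | head -n 1
}

# Example:
# kubectl --kubeconfig="$(latest_bootstrap_kubeconfig)" get po -A
```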
You should see output similar to the following from the tkg init command after a few minutes:
Logs of the command execution can also be found at: /tmp/tkg-20200806T183810851437735.log
Validating the pre-requisites...
Setting up management cluster...
Validating configuration...
Using infrastructure provider aws:v0.5.4
Generating cluster configuration...
Setting up bootstrapper...
Bootstrapper created. Kubeconfig: /home/ubuntu/.kube-tkg/tmp/config_oDL6huqf
Installing providers on bootstrapper...
Fetching providers
Installing cert-manager
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.6" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.6" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.6" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-aws" Version="v0.5.4" TargetNamespace="capa-system"
Start creating management cluster...
At this point, you can review EC2 objects and VPCs created in AWS. Ultimately, you’ll see new Running instances, Elastic IPs, Volumes, Security Groups and Load Balancers, as well as a new VPC.
You can drill down into each to see more details about what has been created.
When the installation of the management cluster is completed, you should see output similar to the following on the command line:
Saving management cluster kuebconfig into /home/ubuntu/.kube/config
Installing providers on management cluster...
Fetching providers
Installing cert-manager
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.6" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.6" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.6" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-aws" Version="v0.5.4" TargetNamespace="capa-system"
Waiting for the management cluster to get ready for move...
Moving all Cluster API objects from bootstrap cluster to management cluster...
Performing move...
Discovering Cluster API objects
Moving Cluster API objects Clusters=1
Creating objects in the target cluster
Deleting objects from the source cluster
Context set for management cluster cjlittle-mgmt-aws as 'cjlittle-mgmt-aws-admin@cjlittle-mgmt-aws'.
Management cluster created!
You can now create your first workload cluster by running the following:
  tkg create cluster [name] --kubernetes-version=[version] --plan=[plan]
If you run the kubectl config get-contexts command, you’ll see that you have a new context present.
CURRENT   NAME                                        CLUSTER             AUTHINFO                  NAMESPACE
          cjlittle-mgmt-aws-admin@cjlittle-mgmt-aws   cjlittle-mgmt-aws   cjlittle-mgmt-aws-admin
You can switch to this context and then examine the objects created.
kubectl config use-context cjlittle-mgmt-aws-admin@cjlittle-mgmt-aws
Switched to context "cjlittle-mgmt-aws-admin@cjlittle-mgmt-aws".

kubectl get nodes
NAME                         STATUS   ROLES    AGE   VERSION
ip-10-0-0-138.ec2.internal   Ready    <none>   24m   v1.18.6+vmware.1
ip-10-0-0-154.ec2.internal   Ready    master   25m   v1.18.6+vmware.1

kubectl get po -A
NAMESPACE                           NAME                                                             READY   STATUS    RESTARTS   AGE
capa-system                         capa-controller-manager-5c4ff75f77-w2w8v                         2/2     Running   0          24m
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-696c55fc88-bx9cd       2/2     Running   0          25m
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-779fc7b675-7jknm   2/2     Running   0          25m
capi-system                         capi-controller-manager-64f89c966c-lnzcl                         2/2     Running   0          25m
capi-webhook-system                 capa-controller-manager-b6f7487c-gtpkm                           2/2     Running   1          24m
capi-webhook-system                 capi-controller-manager-c776dccfb-mmhbm                          2/2     Running   0          25m
capi-webhook-system                 capi-kubeadm-bootstrap-controller-manager-6dd4c8f4f9-crv9x       2/2     Running   0          25m
capi-webhook-system                 capi-kubeadm-control-plane-controller-manager-d978674c-qcstg     2/2     Running   0          25m
cert-manager                        cert-manager-5cf6d4bbd8-chqqq                                    1/1     Running   0          27m
cert-manager                        cert-manager-cainjector-56c57c56f-zlbvn                          1/1     Running   0          27m
cert-manager                        cert-manager-webhook-59c765ccdf-76lss                            1/1     Running   0          27m
kube-system                         calico-kube-controllers-7d598d6b58-m4b8k                         1/1     Running   0          27m
kube-system                         calico-node-q2bzw                                                1/1     Running   0          26m
kube-system                         calico-node-t9xnq                                                1/1     Running   0          27m
kube-system                         coredns-5cf78cdcc-5pdcq                                          1/1     Running   0          27m
kube-system                         coredns-5cf78cdcc-pdzws                                          1/1     Running   0          27m
kube-system                         etcd-ip-10-0-0-154.ec2.internal                                  1/1     Running   0          28m
kube-system                         kube-apiserver-ip-10-0-0-154.ec2.internal                        1/1     Running   0          28m
kube-system                         kube-controller-manager-ip-10-0-0-154.ec2.internal               1/1     Running   0          28m
kube-system                         kube-proxy-gd7c2                                                 1/1     Running   0          27m
kube-system                         kube-proxy-lkrc6                                                 1/1     Running   0          26m
kube-system                         kube-scheduler-ip-10-0-0-154.ec2.internal                        1/1     Running   0          28m
You may find it useful to review the logs from some of these pods if you run into issues with any subsequent operations in your new TKG environment.
The tkg get mc command should now display some information about the management cluster.
MANAGEMENT-CLUSTER-NAME   CONTEXT-NAME
cjlittle-mgmt-aws *       cjlittle-mgmt-aws-admin@cjlittle-mgmt-aws
Provision your workload cluster
Now that the management cluster is created, it’s relatively simple to provision workload clusters. We’ll create a small cluster with the same configuration as the management cluster. You can pass different parameters to the tkg create cluster command or edit the .tkg/config.yaml file if you want to create a workload cluster with a different configuration from the management cluster.
tkg create cluster cjlittle-test-aws --plan dev
Again, you will see more EC2 resources being configured in AWS as well as a new VPC being created.
When the installation of the workload cluster is completed, you should see output similar to the following on the command line:
Logs of the command execution can also be found at: /tmp/tkg-20200806T230411350608441.log
Validating configuration...
Creating workload cluster 'cjlittle-test-aws'...
Waiting for cluster to be initialized...
Waiting for cluster nodes to be available...
Workload cluster 'cjlittle-test-aws' created
You’ll need to run the tkg get credentials command to have a new context created for the workload cluster.
tkg get credentials cjlittle-test-aws
Credentials of workload cluster 'cjlittle-test-aws' have been saved
You can now access the cluster by running 'kubectl config use-context cjlittle-test-aws-admin@cjlittle-test-aws'
Per the output of the previous command, you can now switch contexts to get access to the workload cluster.
kubectl config use-context cjlittle-test-aws-admin@cjlittle-test-aws
Switched to context "cjlittle-test-aws-admin@cjlittle-test-aws".
And you should be able to investigate the objects that were created.
kubectl get nodes
NAME                        STATUS   ROLES    AGE     VERSION
ip-10-0-0-78.ec2.internal   Ready    <none>   6m31s   v1.18.6+vmware.1
ip-10-0-0-91.ec2.internal   Ready    master   8m37s   v1.18.6+vmware.1

kubectl get po -A
NAMESPACE     NAME                                               READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-7d598d6b58-5mfdn           1/1     Running   0          8m33s
kube-system   calico-node-42c8t                                  1/1     Running   0          7m40s
kube-system   calico-node-5g26g                                  1/1     Running   0          8m34s
kube-system   coredns-5cf78cdcc-7r47b                            1/1     Running   0          9m31s
kube-system   coredns-5cf78cdcc-w6zbp                            1/1     Running   0          9m31s
kube-system   etcd-ip-10-0-0-91.ec2.internal                     1/1     Running   0          9m44s
kube-system   kube-apiserver-ip-10-0-0-91.ec2.internal           1/1     Running   0          9m44s
kube-system   kube-controller-manager-ip-10-0-0-91.ec2.internal  1/1     Running   0          9m44s
kube-system   kube-proxy-mn47t                                   1/1     Running   0          7m40s
kube-system   kube-proxy-znrwg                                   1/1     Running   0          9m31s
kube-system   kube-scheduler-ip-10-0-0-91.ec2.internal           1/1     Running   0          9m44s
The tkg get cluster command should return information about your workload cluster now.
NAME                NAMESPACE   STATUS    CONTROLPLANE   WORKERS   KUBERNETES
cjlittle-test-aws   default     running   1/1            1/1       v1.18.6+vmware.1
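If you script cluster creation, you may want to pull the status out of that table rather than read it by eye. The following is a minimal sketch that assumes the column layout shown above (cluster name first, STATUS third); cluster_status is a hypothetical helper that reads the tkg get cluster output on stdin.

```shell
# Read `tkg get cluster` output on stdin and print the STATUS column
# for the cluster whose name is given as the first argument.
cluster_status() {
  awk -v name="$1" 'NR > 1 && $1 == name { print $3 }'
}

# Example: wait until the workload cluster reports "running".
# until [ "$(tkg get cluster | cluster_status cjlittle-test-aws)" = "running" ]; do
#   sleep 30
# done
```

The function prints nothing if the cluster is not listed, so the loop above would keep waiting in that case. Add a timeout if you use it unattended.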
Access your TKG nodes
If you need to access the TKG nodes directly, you’ll find that you have to go through the bastion host that was created to get to them.
To access the bastion host, navigate to Running Instances in EC2 and then select the appropriate bastion host. You can identify it by its name, which starts with the cluster name and ends in -bastion. With the bastion host selected, click on the Actions dropdown and then select Connect.
The command in the Example section is about all you need to get to the bastion host as long as you have a copy of the ssh key pair you created saved locally.
ssh -i "cjlittle-tkg.pem" email@example.com
Welcome to Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-1047-aws x86_64)
 * Documentation: https://help.ubuntu.com
 * Management: https://landscape.canonical.com
 * Support: https://ubuntu.com/advantage
Get cloud support with Ubuntu Advantage Cloud Guest:
http://www.ubuntu.com/business/services/cloud
224 packages can be updated.
151 updates are security updates.
New release '18.04.4 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
*** System restart required ***
Last login: Fri Aug 7 05:30:43 2020 from 126.96.36.199
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
ubuntu@ip-10-0-1-9:~$
From here you can ssh to the control plane and worker nodes on their internal IP addresses (with the same ssh key pair) as the ec2-user user. The ssh key pair does not exist on the bastion host, so you’ll have to create the key file there manually. You can get the internal IP addresses of the nodes via kubectl get nodes -o wide or via their Description page under Running Instances in EC2.
kubectl get nodes -o wide
NAME                         STATUS   ROLES    AGE   VERSION            INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-10-0-0-138.ec2.internal   Ready    <none>   51m   v1.18.6+vmware.1   10.0.0.138    <none>        Amazon Linux 2   4.14.186-146.268.amzn2.x86_64   containerd://1.3.4
ip-10-0-0-154.ec2.internal   Ready    master   53m   v1.18.6+vmware.1   10.0.0.154    <none>        Amazon Linux 2   4.14.186-146.268.amzn2.x86_64   containerd://1.3.4

ssh -i "cjlittle-tkg.pem" firstname.lastname@example.org
The authenticity of host '10.0.0.154 (10.0.0.154)' can't be established.
ECDSA key fingerprint is SHA256:yTm3+EitD6/oANtdxoqL2EwHgedRCsP1bEmL4UnpZO8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.0.154' (ECDSA) to the list of known hosts.

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
[ec2-user@ip-10-0-0-154 ~]$
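An alternative I did not use above, but which avoids putting the private key on the bastion host at all, is OpenSSH's ProxyJump option (ssh -J, available in OpenSSH 7.3 and later). The sketch below only builds the command string. node_ssh_command is a hypothetical helper, the bastion address is a placeholder you would replace with the bastion's public address from EC2, and it assumes the key has been loaded into your local ssh-agent (for example with ssh-add cjlittle-tkg.pem) so that both hops can authenticate with it.

```shell
# Print an ssh command that reaches a node through the bastion host using
# ProxyJump. User names follow this post: ubuntu on the bastion, ec2-user
# on the TKG nodes.
node_ssh_command() {
  local bastion="$1" node_ip="$2"
  printf 'ssh -J ubuntu@%s ec2-user@%s\n' "$bastion" "$node_ip"
}

# Example (bastion address is a placeholder):
# node_ssh_command <bastion-public-address> 10.0.0.154
```

Printing the command rather than running it makes it easy to review before connecting.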