In my ongoing efforts to get the most out of my Tanzu Kubernetes Grid lab environment, I decided to install Prometheus, Grafana and AlertManager in one of my workload clusters. I had a lot of options to choose from with regards to how to implement these projects but decided to go with Kube-Prometheus based on its use of the Prometheus operator. It was incredibly easy to get up and running and is highly configurable.
I also used Contour to provide ingress to these components…Contour manifest files ship with TKG so it was fairly simple to set up. As with most of my labs, I’m also running MetalLB to provide LoadBalancer functionality, which in this case is used by Envoy (part of Contour) to provide a stable endpoint for my ingresses. You can read more about how to deploy MetalLB in a TKG cluster in my previous blog post, How to Deploy MetalLB with BGP in a Tanzu Kubernetes Grid 1.1 Cluster.
You’ll need to download the TKG extension manifest bundle from https://my.vmware.com/group/vmware/downloads/details?downloadGroup=TKG-113&productId=988&rPId=49705. Extract the downloaded tgz file and you should see a folder structure that includes tkg-extensions-v1.1.0/ingress/contour/vsphere/. This is where we’ll be working to deploy Contour.
VMware provides some great documentation around deploying Contour to vSphere and AWS in Implementing Ingress Control on Tanzu Kubernetes Clusters with Contour, but for my purposes only a handful of commands were needed to get it up and running.
By default, the deployment manifests for Contour that VMware supplies will configure the Envoy pods to use a NodePort service type. While this works, it can be problematic if your nodes are ever recreated or get a different IP address. To avoid this, we’ll change the service type to LoadBalancer and configure a specific IP address from the pool that MetalLB is using.
sed -i 's/NodePort/LoadBalancer/' ~/tkg-extensions-v1.1.0/ingress/contour/vsphere/02-service-envoy.yaml
sed -i '26 i\ loadBalancerIP: 10.40.14.33' ~/tkg-extensions-v1.1.0/ingress/contour/vsphere/02-service-envoy.yaml
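Just to be explicit about what those two commands are doing (the line number in the second sed assumes the stock v1.1.0 copy of 02-service-envoy.yaml, so double-check the result if your bundle differs), the relevant portion of the envoy service spec should end up looking like this:
# excerpt of 02-service-envoy.yaml after the edits; loadBalancerIP must be indented under spec
spec:
  type: LoadBalancer
  loadBalancerIP: 10.40.14.33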
With this change in place, we’re ready to deploy the Contour resources to our cluster.
kubectl apply -f ~/tkg-extensions-v1.1.0/ingress/contour/vsphere/
And Contour is now up and running! Of course, there are no ingress resources created yet, so it’s not doing much, but that will change shortly. You can take a look at your cluster and see that there is a new namespace, tanzu-system-ingress, as well as some resources in it.
kubectl -n tanzu-system-ingress get po
NAME READY STATUS RESTARTS AGE
contour-77cbdc9ff9-nvscc 1/1 Running 0 2m14s
contour-77cbdc9ff9-x9zb8 1/1 Running 0 2m16s
envoy-hzgz5 2/2 Running 0 2m15s
The contour pods are part of a deployment/replicaset with two replicas configured and the envoy pod is part of a daemonset so there should be one pod for each worker node in the cluster. The contour pods will watch the Kubernetes API ingress objects (among other things) and provide the envoy pods with the configuration information for creating the reverse-proxy needed for the ingress to function.
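Before moving on, it’s worth confirming that the envoy service actually picked up the MetalLB address we asked for; the EXTERNAL-IP column should show 10.40.14.33 rather than <pending>.
kubectl -n tanzu-system-ingress get svc envoy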
With Contour deployed, we can now move onto deploying Prometheus, Grafana and AlertManager.
As noted earlier, I chose to go with the kube-prometheus project so we need to clone that repository locally.
git clone https://github.com/coreos/kube-prometheus/
Once the clone is complete, we should have a kube-prometheus/manifests folder created locally, where we’ll make some changes to customize the deployment. In the following sed commands, the first two simply reduce the number of replicas in the alertmanager (from three) and prometheus (from two) pods to one. The rest of the commands insert a configuration so that Prometheus will use persistent storage (a 1Gi volume to be provisioned via vSphere Cloud Native Storage).
sed -i 's/replicas: 3/replicas: 1/' ~/kube-prometheus/manifests/alertmanager-alertmanager.yaml
sed -i 's/replicas: 2/replicas: 1/' ~/kube-prometheus/manifests/prometheus-prometheus.yaml
sed -i '34 i\ storage:' ~/kube-prometheus/manifests/prometheus-prometheus.yaml
sed -i '35 i\ volumeClaimTemplate:' ~/kube-prometheus/manifests/prometheus-prometheus.yaml
sed -i '36 i\ apiVersion: v1' ~/kube-prometheus/manifests/prometheus-prometheus.yaml
sed -i '37 i\ kind: PersistentVolumeClaim' ~/kube-prometheus/manifests/prometheus-prometheus.yaml
sed -i '38 i\ spec:' ~/kube-prometheus/manifests/prometheus-prometheus.yaml
sed -i '39 i\ accessModes:' ~/kube-prometheus/manifests/prometheus-prometheus.yaml
sed -i '40 i\ - ReadWriteOnce' ~/kube-prometheus/manifests/prometheus-prometheus.yaml
sed -i '41 i\ resources:' ~/kube-prometheus/manifests/prometheus-prometheus.yaml
sed -i '42 i\ requests:' ~/kube-prometheus/manifests/prometheus-prometheus.yaml
sed -i '43 i\ storage: 1Gi' ~/kube-prometheus/manifests/prometheus-prometheus.yaml
sed -i '44 i\ storageClassName: k8s-policy' ~/kube-prometheus/manifests/prometheus-prometheus.yaml
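Assuming the insertion line numbers still line up with the current copy of prometheus-prometheus.yaml (worth verifying, since the file changes between kube-prometheus releases), the spec should now contain a storage stanza along these lines:
storage:
  volumeClaimTemplate:
    apiVersion: v1
    kind: PersistentVolumeClaim
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: k8s-policy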
Now we can install the Prometheus operator and the rest of the Prometheus, Grafana and AlertManager components:
kubectl apply -f ~/kube-prometheus/manifests/setup/
kubectl apply -f ~/kube-prometheus/manifests/
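One note on ordering: the first apply registers the CRDs (Prometheus, ServiceMonitor, Alertmanager and friends) that everything in the second apply depends on. If the second command races ahead of CRD registration you may see “no matches for kind” errors; re-running it is harmless, or you can wait between the two applies with the loop suggested in the kube-prometheus README:
until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done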
You’ll see that there is a new namespace, monitoring, with a number of resources in it.
kubectl -n monitoring get po,svc
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 2/2 Running 0 86m
grafana-668c4878fd-tft9w 1/1 Running 0 86m
kube-state-metrics-957fd6c75-q54cn 3/3 Running 0 86m
node-exporter-j7m2x 2/2 Running 0 86m
node-exporter-whtn9 2/2 Running 0 86m
prometheus-adapter-66b855f564-mhxt5 1/1 Running 0 86m
prometheus-k8s-0 3/3 Running 0 86m
prometheus-operator-5b96bb5d85-9tbrn 2/2 Running 0 86m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-main ClusterIP 100.70.67.196 <none> 9093/TCP 86m
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 86m
service/grafana ClusterIP 100.65.32.171 <none> 3000/TCP 86m
service/kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 86m
service/node-exporter ClusterIP None <none> 9100/TCP 86m
service/prometheus-adapter ClusterIP 100.71.103.124 <none> 443/TCP 86m
service/prometheus-k8s ClusterIP 100.64.168.186 <none> 9090/TCP 86m
service/prometheus-operated ClusterIP None <none> 9090/TCP 86m
service/prometheus-operator ClusterIP None <none> 8443/TCP 86m
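Since we configured persistent storage for Prometheus, this is also a good time to check that the PVC was provisioned and bound via vSphere Cloud Native Storage. The claim name follows the operator’s volumeClaimTemplate naming convention, so expect something like prometheus-k8s-db-prometheus-k8s-0 in a Bound state with 1Gi capacity and the k8s-policy storage class.
kubectl -n monitoring get pvc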
Strictly speaking, everything is up and running now, but we wouldn’t have an easy time getting at any of it since all of the services are of type ClusterIP and not accessible externally. To address this, we’ll use Contour to create an ingress into these services.
The first thing we’ll need to get out of the way is to create some DNS records for the components that will make use of the ingress…Prometheus, Grafana and AlertManager. Since the envoy service is configured to use the load-balanced IP address of 10.40.14.33, we’ll create records for “prometheus”, “grafana” and “alertmanager” in the corp.local domain that all map to the same 10.40.14.33 IP address.
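If you don’t have a DNS server handy in your lab, entries in the client’s /etc/hosts file work just as well for testing purposes:
# append to /etc/hosts on the machine you'll browse from
10.40.14.33 prometheus.corp.local grafana.corp.local alertmanager.corp.local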
The next step is to create an ingress resource that will route traffic heading to alertmanager.corp.local to the alertmanager-main service on port 9093, traffic heading to grafana.corp.local to the grafana service on port 3000, and traffic heading to prometheus.corp.local to the prometheus-k8s service on port 9090.
echo "
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
name: monitoring-ingress
namespace: monitoring
spec:
rules:
- host: alertmanager.corp.local
http:
paths:
- backend:
serviceName: alertmanager-main
servicePort: 9093
- host: grafana.corp.local
http:
paths:
- backend:
serviceName: grafana
servicePort: 3000
- host: prometheus.corp.local
http:
paths:
- backend:
serviceName: prometheus-k8s
servicePort: 9090" > monitoring-ingress.yaml
kubectl apply -f monitoring-ingress.yaml
And we can validate that the ingress is created as expected.
kubectl -n monitoring describe ingress monitoring-ingress
Name: monitoring-ingress
Namespace: monitoring
Address:
Default backend: default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
Host Path Backends
---- ---- --------
alertmanager.corp.local
alertmanager-main:9093 (100.121.66.55:9093)
grafana.corp.local
grafana:3000 (100.121.66.14:3000)
prometheus.corp.local
prometheus-k8s:9090 (100.121.66.18:9090)
Annotations:
Events:       <none>
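You can also exercise the routing without DNS at all by pointing curl at the load-balanced IP and supplying the Host header that Contour matches on. For example (the exact status code is illustrative — Grafana typically answers with a 302 redirect to its login page):
curl -s -o /dev/null -w "%{http_code}\n" -H "Host: grafana.corp.local" http://10.40.14.33/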
I thought it would be interesting to see what kind of activity is happening behind the scenes by tailing the contour and envoy pod logs:
kubectl -n tanzu-system-ingress logs envoy-hzgz5 envoy
[2020-08-25 17:03:24.941][1][info][upstream] [source/server/lds_api.cc:73] lds: add/update listener 'ingress_http'
[2020-08-25 17:03:24.944][1][info][upstream] [source/common/upstream/cds_api_impl.cc:74] cds: add 3 cluster(s), remove 2 cluster(s)
[2020-08-25 17:03:24.959][1][info][upstream] [source/common/upstream/cds_api_impl.cc:90] cds: add/update cluster 'monitoring/alertmanager-main/9093/da39a3ee5e'
[2020-08-25 17:03:24.962][1][info][upstream] [source/common/upstream/cds_api_impl.cc:90] cds: add/update cluster 'monitoring/grafana/3000/da39a3ee5e'
[2020-08-25 17:03:24.965][1][info][upstream] [source/common/upstream/cds_api_impl.cc:90] cds: add/update cluster 'monitoring/prometheus-k8s/9090/da39a3ee5e'
kubectl -n tanzu-system-ingress logs --selector=app=contour
time="2020-08-25T17:03:24Z" level=info msg="forcing update" context=contourEventHandler last_update=2m40.645452615s outstanding=1
time="2020-08-25T17:03:24Z" level=info msg="forcing update" context=contourEventHandler last_update=2m40.64400463s outstanding=1
time="2020-08-25T17:03:24Z" level=info msg=stream_wait connection=3 context=grpc node_id=envoy-hzgz5 node_version=b67c14052c49890a7e3afe614d50979c346c024b/1.13.1/Clean/RELEASE/BoringSSL resource_names="[]" response_nonce=3 type_url=type.googleapis.com/envoy.api.v2.Cluster version_info=3
time="2020-08-25T17:03:24Z" level=info msg=stream_wait connection=6 context=grpc node_id=envoy-hzgz5 node_version=b67c14052c49890a7e3afe614d50979c346c024b/1.13.1/Clean/RELEASE/BoringSSL resource_names="[]" response_nonce=3 type_url=type.googleapis.com/envoy.api.v2.Listener version_info=3
time="2020-08-25T17:03:24Z" level=info msg=stream_wait connection=4 context=grpc node_id=envoy-hzgz5 node_version=b67c14052c49890a7e3afe614d50979c346c024b/1.13.1/Clean/RELEASE/BoringSSL resource_names="[ingress_http]" response_nonce=3 type_url=type.googleapis.com/envoy.api.v2.RouteConfiguration version_info=3
time="2020-08-25T17:03:24Z" level=info msg=stream_wait connection=8 context=grpc node_id=envoy-hzgz5 node_version=b67c14052c49890a7e3afe614d50979c346c024b/1.13.1/Clean/RELEASE/BoringSSL resource_names="[monitoring/grafana/http]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.ClusterLoadAssignment version_info=
time="2020-08-25T17:03:25Z" level=info msg=stream_wait connection=9 context=grpc node_id=envoy-hzgz5 node_version=b67c14052c49890a7e3afe614d50979c346c024b/1.13.1/Clean/RELEASE/BoringSSL resource_names="[monitoring/prometheus-k8s/web]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.ClusterLoadAssignment version_info=
time="2020-08-25T17:03:25Z" level=info msg=stream_wait connection=7 context=grpc node_id=envoy-hzgz5 node_version=b67c14052c49890a7e3afe614d50979c346c024b/1.13.1/Clean/RELEASE/BoringSSL resource_names="[monitoring/alertmanager-main/web]" response_nonce= type_url=type.googleapis.com/envoy.api.v2.ClusterLoadAssignment version_info=
time="2020-08-25T17:03:25Z" level=info msg=stream_wait connection=7 context=grpc node_id=envoy-hzgz5 node_version=b67c14052c49890a7e3afe614d50979c346c024b/1.13.1/Clean/RELEASE/BoringSSL resource_names="[monitoring/alertmanager-main/web]" response_nonce=57 type_url=type.googleapis.com/envoy.api.v2.ClusterLoadAssignment version_info=57
time="2020-08-25T17:03:25Z" level=info msg=stream_wait connection=8 context=grpc node_id=envoy-hzgz5 node_version=b67c14052c49890a7e3afe614d50979c346c024b/1.13.1/Clean/RELEASE/BoringSSL resource_names="[monitoring/grafana/http]" response_nonce=57 type_url=type.googleapis.com/envoy.api.v2.ClusterLoadAssignment version_info=57
time="2020-08-25T17:03:25Z" level=info msg=stream_wait connection=9 context=grpc node_id=envoy-hzgz5 node_version=b67c14052c49890a7e3afe614d50979c346c024b/1.13.1/Clean/RELEASE/BoringSSL resource_names="[monitoring/prometheus-k8s/web]" response_nonce=57 type_url=type.googleapis.com/envoy.api.v2.ClusterLoadAssignment version_info=57
With the ingress created, we can now open a browser and navigate to http://prometheus.corp.local, http://grafana.corp.local (log in with admin/admin; you’ll be required to change the password immediately), and http://alertmanager.corp.local.
Even with just this base configuration in place, we can see metrics being collected by Prometheus, visualizations of those metrics in Grafana, and alerts based on them in AlertManager.
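As a quick sanity check from the command line, you can also query Prometheus’s HTTP API directly for the built-in up metric, which reports a value of 1 for every target Prometheus is successfully scraping:
curl -s 'http://prometheus.corp.local/api/v1/query?query=up'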