How to use Tanzu Observability by Wavefront in a TMC-managed Kubernetes cluster

If you haven’t tried out Tanzu Observability by Wavefront (in TMC or otherwise), I can’t recommend it enough. I’m relatively new to it but still find myself amazed by the depth of understanding it can bring to any kind of data that you send to it. TMC recently introduced an integration with Tanzu Observability, and it’s super easy to get up and running. I’ll walk through the process here and dive into some of what’s happening behind the scenes. You can read more about this topic in the TMC documentation at Managing Integrations.

Before you can use Tanzu Observability on any of your TMC-managed clusters, you must have access to Tanzu Observability by Wavefront. If you don’t already have access, you can sign up for a free trial at https://www.wavefront.com/sign-up/.

Once logged in to Tanzu Observability, you’ll need to get your API URL and token. These can be obtained by clicking on the gear icon at the top right and then clicking on your user name.

Click on the API Access tab and make note of the URL and token values.

In this example, the URL is https://vmware.wavefront.com/api/ (we remove the v2/source from the end) and the token is cf391585-ca50-4ca3-a681-8e539dddd59c.
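
If you want to sanity-check these values before moving on, note that REST API calls do use the v2 path (it’s only the proxy configuration that wants the bare /api/ URL). A quick curl like the following, with <API_TOKEN> standing in for your token, should come back with JSON rather than a 401:

curl -s -H 'Authorization: Bearer <API_TOKEN>' 'https://vmware.wavefront.com/api/v2/source?limit=1'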

Now we’re ready to enable Tanzu Observability for our CSP organization. This enables the integration at the org level; you’ll still need to turn it on for each individual cluster afterwards.

Navigate to the Administration side-menu and then select the Integrations tab.

As of this writing, the only integration available is Tanzu Observability by Wavefront. Click the Enable button here.

Click the Confirm button to complete the enablement process. You’ll see that the status is now Enabled on the Integrations page.

With Tanzu Observability enabled for the org, you can now enable it for your desired cluster. Navigate to the Clusters side-menu and then click on the desired cluster. From the Actions drop-down menu, click on Tanzu Observability by Wavefront and then click on Add.

Enter the API URL and token values noted earlier and click the Confirm button.

There should be a new extension listed named wavefront-extension, and its health will show as unknown for a few minutes while things are getting set up in your cluster.
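
Behind the scenes, the TMC cluster agent deploys this as an extension on the cluster itself. If you’re curious, you can follow along from kubectl too (this assumes the default TMC agent namespace of vmware-system-tmc):

kubectl --kubeconfig=kubeconfig-test-obs.yml -n vmware-system-tmc get extensions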

Examining the cluster in question, we can see that there is a new namespace named tanzu-observability-saas with a number of pods running in it.

kubectl --kubeconfig=kubeconfig-test-obs.yml -n tanzu-observability-saas get po

NAME                                     READY   STATUS    RESTARTS   AGE
wavefront-collector-22twf                1/1     Running   0          8m13s
wavefront-collector-5825r                1/1     Running   0          8m13s
wavefront-collector-5hnrq                1/1     Running   0          8m13s
wavefront-collector-9r49b                1/1     Running   0          8m13s
wavefront-collector-fwvmk                1/1     Running   0          8m13s
wavefront-collector-k8nql                1/1     Running   0          8m13s
wavefront-collector-s7vxk                1/1     Running   0          8m13s
wavefront-collector-z84cg                1/1     Running   0          8m13s
wavefront-proxy-tanzu-7d4559df7f-2zt4p   1/1     Running   0          25s
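
These pods are managed by two controllers, a DaemonSet for the collectors and (judging by the pod name) a Deployment for the proxy, which you can confirm by listing them:

kubectl --kubeconfig=kubeconfig-test-obs.yml -n tanzu-observability-saas get ds,deploy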

The DaemonSet ensures that a collector runs on every node, while the wavefront-proxy-tanzu pod is responsible for shipping data to the Wavefront API URL. If we look at the logs in this pod, we can see that it has successfully established a connection and is sending data:


kubectl --kubeconfig=kubeconfig-test-obs.yml -n tanzu-observability-saas logs wavefront-proxy-tanzu-7d4559df7f-2zt4p

2020-08-24 16:54:44,724 INFO  [proxy:parseArguments] Wavefront Proxy version 8.2, runtime: OpenJDK Runtime Environment (Ubuntu) 11.0.7
2020-08-24 16:54:44,733 INFO  [proxy:parseArguments] Arguments: -h https://vmware.wavefront.com/api/ -t <HIDDEN> --hostname wavefront-proxy-tanzu-7d4559df7f-6kwll --ephemeral true --buffer /var/spool/wavefront-proxy/buffer --flushThreads 6
2020-08-24 16:54:44,766 INFO  [proxy:getOrCreateProxyId] Ephemeral proxy id created: 38c6b58d-1cc2-4d61-a3df-a7396c2f0ebe
2020-08-24 16:54:45,412 INFO  [proxy:checkin] Checking in: https://vmware.wavefront.com/api/
2020-08-24 16:54:46,472 INFO  [proxy:<init>] initial configuration is available, setting up proxy
2020-08-24 16:54:46,475 INFO  [proxy:processConfiguration] Proxy trace span sampling rate set to 0.03
2020-08-24 16:54:46,475 INFO  [proxy:scheduleCheckins] scheduling regular check-ins
2020-08-24 16:54:46,652 INFO  [proxy:lambda$startListeners$7] listening on port: 2878 for Wavefront metrics
2020-08-24 16:54:51,661 INFO  [proxy:run] setup complete

For comparison, the following is a log snippet from an instance where the API URL was misconfigured (it still had “v2” at the end). You can see that the proxy detected the misconfiguration, and the pod eventually went into a CrashLoopBackOff state. In this scenario, the status of the wavefront-extension in the TMC UI was red.


kubectl --kubeconfig=kubeconfig-test-obs.yml -n tanzu-observability-saas logs wavefront-proxy-tanzu-7d4559df7f-wj89h

2020-08-24 16:34:44,639 INFO  [proxy:parseArguments] Wavefront Proxy version 8.2, runtime: OpenJDK Runtime Environment (Ubuntu) 11.0.7
2020-08-24 16:34:44,647 INFO  [proxy:parseArguments] Arguments: -h https://vmware.wavefront.com/api/v2 -t <HIDDEN> --hostname wavefront-proxy-tanzu-7d4559df7f-wj89h --ephemeral true --buffer /var/spool/wavefront-proxy/buffer --flushThreads 6
2020-08-24 16:34:44,684 INFO  [proxy:getOrCreateProxyId] Ephemeral proxy id created: d0cba3a1-dad0-458c-bb33-8f3bd5764df1
2020-08-24 16:34:45,229 INFO  [proxy:checkin] Checking in: https://vmware.wavefront.com/api/v2
2020-08-24 16:34:45,751 ERROR [proxy:checkinError] **************************************************************************************************************
2020-08-24 16:34:45,751 ERROR [proxy:checkinError] Possible server endpoint misconfiguration detected, attempting to use https://vmware.wavefront.com/api/v2/api/
2020-08-24 16:34:45,751 ERROR [proxy:checkinError] **************************************************************************************************************
2020-08-24 16:34:45,766 INFO  [proxy:checkin] Checking in: https://vmware.wavefront.com/api/v2/api/
2020-08-24 16:34:45,840 ERROR [proxy:checkinError] ********************************************************************************************************************************************************************************************
2020-08-24 16:34:45,841 ERROR [proxy:checkinError] HTTP 404: Misconfiguration detected, please verify that your server setting is correct. Server endpoint URLs normally end with '/api/'. Current setting: https://vmware.wavefront.com/api/v2
2020-08-24 16:34:45,841 ERROR [proxy:checkinError] ********************************************************************************************************************************************************************************************
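
If you find yourself in this state, correcting the URL from the TMC side sorts it out. In the meantime, you can watch the proxy pod cycling toward CrashLoopBackOff with:

kubectl --kubeconfig=kubeconfig-test-obs.yml -n tanzu-observability-saas get po -w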

If we examine the logs from one of the collector pods, we can identify any issues with collecting the configured metrics or with pushing them to the proxy.

kubectl --kubeconfig=kubeconfig-test-obs.yml -n tanzu-observability-saas logs wavefront-collector-22twf -f

time="2020-08-24T18:21:18Z" level=info msg="/wavefront-collector --daemon=true --config-file=/etc/collector/collector.yaml"
time="2020-08-24T18:21:18Z" level=info msg="wavefront-collector version 1.2.1"
time="2020-08-24T18:21:18Z" level=info msg="POD_NODE_NAME: ip-10-0-5-6.ec2.internal"
time="2020-08-24T18:21:18Z" level=info msg="loading config: /etc/collector/collector.yaml"
time="2020-08-24T18:21:18Z" level=info msg="using configuration file, omitting flags"
time="2020-08-24T18:21:18Z" level=info msg="Using Kubernetes client with master \"https://kubernetes.default.svc\" and version v1\n"
time="2020-08-24T18:21:18Z" level=info msg="Using kubelet port 10250"
I0824 18:21:18.788473       1 leaderelection.go:235] attempting to acquire leader lease  tanzu-observability-saas/wf-collector-leader...
time="2020-08-24T18:21:18Z" level=info msg="Adding provider" collection_interval=0s name=kubernetes_summary_provider timeout=30s
time="2020-08-24T18:21:18Z" level=info msg="Using default collection interval" collection_interval=1m0s provider=kubernetes_summary_provider
time="2020-08-24T18:21:18Z" level=info msg="Adding provider" collection_interval=0s name=internal_stats_provider timeout=30s
time="2020-08-24T18:21:18Z" level=info msg="Using default collection interval" collection_interval=1m0s provider=internal_stats_provider
time="2020-08-24T18:21:18Z" level=info msg="Adding provider" collection_interval=0s name=kstate_metrics_provider timeout=30s
time="2020-08-24T18:21:18Z" level=info msg="Using default collection interval" collection_interval=1m0s provider=kstate_metrics_provider
time="2020-08-24T18:21:18Z" level=info msg="Adding provider" collection_interval=0s name="telegraf_provider: [mem net netstat linux_sysctl_fs swap cpu disk diskio system kernel processes]" timeout=30s
time="2020-08-24T18:21:18Z" level=info msg="Using default collection interval" collection_interval=1m0s provider="telegraf_provider: [mem net netstat linux_sysctl_fs swap cpu disk diskio system kernel processes]"
time="2020-08-24T18:21:18Z" level=info msg="using clusterName: test-obs.global.tmc"
time="2020-08-24T18:21:18Z" level=info msg="Starting with wavefront_sink"
time="2020-08-24T18:21:18Z" level=info msg="Events collection disabled" system=events
time="2020-08-24T18:21:18Z" level=info msg="Starting agent"
time="2020-08-24T18:21:18Z" level=info msg="Starting discovery manager"
time="2020-08-24T18:21:18Z" level=info msg="runtime plugins enabled"
time="2020-08-24T18:21:18Z" level=info msg="no runtime annotation on wavefront-collector-config"
time="2020-08-24T18:21:18Z" level=info msg="no runtime annotation on wavefront-proxy-config"
time="2020-08-24T18:21:18Z" level=info msg="no runtime annotation on wf-collector-leader"
time="2020-08-24T18:21:18Z" level=info msg="node: ip-10-0-3-225.ec2.internal elected leader"
time="2020-08-24T18:21:18Z" level=info msg="discovery config interval: 5m0s"
time="2020-08-24T18:21:18Z" level=info msg="checking for runtime plugin changes"
time="2020-08-24T18:21:47Z" level=info msg="no runtime annotation on wf-collector-leader"
time="2020-08-24T18:22:17Z" level=info msg="no runtime annotation on wf-collector-leader"
time="2020-08-24T18:22:18Z" level=info msg="not scraping sources from: kstate_metrics_provider. current leader: ip-10-0-3-225.ec2.internal"
time="2020-08-24T18:22:18Z" level=info msg="Querying source" name=internal_stats_source
time="2020-08-24T18:22:18Z" level=info msg="Querying source" name="kubelet_summary:10.0.5.6:10250"
time="2020-08-24T18:22:18Z" level=info msg="Querying source" name=telegraf_mem_plugin_source
I0824 18:22:18.793754       1 log.go:172] connected to Wavefront proxy at address: wavefront-proxy-tanzu.tanzu-observability-saas:2878
time="2020-08-24T18:22:18Z" level=info msg="Querying source" name=telegraf_net_plugin_source
time="2020-08-24T18:22:18Z" level=info msg="Querying source" name=telegraf_netstat_plugin_source
time="2020-08-24T18:22:18Z" level=info msg="Querying source" name=telegraf_linux_sysctl_fs_plugin_source
time="2020-08-24T18:22:18Z" level=info msg="Querying source" name=telegraf_swap_plugin_source
time="2020-08-24T18:22:18Z" level=info msg="Querying source" name=telegraf_cpu_plugin_source
time="2020-08-24T18:22:18Z" level=info msg="Querying source" name=telegraf_disk_plugin_source
time="2020-08-24T18:22:18Z" level=info msg="Querying source" name=telegraf_diskio_plugin_source
time="2020-08-24T18:22:18Z" level=info msg="Querying source" name=telegraf_system_plugin_source
time="2020-08-24T18:22:18Z" level=info msg="Querying source" name=telegraf_kernel_plugin_source
time="2020-08-24T18:22:18Z" level=info msg="Querying source" name=telegraf_processes_plugin_source
time="2020-08-24T18:22:47Z" level=info msg="no runtime annotation on wf-collector-leader"
time="2020-08-24T18:22:48Z" level=info msg="Data push complete" name=wavefront_sink
time="2020-08-24T18:22:48Z" level=info msg="Data push complete" name=wavefront_sink
time="2020-08-24T18:22:48Z" level=info msg="Data push complete" name=wavefront_sink
time="2020-08-24T18:22:48Z" level=info msg="Data push complete" name=wavefront_sink
time="2020-08-24T18:22:48Z" level=info msg="Data push complete" name=wavefront_sink
time="2020-08-24T18:22:48Z" level=info msg="Data push complete" name=wavefront_sink
time="2020-08-24T18:22:48Z" level=info msg="Data push complete" name=wavefront_sink
time="2020-08-24T18:22:48Z" level=info msg="Data push complete" name=wavefront_sink
time="2020-08-24T18:22:48Z" level=info msg="Data push complete" name=wavefront_sink
time="2020-08-24T18:22:48Z" level=info msg="Data push complete" name=wavefront_sink
time="2020-08-24T18:22:48Z" level=info msg="Data push complete" name=wavefront_sink
time="2020-08-24T18:22:48Z" level=info msg="Data push complete" name=wavefront_sink
time="2020-08-24T18:22:48Z" level=info msg="Data push complete" name=wavefront_sink
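
Since every node runs its own collector, it can be handy to scan all of them for problems at once. A minimal sketch, looping over the collector pods with standard kubectl and grep:

for p in $(kubectl --kubeconfig=kubeconfig-test-obs.yml -n tanzu-observability-saas get po -o name | grep wavefront-collector); do
  kubectl --kubeconfig=kubeconfig-test-obs.yml -n tanzu-observability-saas logs $p | grep -iE 'error|warn'
done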

There are also a number of configmaps created in the new namespace (the empty wf-collector-leader configmap is what the collectors use for leader election, as referenced in the logs above).

kubectl --kubeconfig=kubeconfig-test-obs.yml -n tanzu-observability-saas get cm

NAME                         DATA   AGE
wavefront-collector-config   1      11m
wavefront-proxy-config       1      11m
wf-collector-leader          0      11m

The wavefront-collector-config configmap determines what data is going to be shipped:

kubectl --kubeconfig=kubeconfig-test-obs.yml -n tanzu-observability-saas describe cm wavefront-collector-config

Name:         wavefront-collector-config
Namespace:    tanzu-observability-saas
Labels:       <none>
Annotations:  <none>

Data
====
collector.yaml:
----
clusterName: test-obs.global.tmc
enableEvents: false
enableDiscovery: true
flushInterval: 30s

sinks:
- proxyAddress: wavefront-proxy-tanzu.tanzu-observability-saas:2878

  filters:
    # Filter out generated labels
    tagExclude:
    - 'label?controller?revision*'
    - 'label?pod?template*'
    - 'annotation_kubectl_kubernetes_io_last_applied_configuration'

sources:
  kubernetes_source:
    url: 'https://kubernetes.default.svc'
    kubeletPort: 10250
    kubeletHttps: true
    useServiceAccount: true
    insecure: true
    prefix: 'kubernetes.'

    filters:
      metricBlacklist:
      - 'kubernetes.sys_container.*'

  internal_stats_source:
    prefix: 'kubernetes.'

  kubernetes_state_source:
    prefix: 'kubernetes.'

  telegraf_sources:
  # enable all telegraf plugins
  - plugins: []

# discovery rules for auto-discovery of pods and services
discovery:
  enable_runtime_plugins: true
  discovery_interval: 5m
  plugins:
  - name: kube-dns-discovery
    type: prometheus
    selectors:
      images:
      - '*kube-dns/sidecar*'
      labels:
        k8s-app:
        - kube-dns
    port: 10054
    path: /metrics
    scheme: http
    prefix: kube.dns.
    filters:
      metricWhitelist:
      - 'kube.dns.http.request.duration.microseconds'
      - 'kube.dns.http.request.size.bytes'
      - 'kube.dns.http.requests.total.counter'
      - 'kube.dns.http.response.size.bytes'
      - 'kube.dns.kubedns.dnsmasq.*'
      - 'kube.dns.process.*'

  # auto-discover coredns
  - name: coredns-discovery
    type: prometheus
    selectors:
      images:
      - '*coredns:*'
      labels:
        k8s-app:
        - kube-dns
    port: 9153
    path: /metrics
    scheme: http
    prefix: kube.coredns.
    filters:
      metricWhitelist:
      - 'kube.coredns.coredns.cache.*'
      - 'kube.coredns.coredns.dns.request.count.total.counter'
      - 'kube.coredns.coredns.dns.request.duration.seconds'
      - 'kube.coredns.coredns.dns.request.size.bytes'
      - 'kube.coredns.coredns.dns.request.type.count.total.counter'
      - 'kube.coredns.coredns.dns.response.rcode.count.total.counter'
      - 'kube.coredns.coredns.dns.response.size.bytes'
      - 'kube.coredns.process.*'

Events:  <none>
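
If you just want the raw YAML without the describe wrapper, you can pull the key straight out of the configmap (note the escaped dot in the key name); the same trick works for wavefront-proxy-config with the wavefront.url key:

kubectl --kubeconfig=kubeconfig-test-obs.yml -n tanzu-observability-saas get cm wavefront-collector-config -o 'jsonpath={.data.collector\.yaml}'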

The wavefront-proxy-config configmap, meanwhile, stores the API URL that we configured in the TMC UI.

kubectl --kubeconfig=kubeconfig-test-obs.yml -n tanzu-observability-saas describe cm wavefront-proxy-config

Name:         wavefront-proxy-config
Namespace:    tanzu-observability-saas
Labels:       <none>
Annotations:  <none>

Data
====
wavefront.url:
----
https://vmware.wavefront.com/api/
Events:  <none>

There are also a few secrets in the new namespace.

kubectl --kubeconfig=kubeconfig-test-obs.yml -n tanzu-observability-saas get secrets

NAME                              TYPE                                  DATA   AGE
default-token-54tqr               kubernetes.io/service-account-token   3      11m
wavefront-collector-token-x6v44   kubernetes.io/service-account-token   3      11m
wavefront-proxy-secret            Opaque                                1      11m

And a closer examination of the wavefront-proxy-secret secret shows that it contains the token value that was entered in the TMC UI.

kubectl --kubeconfig=kubeconfig-test-obs.yml -n tanzu-observability-saas get secrets wavefront-proxy-secret -o 'go-template={{ index .data "wavefront.token" }}' | base64 -d

cf391585-ca50-4ca3-a681-8e539dddd59c

Back in the TMC UI, we can see that the health of the wavefront-extension is now green and that there is a new option under the Actions → Tanzu Observability by Wavefront menu named Open in Tanzu Observability by Wavefront.

Clicking this new menu item takes us to our Tanzu Observability instance, where we can see the wealth of data already being aggregated from the cluster.

And this is just what’s available out of the box. You can create your own dashboards here or drill down into the pre-configured ones to get a much more thorough understanding of what’s going on in your Kubernetes clusters.
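
Everything behind those dashboards is also reachable programmatically. As a hedged example (the /api/v2/chart/api query endpoint is standard Wavefront, but the metric name kubernetes.pod.cpu.usage_rate is an assumption based on the collector defaults; adjust for your environment), you could pull the last ten minutes of pod CPU data for this cluster with:

curl -sG 'https://vmware.wavefront.com/api/v2/chart/api' \
  -H 'Authorization: Bearer <API_TOKEN>' \
  --data-urlencode 'q=ts("kubernetes.pod.cpu.usage_rate", cluster="test-obs.global.tmc")' \
  --data-urlencode "s=$(($(date +%s) - 600))" \
  -d 'g=m'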
