Monitor it all with the Prometheus stack
As you might expect, Prometheus, Grafana and Alertmanager are the most popular tools for monitoring Kubernetes clusters.
With Prometheus metrics you can monitor CPU, memory and network usage for pods or Kubernetes nodes, as well as Kubernetes control plane components such as the controller manager, scheduler and etcd. Add on Grafana for observability dashboards and Alertmanager for, well, alerts, and you have a complete stack.
(We’ve made it easy for you as a Palette user to deploy this stack to your clusters using a prebuilt “Prometheus-Grafana” pack, as we discussed in this blog.)
Now here’s the problem.
K3s doesn’t expose all metrics: what’s the solution?
If you’re using kubeadm-based Kubernetes or Rancher RKE2 clusters, metrics to monitor Kubernetes control plane components like the controller manager, scheduler, and etcd are enabled and exposed by default. With default settings, Prometheus can collect these metrics.
But many organizations choose to use Rancher’s lightweight, edge-optimized open source K8s distribution, K3s. And K3s does not expose all metrics by default: only kubelet and Kubernetes API server metrics are collected and available. Metrics for the Kubernetes controller manager, scheduler and etcd are not exposed and therefore not available in Prometheus and Grafana.
Unless you enjoy flying blind in your production environments, you’ll need a way to enable control plane metrics collection for a K3s cluster. This blog shows you how to do just that.
But wait, there’s more
So we know that, by default, K3s only exposes kubelet and Kubernetes API server metrics. Other control plane metrics (controller manager, scheduler and etcd) are not exposed.
But there’s a second problem we need to fix, too. The K3s control plane is a binary, not a set of Kubernetes pods. So even after we enable the metrics, Prometheus cannot automatically discover endpoints where these metrics are available. These metrics endpoints must be manually configured for Prometheus during deployment.
Let’s take a tour of how we can address these issues with Palette, step by step.
Step 1: Access the exposed metrics
Kubelet and Kubernetes API server metrics are available with the default K3s and Prometheus configuration. You can access them from the Prometheus UI or from your Grafana dashboards.
For Kubelet monitoring, open the dashboard “Dashboards > Kubernetes / Kubelet”:
For the Kubernetes API server, ensure you have installed the “Spectro Dashboards” pack in addition to the “Prometheus-Grafana” pack (see the example of the Cluster Profile for this in the section below). Then you can open the dashboard “Dashboards > Spectro Cloud > Kubernetes / System / API Server”:
Step 2: Enable K3s control plane metrics
To enable control plane metrics for the Rancher K3s distribution, head into the Palette UI, find the K3s infrastructure layer in the Cluster Profile, and change the following in “cluster.config”:
- To enable Kubernetes Controller Manager metrics, add “bind-address=0.0.0.0” to the “kube-controller-manager-arg” list
- To enable Kubernetes Scheduler metrics, add “bind-address=0.0.0.0” to the “kube-scheduler-arg” list
- To enable etcd metrics, set the “etcd-expose-metrics: true” option in “cluster.config”
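Taken together, the edited portion of the K3s pack values might look like this sketch (your “cluster.config” will contain other settings as well; only the metrics-related keys are shown here):

```yaml
cluster:
  config:
    # Expose Kubernetes Controller Manager metrics on all interfaces
    kube-controller-manager-arg:
      - "bind-address=0.0.0.0"
    # Expose Kubernetes Scheduler metrics on all interfaces
    kube-scheduler-arg:
      - "bind-address=0.0.0.0"
    # Expose etcd metrics
    etcd-expose-metrics: true
```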
See the screenshot below with the outlined changes:
Once you apply this Cluster Profile to a K3s cluster, K3s will expose the control plane metrics on the node’s IP. The metrics will be available:
- For Controller Manager: on the node’s IP, TCP port 10257
- For Scheduler: on the node’s IP, TCP port 10259
- For etcd: on the node’s IP, TCP port 2381
Step 3: Configure Prometheus to collect metrics from the K3s control plane
By default, Prometheus is configured to collect metrics from Kubernetes clusters where the control plane runs as a set of pods. It discovers control plane endpoints using labels associated with these pods.
However, this approach does not work for K3s, because K3s operates as a binary rather than using pods for the control plane. Instead, Prometheus needs to be configured to directly access the control plane endpoints, which are the IPs of the control plane nodes.
In the Prometheus-Grafana pack, you can configure a list of endpoints for Kubernetes Scheduler, Controller Manager and etcd. Since different clusters have different control plane node IPs, Prometheus configurations must be defined per cluster when users apply the Profile.
Create a Cluster Profile (of type add-on) and add Prometheus-Grafana and Spectro Grafana Dashboards packs:
At this point, there’s no need to change any values in the Profile, but you can modify them if you want.
Select the cluster where you want to deploy Prometheus and enable metrics collection and monitoring.
Apply the Profile you created previously, but set the endpoint values in the Prometheus-Grafana layer for kubeControllerManager (line 1233), kubeEtcd (line 1450) and kubeScheduler (line 1532) before applying the Profile to the K3s cluster.
The endpoints list must contain all the IPs of the K3s cluster master nodes. In our example below, the cluster has only one master node, so there’s only one IP address.
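As a sketch, the three endpoint sections in the Prometheus-Grafana pack values would look like the following for a single-master cluster. The IP address is a placeholder; substitute the actual IP(s) of your master nodes:

```yaml
# Placeholder IP shown below: replace with your master node IP(s)
kubeControllerManager:
  endpoints:
    - 192.168.1.10
kubeEtcd:
  endpoints:
    - 192.168.1.10
kubeScheduler:
  endpoints:
    - 192.168.1.10
```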
Step 4: Verify!
After you’ve applied the Cluster Profile to the cluster, you can open the Grafana UI and check if the metrics are being collected and made available.
To check Kubernetes Controller Manager metrics, open the dashboard “Dashboards > Kubernetes / Controller Manager”:
To check Kubernetes Scheduler metrics, open the dashboard “Dashboards > Kubernetes / Scheduler”:
By default, there is no Grafana dashboard to show etcd metrics, so you can check them from the Prometheus UI. Open the Prometheus UI and search for etcd metrics:
Some etcd dashboards are available on Grafana’s community dashboards site, for example the dashboard with ID 3070. Import this dashboard into Grafana by providing its ID, then check for etcd metrics:
Bonus: enable Prometheus to collect metrics from K3s control plane with overlay enabled
When you’re using your K3s clusters with an overlay network enabled, the master node IP addresses can change at any time.
To ensure continuous metrics collection in such cases, use the IPs from the overlay CIDR range in the "endpoints" lists of the Prometheus-Grafana pack, instead of the actual edge host IPs.
If the default overlay CIDR range is being used (100.64.192.0/23), then IPs 100.64.192.2, 100.64.192.3 and 100.64.192.4 correspond to the control plane nodes. These IPs should be listed in endpoints in Prometheus-Grafana pack:
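With the three default overlay IPs above, the endpoints lists in the Prometheus-Grafana pack values would look like this (a sketch showing only the endpoints keys):

```yaml
# Overlay IPs from the default CIDR range 100.64.192.0/23
kubeControllerManager:
  endpoints:
    - 100.64.192.2
    - 100.64.192.3
    - 100.64.192.4
kubeEtcd:
  endpoints:
    - 100.64.192.2
    - 100.64.192.3
    - 100.64.192.4
kubeScheduler:
  endpoints:
    - 100.64.192.2
    - 100.64.192.3
    - 100.64.192.4
```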
In most cases, these IPs are the same for K3s clusters, and you can define them in the Cluster Profile; they don’t need to be changed when users apply Profiles to their clusters.
Next steps
So there you have it: a step-by-step guide to enabling Prometheus metrics collection for your K3s control plane, using Palette.
What’s next?
If you’re not totally happy with K3s, you might want to check out our guide to edge-optimized OS/K8s platforms.
To learn more about monitoring and observability in Kubernetes, check out our webinar.
And if you’re intrigued by how Palette’s packs simplify installing Prometheus and other K8s software at scale, you might want to set up a quick demo of Palette and try it for yourself.