Kubernetes Monitoring

BindPlane OP can manage agents running in Kubernetes, which lets you streamline the observability of your cluster. Before following this guide, be sure to familiarize yourself with the Kubernetes Install, Upgrade, and Uninstall Agents documentation.

Objective

Monitoring a Kubernetes cluster involves collecting metrics and logs from the various components that make up the cluster.

Metrics

Kubelet

The Kubelet API is hosted on each node within the cluster. It can be used to gather node, pod, container, and volume metrics. Each Kubelet's scope is limited to the node it is running on.

The Kubelet API is useful for tracking pod and container performance metrics, such as CPU or memory utilization.

API Server

The Kubernetes API server runs as part of the cluster's control plane. It can be used to gather higher-level cluster metrics, such as Deployment status or Pod phase.

Logs

Container Logs

Kubernetes container logs are written to the node's filesystem. Each Kubernetes node is responsible for hosting these logs. Generally, the logs are written to /var/log/pods and are symlinked in /var/log/containers.

Each log file has the following format:

text

<pod name>_<namespace name>_<container name>-<container id>.log

The BindPlane agent will extract metadata from the log file name following OpenTelemetry's Semantic Conventions.
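To make the file name format concrete, here is a minimal shell sketch that splits a log file name into its parts. It is not the agent's implementation. The function name parse_container_log_name is hypothetical, and the attribute names printed (k8s.pod.name, k8s.namespace.name, k8s.container.name, container.id) are assumed from OpenTelemetry's Kubernetes semantic conventions rather than taken from agent output.

```shell
# Hypothetical helper: split a container log file name into its parts.
# Assumes the <pod>_<namespace>_<container>-<container id>.log layout shown above.
# Pod and namespace names cannot contain underscores, so the first two
# underscores are safe separators. Container names may contain hyphens, so the
# container id is taken as the text after the LAST hyphen.
parse_container_log_name() {
  base=$(basename "$1" .log)        # drop the directory and the .log suffix
  pod=${base%%_*}                   # text before the first underscore
  rest=${base#*_}                   # everything after the first underscore
  namespace=${rest%%_*}             # text before the next underscore
  remainder=${rest#*_}              # <container name>-<container id>
  container=${remainder%-*}         # everything before the last hyphen
  container_id=${remainder##*-}     # everything after the last hyphen
  printf 'k8s.pod.name=%s\n' "$pod"
  printf 'k8s.namespace.name=%s\n' "$namespace"
  printf 'k8s.container.name=%s\n' "$container"
  printf 'container.id=%s\n' "$container_id"
}

# Example usage (the file name is made up):
#   parse_container_log_name /var/log/containers/web-0_default_nginx-proxy-0123abcd.log
```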

Cluster Events

The Kubernetes API server can be used to retrieve Kubernetes Events in the form of logs.

Kubernetes Events are useful for observing issues such as pods stuck in a crash loop.

Tracing

Kubernetes does not emit traces; however, applications instrumented to emit OpenTelemetry traces are supported. See the OpenTelemetry section for details.

OpenTelemetry

If your applications are instrumented with OpenTelemetry, they can be configured to forward metrics, traces, and logs to the BindPlane agents.

Implementation

This guide describes how to create three configurations:

Kubernetes Node
Kubernetes Cluster
Kubernetes Gateway

The BindPlane Node and Cluster agents will forward their telemetry to the BindPlane Gateway agent(s) using a clusterIP service.

Prerequisites

You should have the following in place before moving forward with BindPlane Kubernetes Agent deployment.

Access to your Kubernetes cluster
Access to your BindPlane OP server

If you do not have BindPlane OP installed, you can follow one of these two guides for deploying BindPlane OP to a Linux server or Kubernetes.

Create Configurations

Before agents can be deployed to the cluster, configurations must be created.

Node Configuration

On the Configurations page, choose "Create Configuration". Create a Kubernetes Node configuration. The node configuration will be deployed as a DaemonSet. The DaemonSet will allow the collection of container logs and Kubelet metrics from each node.

Choose next to view the list of available sources.

Select the Container source and configure it with a cluster name. You can use a placeholder value if you intend to detect the cluster name using the resource detection processor. This processor can be configured during the gateway configuration setup.

Select the Kubelet source and configure it with a cluster name. You can use a placeholder value if you intend to detect the cluster name using the resource detection processor. This processor can be configured during the gateway configuration setup.

Optionally, select the OpenTelemetry source. The DaemonSet can receive metrics, logs, and traces from applications in your cluster. If you would prefer to have your Gateway Agent handle receiving OpenTelemetry, you can skip this step.

At this point, you should have both Kubernetes sources and, optionally, the OpenTelemetry source. Choose "Next" to move to the destination configuration page.

Search for "OpenTelemetry" and select the "OpenTelemetry (OTLP)" destination.

Configure the hostname field with the following value:

text

bindplane-gateway-agent.bindplane-agent.svc.cluster.local

Leave all other options set to their default values.
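The hostname above follows the usual Kubernetes Service DNS pattern of service name, namespace, "svc", and cluster domain. The sketch below shows how the value is composed, in case your namespace or cluster domain differs. The function name service_fqdn is hypothetical, and the default cluster domain of cluster.local is an assumption that holds for most clusters but can be changed by an administrator.

```shell
# Hypothetical helper: build the in-cluster DNS name of a Service.
#   $1 = Service name, $2 = namespace, $3 = cluster domain (optional)
# Assumes the default cluster domain "cluster.local" when $3 is omitted.
service_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

# Example usage with the Service and namespace names used in this guide:
#   service_fqdn bindplane-gateway-agent bindplane-agent
```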

Once you have configured the destination, choose "Save".

You will be presented with the new pipeline.

Cluster Configuration

On the Configurations page, choose "Create Configuration". Create a Kubernetes Cluster configuration. The cluster configuration will be deployed as a Deployment with a single pod. The Deployment will allow the collection of cluster metrics and events (logs) from the Kubernetes API server.

Choose next to view the list of available sources.

Select the Kubernetes Cluster source and configure it with a cluster name. You can use a placeholder value if you intend to detect the cluster name using the resource detection processor. This processor can be configured during the gateway configuration setup.

Select the Kubernetes Events source and configure it with a cluster name. You can use a placeholder value if you intend to detect the cluster name using the resource detection processor. This processor can be configured during the gateway configuration setup.

At this point, you should have both Kubernetes sources. Choose "next" to move to the destination configuration page.

Select the same destination that you created for the node configuration and choose "Save".

Once the configuration is saved, you will be presented with the new pipeline.

Gateway Configuration

On the Configurations page, choose "Create Configuration". Create a Kubernetes Gateway configuration. The gateway configuration will be deployed as a StatefulSet.

note

Deployment with HPA will be supported in the future, as an alternative to StatefulSet.

Choose next to view the list of available sources. Select the OpenTelemetry (OTLP) source. The default values will match the values used by the previously created OpenTelemetry destination. This will allow the Gateway Agent to receive telemetry from the other agents.

After saving the source, choose next to move to the destination configuration page.

This example uses the Google Cloud destination. Feel free to choose the destination that best fits your environment. If you do not have a destination at this time, you can use the custom destination and configure the logging exporter. This exporter will act as a "no-op" and allow you to test the configuration without shipping telemetry to a real destination.

Example Google Cloud destination:

If you would like to use the custom destination, enable all three telemetry options and include the following for the configuration block:

yaml

logging:

Example Logging destination:

Once you have configured the destination, choose "Save".

You will be presented with the new pipeline.

If you would like to detect the Kubernetes Cluster name, you can use the resource detection processor.

note

Cluster name detection is available for Google GKE only. Support for Amazon EKS and Azure AKS is coming soon.

Add a processor to the source side of the pipeline by clicking the processor icon, which sits between the source icon and the destination icon.

Choose "Add Processor" and search for "Resource Detection".

Choose "Done" and then "Save".

Deploy Agents

Once the configurations are created, you can move on to deploying agents.

Retrieve YAML Manifests

On the Agents page, select the "Install Agent" button.

Choose the Kubernetes Node platform and the Kubernetes Node configuration you created earlier.

Select "Next" and you will be presented with a YAML text box. Choose "Copy" and save the contents to a file named bindplane-node-agent.yaml.

Repeat these steps for the Cluster and Gateway agents. Save their YAML output to files named bindplane-cluster-agent.yaml and bindplane-gateway-agent.yaml.
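Before applying anything, it can help to confirm that all three manifests were actually saved. The following is a minimal sketch of such a check. The function name missing_manifests is hypothetical, and the check only tests that each file exists and is non-empty; it does not validate the YAML.

```shell
# Hypothetical helper: print the name of every expected manifest that is
# missing or empty in the given directory (default: current directory).
# Prints nothing when all three files are present.
missing_manifests() {
  dir=${1:-.}
  for name in \
    bindplane-node-agent.yaml \
    bindplane-cluster-agent.yaml \
    bindplane-gateway-agent.yaml
  do
    # -s is true only when the file exists and has a size greater than zero
    [ -s "$dir/$name" ] || printf '%s\n' "$name"
  done
}

# Example usage:
#   missing_manifests .        # lists any manifests you still need to save
```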

Kubectl Apply

With all three manifests saved, you can apply them with a single command:

bash

kubectl apply \
    -f bindplane-node-agent.yaml \
    -f bindplane-cluster-agent.yaml \
    -f bindplane-gateway-agent.yaml

The output will look like this:

kubectl

namespace/bindplane-agent created
serviceaccount/bindplane-agent created
clusterrole.rbac.authorization.k8s.io/bindplane-agent created
clusterrolebinding.rbac.authorization.k8s.io/bindplane-agent created
service/bindplane-node-agent created
service/bindplane-node-agent-headless created
configmap/bindplane-node-agent-setup created
daemonset.apps/bindplane-node-agent created
namespace/bindplane-agent unchanged
serviceaccount/bindplane-agent unchanged
clusterrole.rbac.authorization.k8s.io/bindplane-agent unchanged
clusterrolebinding.rbac.authorization.k8s.io/bindplane-agent unchanged
deployment.apps/bindplane-cluster-agent created
namespace/bindplane-agent unchanged
serviceaccount/bindplane-agent unchanged
clusterrole.rbac.authorization.k8s.io/bindplane-agent unchanged
clusterrolebinding.rbac.authorization.k8s.io/bindplane-agent unchanged
service/bindplane-gateway-agent created
service/bindplane-gateway-agent-headless created
statefulset.apps/bindplane-gateway-agent created

The following resources are created:

Namespace: bindplane-agent
RBAC
- Service Account: bindplane-agent
- Cluster Role: bindplane-agent
- Cluster Role Binding: bindplane-agent
Node Agent
- clusterIP service: bindplane-node-agent
- clusterIP service (headless): bindplane-node-agent-headless
- ConfigMap: bindplane-node-agent-setup
- DaemonSet: bindplane-node-agent
Cluster Agent
- Deployment: bindplane-cluster-agent
Gateway Agent
- clusterIP service: bindplane-gateway-agent
- clusterIP service (headless): bindplane-gateway-agent-headless
- StatefulSet: bindplane-gateway-agent

You can confirm that the agents are running by listing the resources in the bindplane-agent namespace (for example, with kubectl get all -n bindplane-agent):

kubectl

NAME                                          READY   STATUS    RESTARTS   AGE
pod/bindplane-cluster-agent-56dc56b78-fv2nx   1/1     Running   0          5m52s
pod/bindplane-gateway-agent-0                 1/1     Running   0          5m51s
pod/bindplane-gateway-agent-1                 1/1     Running   0          5m35s
pod/bindplane-node-agent-2dbtb                1/1     Running   0          5m52s
pod/bindplane-node-agent-4m5dl                1/1     Running   0          5m52s
pod/bindplane-node-agent-7z6bc                1/1     Running   0          5m52s
pod/bindplane-node-agent-d4xcw                1/1     Running   0          5m52s
pod/bindplane-node-agent-qfczn                1/1     Running   0          5m52s
pod/bindplane-node-agent-r75rt                1/1     Running   0          5m52s
pod/bindplane-node-agent-rjk54                1/1     Running   0          5m52s
pod/bindplane-node-agent-s94kf                1/1     Running   0          5m52s
pod/bindplane-node-agent-sxr4g                1/1     Running   0          5m52s
pod/bindplane-node-agent-x67r8                1/1     Running   0          5m52s
pod/bindplane-node-agent-xlfxs                1/1     Running   0          5m52s
pod/bindplane-node-agent-zjthj                1/1     Running   0          5m52s

NAME                                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/bindplane-gateway-agent            ClusterIP   10.112.11.20    <none>        4317/TCP,4318/TCP   5m52s
service/bindplane-gateway-agent-headless   ClusterIP   None            <none>        4317/TCP,4318/TCP   5m52s
service/bindplane-node-agent               ClusterIP   10.112.11.166   <none>        4317/TCP,4318/TCP   5m54s
service/bindplane-node-agent-headless      ClusterIP   None            <none>        4317/TCP,4318/TCP   5m53s

NAME                                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/bindplane-node-agent   12        12        12      12           12          <none>          5m53s

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/bindplane-cluster-agent   1/1     1            1           5m53s

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/bindplane-cluster-agent-56dc56b78   1         1         1       5m53s

NAME                                       READY   AGE
statefulset.apps/bindplane-gateway-agent   2/2     5m52s
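With a dozen or more node agents, scanning the pod list by eye gets tedious. The sketch below filters a pod listing down to the pods that are not fully ready. The function name not_ready_pods is hypothetical, and it assumes the default column order of kubectl get pods (NAME, READY, STATUS, ...) with the header row suppressed.

```shell
# Hypothetical helper: read pod rows on stdin (NAME READY STATUS ...) and
# print the name of every pod whose READY column is not N/N or whose
# STATUS is not "Running". Prints nothing when every pod is healthy.
not_ready_pods() {
  awk '{
    split($2, ready, "/")                      # "1/1" -> ready[1], ready[2]
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}

# Example usage against a live cluster:
#   kubectl -n bindplane-agent get pods --no-headers | not_ready_pods
```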

Once the agents are deployed, they will appear on the Agents page. Agents are named with the following convention:

Node agents take the name of the node they are running on
The Cluster agent takes the name of the underlying pod
Gateway agents take the name of the underlying pod

Initial Configuration Rollout

With the agents connected, you must perform the initial rollout of the configurations.

Navigate to the Configurations page and select your Gateway configuration. Select the "Start Rollout" button. This will push the first version of the configuration to the agents.

Navigate to the Node and Cluster configurations and trigger their initial rollout.

Once the configurations are rolled out, give them ten minutes to start displaying throughput measurements.

Click on an individual agent and select "Recent Telemetry" to view recent logs and metrics.

If the agent does not have recent telemetry, try selecting a different one. If activity in the cluster is low, recent telemetry may not be available on every agent right away.

Security

Each agent manifest has a secret key that is used for authentication to BindPlane OP. If you intend to commit the manifest to git, you should first update the secret key environment variable to use a Kubernetes Secret.

yaml

spec:
  template:
    spec:
      containers:
        - name: opentelemetry-container
          env:
            - name: OPAMP_SECRET_KEY
              value: YOUR_SECRET_KEY

You can create a secret and reference it.

bash

kubectl -n bindplane-agent create secret generic \
  bindplane-agent \
  --from-literal=secret-key=YOUR_SECRET_KEY

yaml

spec:
  template:
    spec:
      containers:
        - name: opentelemetry-container
          env:
            - name: OPAMP_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: bindplane-agent
                  key: secret-key

Once the secret value is removed from the manifest, it can be safely committed to git.
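If you want a guard against committing a manifest that still contains the literal key, a small check like the one below could run before each commit. This is a sketch under stated assumptions: the function name has_inline_secret is hypothetical, and it assumes the value line directly follows the OPAMP_SECRET_KEY name line, as in the manifests shown above.

```shell
# Hypothetical pre-commit check: exit 0 when the manifest still sets
# OPAMP_SECRET_KEY through an inline "value:" line, exit 1 otherwise
# (for example, when it uses "valueFrom:" with a secretKeyRef).
# Assumes "value:" appears on the line right after "- name: OPAMP_SECRET_KEY".
has_inline_secret() {
  awk '
    prev_is_key && $1 == "value:" { found = 1 }
    { prev_is_key = ($1 == "-" && $2 == "name:" && $3 == "OPAMP_SECRET_KEY") }
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Example usage:
#   if has_inline_secret bindplane-node-agent.yaml; then
#     echo "inline secret key found; move it to a Kubernetes Secret first" >&2
#   fi
```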

Troubleshooting

Agents do not appear on the Agents page

If the agent pods are running, but not appearing on the Agents page, make sure your BindPlane server's remote URL parameter is set correctly.

If operating BindPlane on Linux, check the configuration at /etc/bindplane/config.yaml.

yaml

network:
  remoteURL: http://bindplane.corp.net:3001

If using Helm to operate BindPlane on Kubernetes, make sure the config.remote_url value is correct. The Helm chart will set this value to the clusterIP in the BindPlane server's namespace, if it is not set explicitly.

Helm

config:
  remote_url: ws://bindplane.bindplane.svc.cluster.local:3001

In either case, the remote URL must resolve to the BindPlane server and be reachable by the agents.

If the server's remote URL is correct, make sure the OpAMP endpoint in the agent manifest matches it.

In this example, the remote URL https://app.bindplane.com:3001 will result in an OpAMP endpoint with value wss://app.bindplane.com:3001/v1/opamp. The OpAMP endpoint is derived from the BindPlane remote URL setting.

yaml

spec:
  template:
    spec:
      containers:
        - name: opentelemetry-collector
          env:
            - name: OPAMP_ENDPOINT
              value: wss://app.bindplane.com:3001/v1/opamp
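When comparing the server's remote URL with the endpoint in a manifest, it can help to compute the expected endpoint rather than eyeball it. The sketch below follows the mapping described above (https://app.bindplane.com:3001 becomes wss://app.bindplane.com:3001/v1/opamp). The function name opamp_endpoint is hypothetical, and the http to ws mapping is an assumption made by analogy; this guide only states the https case.

```shell
# Hypothetical helper: derive the expected OpAMP endpoint from a BindPlane
# remote URL. Assumes https maps to wss and (by analogy) http maps to ws,
# with the /v1/opamp path appended after the host and port.
opamp_endpoint() {
  url=${1%/}                                   # drop one trailing slash, if any
  case $url in
    https://*)      printf 'wss://%s/v1/opamp\n' "${url#https://}" ;;
    http://*)       printf 'ws://%s/v1/opamp\n' "${url#http://}" ;;
    ws://*|wss://*) printf '%s/v1/opamp\n' "$url" ;;
    *)              echo "unsupported scheme: $1" >&2; return 1 ;;
  esac
}

# Example usage:
#   opamp_endpoint https://app.bindplane.com:3001
```

Note that the port stays attached to the host; the path comes after it.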

Frequently Asked Questions

Q: Can I modify the manifests?

A: Yes. You may want to adjust the CPU and memory resource requests and limits, as well as affinity rules or pod priority class. You should adjust the manifests to fit your environment.

Q: Can BindPlane Agents be installed with Helm?

A: Helm is not supported for agent installation. If you are interested in Helm support, please reach out to us through support channels. We would love to have your feedback.

Q: Can BindPlane Agents be installed with the OpenTelemetry Operator?

A: The OpenTelemetry Operator is not supported for agent installation. The operator only recently added support for OpAMP. We are following the development closely and look forward to supporting the operator in the future.

Q: Is it safe to commit the agent manifests to git?

A: Follow the security section before committing the manifests to git.

Q: Can I use ArgoCD or Flux to manage the agent deployments?

A: Yes. If you have existing tooling in place for managing resources within your cluster, we encourage you to use it to handle the agent installation.