How to Monitor Zookeeper with OpenTelemetry

Deepa Ramachandra

We are back with a simplified configuration for another critical open-source component: Zookeeper. Monitoring Zookeeper helps ensure that data is distributed as expected across the cluster. Although Zookeeper is considered very resilient to network mishaps, monitoring it is still essential. To do so, we'll set up monitoring using the Zookeeper receiver from OpenTelemetry.

The configuration detailed in this post uses observIQ’s distribution of the OpenTelemetry collector. We are simplifying the use of OpenTelemetry for all users. If you are as excited as we are, look at the details of this support in our repo.

You can use this receiver with any OpenTelemetry Collector distribution, including the upstream OpenTelemetry Collector and observIQ’s distribution of the collector.

Monitoring performance metrics for Zookeeper is necessary to ensure that all the jobs are running as expected and the clusters are humming. The following categories of metrics are monitored using this configuration:

Znodes:

Automatically discover Zookeeper clusters, monitor memory (heap and non-heap) on the znodes, and get alerted on changes in resource consumption. Automatically collect, graph, and alert on garbage collection iterations, heap size and usage, and threads. ZooKeeper hosts are deployed in a cluster; as long as a majority of hosts are up, the service remains available. Make sure the total node count in the ZooKeeper tree stays consistent.

Latency and throughput:

Get a consistent view of your servers' performance regardless of whether they change roles from follower to leader and back, so you retain a meaningful view of the history.

Configuring the Zookeeper Receiver

After the installation, the config file for the collector can be found at:

  • C:\Program Files\observIQ OpenTelemetry Collector\config.yaml (Windows)
  • /opt/observiq-otel-collector/config.yaml (Linux)

Related Content: How to Install and Configure an OpenTelemetry Collector

Receiver Configuration:

  1. Configure the collection_interval attribute. It is set to 30 seconds in this sample configuration.
  2. Set up the endpoint attribute to point to the system running the Zookeeper instance.
yaml
receivers:
  zookeeper:
    collection_interval: 30s
    endpoint: localhost:2181

Processor Configuration:

  1. The resource detection processor is used to distinguish metrics received from multiple Zookeeper systems. This helps filter metrics from specific Zookeeper hosts in the monitoring tool, such as Google Cloud operations. An optional sketch for tagging hosts with a custom attribute follows the snippet below.
  2. Add the batch processor to bundle the metrics from multiple receivers. We highly recommend using this processor in the configuration, especially for the benefit of the collector's logging component. If you would like to learn more about this processor, check the documentation.
yaml
processors:
  resourcedetection:
    detectors: ["system"]
    system:
      hostname_sources: ["os"]

  batch:
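
Optionally, if you collect metrics from several environments, you could also add the resource processor to tag everything passing through the pipeline with a custom attribute. This is a minimal sketch, not part of the original configuration; the deployment.environment key and its value are assumed examples.

yaml
processors:
  resource:
    attributes:
      # Adds (or overwrites) a custom attribute on every metric
      - key: deployment.environment
        value: production
        action: upsert

If you add it, remember to include resource in the metrics pipeline's processor list as well.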

Exporter Configuration:

In this example, the metrics are exported to New Relic using the OTLP exporter. If you want to forward your metrics to a different destination, you can check the destinations OpenTelemetry supports here.

yaml
exporters:
  otlp:
    endpoint: https://otlp.nr-data.net:443
    headers:
      api-key: 00000-00000-00000
    tls:
      insecure: false
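
Rather than hard-coding the API key, you could reference an environment variable; recent collector versions expand ${env:VAR} placeholders in the configuration. The variable name NEW_RELIC_LICENSE_KEY below is only an illustrative assumption.

yaml
exporters:
  otlp:
    endpoint: https://otlp.nr-data.net:443
    headers:
      # Read the key from an environment variable instead of the config file
      api-key: ${env:NEW_RELIC_LICENSE_KEY}
    tls:
      insecure: false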

Set up the pipeline:

yaml
service:
  pipelines:
    metrics:
      receivers:
      - zookeeper
      processors:
      - resourcedetection
      - batch
      exporters:
      - otlp
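
Putting the pieces together, the complete config.yaml would look roughly like the following sketch (the endpoint and API key values are placeholders):

yaml
receivers:
  zookeeper:
    collection_interval: 30s
    endpoint: localhost:2181

processors:
  resourcedetection:
    detectors: ["system"]
    system:
      hostname_sources: ["os"]

  batch:

exporters:
  otlp:
    endpoint: https://otlp.nr-data.net:443
    headers:
      api-key: 00000-00000-00000
    tls:
      insecure: false

service:
  pipelines:
    metrics:
      receivers:
      - zookeeper
      processors:
      - resourcedetection
      - batch
      exporters:
      - otlp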

Viewing the metrics

All the metrics the Zookeeper receiver scrapes are listed below.

Metric | Description
zookeeper.connection.active | The number of active connections.
zookeeper.data_tree.ephemeral_node.count | The number of ephemeral nodes.
zookeeper.data_tree.size | The size of the data tree.
zookeeper.file_descriptor.limit | The limit set for file descriptors.
zookeeper.file_descriptor.open | The number of open file descriptors.
zookeeper.latency.max | The maximum latency.
zookeeper.latency.min | The minimum latency.
zookeeper.packet.count | The packet count.
zookeeper.request.active | The number of active requests.
zookeeper.watch.count | The watch count.
zookeeper.znode.count | The total number of znodes.
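
If you only need a subset of these metrics, you can likely toggle individual metrics in the receiver configuration; the sketch below assumes the receiver follows the standard collector pattern of per-metric enabled flags.

yaml
receivers:
  zookeeper:
    collection_interval: 30s
    endpoint: localhost:2181
    metrics:
      # Example: stop scraping the watch count (assumed per-metric toggle)
      zookeeper.watch.count:
        enabled: false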

Related Content: Managing Observability Pipeline Chaos and the Bottomline

Alerting

Now that you have the metrics gathered and exported to the destination of your choice, you can explore how to configure alerts for these metrics effectively. Here are some alerting possibilities for ZooKeeper:

Alert | Severity
ZooKeeper server is down | critical
Too many znodes created | warning
Too many connections created | warning
Memory occupied by znodes is too large | warning
Too many watches set | warning
Too many open files | warning
Average latency is too high | warning
JVM memory almost full | warning

observIQ’s distribution is a game-changer for companies looking to implement the OpenTelemetry standards. The single-line installer and the pool of seamlessly integrated receivers, processors, and exporters make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com or join the conversation on Slack!
