We are back with a simplified configuration for another critical open-source component, Zookeeper. Monitoring Zookeeper helps to ensure that data is distributed as expected across the cluster. Although Zookeeper is considered very resilient to network mishaps, monitoring is still essential. To do so, we’ll set up monitoring using the Zookeeper receiver from OpenTelemetry.
The configuration detailed in this post uses observIQ’s distribution of the OpenTelemetry collector. We are simplifying the use of OpenTelemetry for all users. If you are as excited as we are, take a look at the details of this support in our repo.
You can use this receiver with any OTel collector, including the upstream OpenTelemetry Collector and observIQ’s distribution of the collector.
Monitoring performance metrics for Zookeeper is necessary to ensure that all the jobs are running as expected and the clusters are humming. The following categories of metrics are monitored using this configuration:
Cluster and resource metrics:
- Automatically discover ZooKeeper clusters, monitor memory (heap and non-heap) on each znode, and get alerted on changes in resource consumption.
- Automatically collect, graph, and get alerts on garbage collection iterations, heap size and usage, and threads.
- ZooKeeper hosts are deployed in a cluster; as long as a majority of hosts are up, the service remains available. Make sure the total node count inside the ZooKeeper tree stays consistent.

Latency and throughput:
- Get a consistent view of the performance of your servers even as they change roles between Follower and Leader, so you keep a meaningful view of the history.
Configuring the Zookeeper Receiver
After the installation, the config file for the collector can be found at:
- C:\Program Files\observIQ OpenTelemetry Collector\config.yaml (Windows)
- Configure the collection_interval attribute. It is set to 30 seconds in this sample configuration.
- Set up the endpoint attribute to point to the system running the Zookeeper instance.
```yaml
receivers:
  zookeeper:
    collection_interval: 30s
    endpoint: localhost:2181
```
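Under the hood, the receiver polls ZooKeeper’s `mntr` four-letter administration command on the configured endpoint (on ZooKeeper 3.5+, `mntr` must be allowed via the `4lw.commands.whitelist` server property). As a rough sketch, `mntr` returns tab-separated key/value lines; the snippet below parses a sample response into the kind of key/value data the receiver converts into metrics. The sample values are made up for illustration:

```python
# Sketch: parse a sample "mntr" response (tab-separated key/value lines)
# into a dict -- the same shape of data the Zookeeper receiver scrapes.
# The values below are illustrative, not from a real server.
sample_mntr = """zk_version\t3.8.0
zk_avg_latency\t0
zk_max_latency\t12
zk_num_alive_connections\t3
zk_znode_count\t142
zk_open_file_descriptor_count\t58
zk_max_file_descriptor_count\t1048576
"""

def parse_mntr(raw: str) -> dict:
    stats = {}
    for line in raw.strip().splitlines():
        key, _, value = line.partition("\t")
        # Numeric fields become ints; anything else (e.g. the version) stays a string.
        stats[key] = int(value) if value.lstrip("-").isdigit() else value
    return stats

stats = parse_mntr(sample_mntr)
print(stats["zk_znode_count"])  # 142
```

To check the endpoint by hand, `echo mntr | nc localhost 2181` prints the same raw lines.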
- The resource detection processor is used to distinguish metrics received from multiple Zookeeper systems. This helps with filtering metrics from specific Zookeeper hosts in the monitoring tool.
- Add the batch processor to bundle the metrics from multiple receivers. We highly recommend using this processor in the configuration, especially for the benefit of the logging component of the collector. To learn more about this processor check the documentation.
```yaml
processors:
  resourcedetection:
    detectors: ["system"]
    system:
      hostname_sources: ["os"]
  batch:
```
In this example, the metrics are exported to New Relic using the OTLP exporter. If you would like to forward your metrics to a different destination, check the list of destinations that OpenTelemetry currently supports.
```yaml
exporters:
  otlp:
    endpoint: https://otlp.nr-data.net:443
    headers:
      api-key: 00000-00000-00000
    tls:
      insecure: false
```
Set up the pipeline:
```yaml
service:
  pipelines:
    metrics:
      receivers:
        - zookeeper
      processors:
        - resourcedetection
        - batch
      exporters:
        - otlp
```
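Putting the pieces together, the full config.yaml for this setup looks like the following (the endpoint and API key are placeholders to replace with your own values):

```yaml
receivers:
  zookeeper:
    collection_interval: 30s
    endpoint: localhost:2181

processors:
  resourcedetection:
    detectors: ["system"]
    system:
      hostname_sources: ["os"]
  batch:

exporters:
  otlp:
    endpoint: https://otlp.nr-data.net:443
    headers:
      api-key: 00000-00000-00000
    tls:
      insecure: false

service:
  pipelines:
    metrics:
      receivers:
        - zookeeper
      processors:
        - resourcedetection
        - batch
      exporters:
        - otlp
```

Restart the collector after saving the file so the new pipeline takes effect.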
Viewing the metrics
All the metrics the Zookeeper receiver scrapes are listed below.
| Metric | Description |
| --- | --- |
| zookeeper.connection.active | The number of active connections. |
| zookeeper.data_tree.ephemeral_node.count | The number of ephemeral nodes. |
| zookeeper.data_tree.size | The size of the data tree. |
| zookeeper.file_descriptor.limit | The limit set for open file descriptors. |
| zookeeper.file_descriptor.open | The number of open file descriptors. |
| zookeeper.latency.max | The maximum latency. |
| zookeeper.latency.min | The minimum latency. |
| zookeeper.packet.count | The packet count. |
| zookeeper.request.active | The number of active requests. |
| zookeeper.watch.count | The watch count. |
| zookeeper.znode.count | The total number of znodes. |
Now that you have the metrics gathered and exported to the destination of your choice, you may want to explore how to effectively configure alerts for these metrics. Here are some alerting possibilities for ZooKeeper:
| Alert | Severity |
| --- | --- |
| ZooKeeper server is down | critical |
| Too many znodes created | warning |
| Too many connections created | warning |
| Memory occupied by znodes is too large | warning |
| Too many watches set | warning |
| Too many files open | warning |
| Average latency is too high | warning |
| JVM memory almost full | warning |
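As an illustration of how one of these alerts could be evaluated, the sketch below checks the "Too many files open" condition using the `zookeeper.file_descriptor.open` and `zookeeper.file_descriptor.limit` metrics collected above. The 0.85 ratio is an assumed threshold for illustration, not an official recommendation; in practice you would define this rule in your monitoring tool rather than in code:

```python
# Illustrative threshold check for the "Too many files open" alert.
# open_fds and fd_limit correspond to zookeeper.file_descriptor.open
# and zookeeper.file_descriptor.limit; 0.85 is an assumed threshold.
def files_open_alert(open_fds: int, fd_limit: int, threshold: float = 0.85) -> bool:
    """Return True when open file descriptors exceed the threshold ratio."""
    if fd_limit <= 0:
        return False  # No meaningful limit reported; don't alert.
    return open_fds / fd_limit > threshold

print(files_open_alert(900, 1000))  # True: 90% of the limit is in use
print(files_open_alert(100, 1000))  # False: well below the threshold
```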
observIQ’s distribution is a game-changer for companies looking to implement the OpenTelemetry standards. The single-line installer and the seamlessly integrated pool of receivers, processors, and exporters make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, reach out to our support team at support@observIQ.com or join the conversation on Slack!