How to Monitor Cassandra using OpenTelemetry

Deepa Ramachandra
We are constantly working on contributing monitoring support for various sources; the latest in that line is support for Cassandra monitoring using the OpenTelemetry collector. If you are as excited as we are, take a look at the details of this support in OpenTelemetry’s repo.

The best part is that this receiver works with any OpenTelemetry Collector, including the upstream OpenTelemetry Collector and observIQ’s distribution of the collector.

In this post, we take you through the steps to set up the JMX receiver with observIQ’s distribution of the OpenTelemetry Collector and send out the metrics to New Relic.

What signals matter?

Performance metrics are the most important to monitor for Cassandra. Here’s a list of signals to keep track of:

Availability of resources:

Monitoring the physical resources and their utilization is critical to Cassandra’s performance. Standard JVM metrics, such as memory usage, thread count, garbage collection, etc., are good to monitor. If there’s a decrease in the computing resources, the Cassandra database’s performance will be affected.

Volume of client requests:

As with monitoring other databases, monitoring the time taken to send, receive, and fulfill requests is necessary. A sudden change in the volume of requests can also indicate an unforeseen spike in traffic or an issue with the application or database.

Latency:

Latency is a critical metric to monitor for Cassandra databases. Continuous monitoring helps identify performance and latency issues originating from a cluster. Read and write request latencies are monitored together to create a holistic view of execution speed.

Configuring the JMX metrics receiver

After the installation, the config file for the collector can be found at:

  • C:\Program Files\observIQ OpenTelemetry Collector\config.yaml (Windows)
  • /opt/observiq-otel-collector/config.yaml (Linux)

The first step is building the receiver’s configuration:

  • We are using the JMX receiver to gather Cassandra metrics. The jar_path attribute specifies the path to the jar file that the JMX receiver uses to gather Cassandra metrics. This file path is created automatically when observIQ’s distribution of the OpenTelemetry Collector is installed.
  • Set the endpoint attribute to the IP address and port of the system from which the metrics are gathered.
  • JMX exposes several categories of metrics; this configuration intends to scrape the Cassandra metrics and the JVM metrics. The target_system attribute specifies that.
  • Set the time interval for fetching the metrics with the collection_interval attribute. The default value for this parameter is 10s. However, when exporting metrics to New Relic, we set this value to 60s, as in the example below.
  • The properties attribute allows you to set arbitrary attributes. For instance, if you are configuring multiple JMX receivers to collect metrics from many Cassandra servers, this attribute lets you set a unique IP address for each endpoint system. Note that this is not the only use of the properties option.

```yaml
receivers:
  jmx:
    # Path to the JMX metrics gatherer jar installed with the collector.
    jar_path: /opt/opentelemetry-java-contrib-jmx-metrics.jar
    endpoint: localhost:9000
    # Target system names are lowercase and comma-separated.
    target_system: cassandra,jvm
    collection_interval: 60s
    properties:
      # Attribute 'endpoint' will be used for generic_node's node_id field.
      otel.resource.attributes: endpoint=localhost:9000
```
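
The multi-server case mentioned above can be sketched as follows. This is a minimal illustration, not a tested deployment: it relies on the collector's `type/name` convention for declaring several instances of one receiver, and the instance names and host names (`cassandra-1.example.internal` and so on) are hypothetical placeholders.

```yaml
receivers:
  # One named JMX receiver instance per Cassandra server.
  # Host names below are hypothetical; replace them with your own.
  jmx/cassandra-1:
    jar_path: /opt/opentelemetry-java-contrib-jmx-metrics.jar
    endpoint: cassandra-1.example.internal:9000
    target_system: cassandra,jvm
    collection_interval: 60s
    properties:
      # Distinguishes this server's metrics at the destination.
      otel.resource.attributes: endpoint=cassandra-1.example.internal:9000
  jmx/cassandra-2:
    jar_path: /opt/opentelemetry-java-contrib-jmx-metrics.jar
    endpoint: cassandra-2.example.internal:9000
    target_system: cassandra,jvm
    collection_interval: 60s
    properties:
      otel.resource.attributes: endpoint=cassandra-2.example.internal:9000
```

With a setup like this, each named instance (jmx/cassandra-1, jmx/cassandra-2) has to be listed under the receivers of the metrics pipeline in place of the single jmx entry.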

The next step is to configure the processors:

  • Use the resourcedetection processor to create an identifier value for each Cassandra instance from which the metrics are scraped.
  • Add the batch processor to bundle the metrics from multiple receivers. We highly recommend using this processor in the configuration, especially for the benefit of the collector's logging component. If you would like to learn more about this processor, check the documentation.

```yaml
processors:
  resourcedetection:
    detectors: ["system"]
    system:
      hostname_sources: ["os"]

  # An empty entry enables the batch processor with its default settings.
  batch:
```
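
If the defaults do not suit your volume of metrics, the batch processor can also be tuned explicitly. The fragment below is a sketch using the processor's `timeout` and `send_batch_size` settings; the values are illustrative assumptions, not recommendations from this post.

```yaml
processors:
  batch:
    # Send a batch after this long, even if it is not yet full.
    timeout: 5s
    # Send a batch as soon as it holds this many items.
    send_batch_size: 1000
```

A longer timeout means fewer, larger exports at the cost of metrics arriving slightly later at the destination.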

Next, set up a destination for exporting the metrics. This example sends them to New Relic using the OTLP exporter; you can check the configuration for your preferred destination in OpenTelemetry’s documentation.

```yaml
exporters:
  otlp:
    endpoint: https://otlp.nr-data.net:443
    headers:
      # Placeholder value; replace with your own New Relic API key.
      api-key: 00000-00000-00000
    tls:
      insecure: false
```
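
Hard-coding the key in config.yaml makes it easy to leak through version control or shared files. One alternative, sketched below, is to read it from an environment variable. This assumes your collector version supports `${env:...}` expansion in its configuration; the variable name NEW_RELIC_API_KEY is a hypothetical choice for this example.

```yaml
exporters:
  otlp:
    endpoint: https://otlp.nr-data.net:443
    headers:
      # NEW_RELIC_API_KEY is a hypothetical variable name. The collector
      # substitutes its value when the configuration is loaded.
      api-key: ${env:NEW_RELIC_API_KEY}
    tls:
      insecure: false
```

The variable has to be set in the environment of the collector process itself (for example, in its service definition), not only in your interactive shell. Older collector releases may expect the `${NEW_RELIC_API_KEY}` form instead.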

With the receiver, processors, and exporter defined, set up the pipeline that connects them.

```yaml
service:
  pipelines:
    metrics:
      receivers:
      - jmx
      # Every processor listed here must also be defined under `processors`;
      # this walkthrough defines only resourcedetection and batch.
      processors:
      - resourcedetection
      - batch
      exporters:
      - otlp
```
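
To confirm that metrics are flowing before looking for them in New Relic, one option is to add a second exporter that prints telemetry to the collector's own log. The sketch below assumes a recent collector release, where this exporter is named `debug`; older releases shipped a similar exporter named `logging`, so check which one your version supports.

```yaml
exporters:
  # Add this next to the existing otlp exporter; it prints the
  # telemetry it receives to the collector's log output.
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      # Receivers and processors stay as already configured.
      exporters:
      - otlp
      - debug
```

Once the Cassandra metrics show up in the collector log, the extra exporter can be removed again to keep the log quiet.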

Viewing the metrics collected

Based on the config detailed above, the JMX metrics gatherer scrapes the following metrics and exports them to the destination.

| Metric | Description |
| --- | --- |
| cassandra.client.request.count | The total number of client requests |
| cassandra.client.request.error.count | The total number of requests that have returned an error |
| cassandra.client.request.range_slice.latency.50p | The latency for range slice requests at the 50th percentile |
| cassandra.client.request.range_slice.latency.99p | The latency for range slice requests at the 99th percentile |
| cassandra.client.request.range_slice.latency.max | The maximum latency for range slice requests |
| cassandra.client.request.read.latency.50p | The latency for read requests at the 50th percentile |
| cassandra.client.request.read.latency.99p | The latency for read requests at the 99th percentile |
| cassandra.client.request.read.latency.max | The maximum latency for read requests |
| cassandra.client.request.write.latency.50p | The latency for write requests at the 50th percentile |
| cassandra.client.request.write.latency.99p | The latency for write requests at the 99th percentile |
| cassandra.client.request.write.latency.max | The maximum latency for write requests |
| cassandra.compaction.tasks.completed | The total number of compaction tasks completed |
| cassandra.compaction.tasks.pending | The number of compaction tasks pending |
| cassandra.storage.load.count | The size of the on-disk data that the node manages |
| cassandra.storage.total_hints.count | The total number of hints written to the node |
| cassandra.storage.total_hints.in_progress.count | The number of hints currently in progress |

observIQ’s distribution is a game-changer for companies looking to implement the OpenTelemetry standards. The single-line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.
