observIQ’s OpenTelemetry members contributed Riak metric monitoring support to OpenTelemetry! You can now monitor your Riak agent performance with OpenTelemetry, and deploy simply with the oIQ OpenTelemetry Collector.
You can add the Riak metric receiver to any OpenTelemetry collector. This post demonstrates a configuration for shipping metrics to Google Cloud Operations with OpenTelemetry components. This configuration and many other observIQ OpenTelemetry configurations are available in the oIQ OpenTelemetry Collector.
Installation and configuration are simple, but you may want to refine the configuration to fit your needs once the metric receiver is up and running. The configuration is a YAML file and is easy to edit. You can find more documentation, example configurations for other receivers, and observability tools on GitHub and on our blog.
What Matters for Riak Metrics
Riak deployments can get complicated and tedious. Large environments stress throughput, and monitoring metrics offers insight into resource usage, stability, and overall health.
Step 1: Installing the Collector
The oIQ OpenTelemetry Collector can be installed on Windows, macOS, and Linux using single-line install commands that can be copied directly from GitHub. Make sure you have administrator privileges on the device or VM where you run the installation.
msiexec /i "https://github.com/observIQ/observiq-otel-collector/releases/latest/download/observiq-otel-collector.msi" /quiet
sudo sh -c "$(curl -fsSlL https://github.com/observiq/observiq-otel-collector/releases/latest/download/install_unix.sh)" install_unix.sh
Step 2: Prerequisites and Authentication Credentials
In the following example, we are using Google Cloud Operations as the destination. However, OpenTelemetry offers exporters for many destinations. Check out the list of exporters here.
Setting up Google Cloud exporter prerequisites:
If the collector runs outside of Google Cloud (on-premises, AWS, etc.) or without the Cloud Monitoring scope, the Google exporter requires a service account.
Create a service account with the following roles:
- Metrics: roles/monitoring.metricWriter
- Logs: roles/logging.logWriter
Create a service account JSON key and place it on the system that is running the collector.
In this example, the key is placed at /opt/observiq-otel-collector/sa.json and its permissions are restricted to the user running the collector process.
sudo cp sa.json /opt/observiq-otel-collector/sa.json
sudo chown observiq-otel-collector: /opt/observiq-otel-collector/sa.json
sudo chmod 0400 /opt/observiq-otel-collector/sa.json
Set the GOOGLE_APPLICATION_CREDENTIALS environment variable by creating a systemd override. A systemd override allows users to modify the systemd service configuration without modifying the service directly. This allows package upgrades to happen seamlessly. You can learn more about systemd units and overrides here.
Run the following command:
sudo systemctl edit observiq-otel-collector
If this is the first time an override is being created, paste the following contents into the file:
[Service]
Environment=GOOGLE_APPLICATION_CREDENTIALS=/opt/observiq-otel-collector/sa.json
If an override is already in place, simply insert the Environment parameter into the existing Service section.
Restart the collector:
sudo systemctl restart observiq-otel-collector
On Windows, the key in this example is placed at C:/observiq/collector/sa.json.
Set the GOOGLE_APPLICATION_CREDENTIALS environment variable with the setx command in Command Prompt.
Run the following command:
setx GOOGLE_APPLICATION_CREDENTIALS "C:/observiq/collector/sa.json" /m
Restart the service using the services application.
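Before restarting the collector, it can help to confirm that the key file is readable and looks like a service account key. The sketch below is a hypothetical helper, not part of the collector: the function name and the list of required fields are assumptions chosen for illustration, based on the fields a standard Google service account key file contains.

```python
import json
import os
import stat

# Fields found in a standard Google service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems found with the key file (empty if none)."""
    problems = []
    if not os.path.isfile(path):
        return [f"{path} does not exist or is not a regular file"]

    # On POSIX systems, flag keys that group or other users can access.
    if os.name == "posix":
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append(f"permissions {mode:04o} allow group/other access")

    try:
        with open(path, encoding="utf-8") as handle:
            key = json.load(handle)
    except (OSError, ValueError) as exc:
        return problems + [f"cannot read the key as JSON: {exc}"]

    if not isinstance(key, dict):
        return problems + ["key file does not contain a JSON object"]

    problems += [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key]
    if key.get("type") != "service_account":
        problems.append('field "type" is not "service_account"')
    return problems
```

Calling check_service_account_key with the path stored in GOOGLE_APPLICATION_CREDENTIALS returns an empty list when nothing looks wrong. Run it as a user that can read the key, since the file is restricted to the collector's user.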
Step 3: Configure the Riak Receiver
After installation, the config file for the collector can be found at:
Windows: C:\Program Files\observIQ OpenTelemetry Collector\config.yaml
Edit the config file with the following configuration:
receivers:
  riak:
    endpoint: "localhost:8098"
    username: username
    password: password
    collection_interval: 60s

processors:
  # Resourcedetection is used to add a unique (host.name)
  # to the metric resource(s), allowing users to filter
  # between multiple agent systems.
  resourcedetection:
    detectors: ["system"]
    system:
      hostname_sources: ["os"]

  # Used for Google generic_node mapping.
  resource:
    attributes:
      - key: namespace
        value: "riak"
        action: upsert
      - key: location
        value: "global"
        action: upsert

  normalizesums:

  batch:

exporters:
  googlecloud:
    retry_on_failure:
      enabled: false
    metric:
      prefix: workload.googleapis.com
    resource_mappings:
      - source_type: ""
        target_type: generic_node
        label_mappings:
          - source_key: host.name
            target_key: node_id
          - source_key: location
            target_key: location
          - source_key: namespace
            target_key: namespace

service:
  pipelines:
    metrics:
      receivers:
        - riak
      processors:
        - resourcedetection
        - resource
        - normalizesums
        - batch
      exporters:
        - googlecloud
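A common mistake when editing this file is naming a component in the pipeline without defining it in its top-level section. The sketch below illustrates that check on a plain Python dictionary that mirrors the component names in the configuration above, with settings left out for brevity. The function name is hypothetical, and this is not how the collector itself validates its configuration.

```python
def undefined_components(config):
    """Return (section, name) pairs that a pipeline uses but the config never defines."""
    missing = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline in pipelines.values():
        for section in ("receivers", "processors", "exporters"):
            defined = config.get(section) or {}
            for name in pipeline.get(section, []):
                if name not in defined:
                    missing.append((section, name))
    return missing


# Component names taken from the configuration above; settings omitted.
config = {
    "receivers": {"riak": {}},
    "processors": {
        "resourcedetection": {},
        "resource": {},
        "normalizesums": {},
        "batch": {},
    },
    "exporters": {"googlecloud": {}},
    "service": {
        "pipelines": {
            "metrics": {
                "receivers": ["riak"],
                "processors": ["resourcedetection", "resource", "normalizesums", "batch"],
                "exporters": ["googlecloud"],
            }
        }
    },
}

print(undefined_components(config))  # prints []
```

If you remove a processor from the top-level section but leave it in the pipeline, the function reports it as a ("processors", name) pair.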
This configuration collects metrics from Riak and exports them to Google Cloud. Enter your Riak endpoint and credentials in the “receivers” section near the top of the config file. To send metrics to a different destination, replace the googlecloud exporter in the “exporters” section.
The following notes apply for Google Cloud:
- The interval for fetching metrics is 60 seconds by default.
- In the Google Cloud exporter, do the following mapping:
  - Set the target type to a generic node, to simplify filtering metrics from the collector in cloud monitoring.
  - Set node_id, location, and namespace for the metrics. Location and namespace are set from the resource processor.
- The project ID is not set in the configuration. Google automatically detects the project ID.
- Add the normalizesums processor to drop the first, zero-valued metric reported after the configuration is applied and the collector is restarted. To learn more about this processor, check the OpenTelemetry documentation.
- Add the batch processor to bundle the metrics from multiple receivers. We highly recommend using this processor in the configuration, especially for the benefit of the logging component of the collector.
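The label mapping in the notes above can be pictured as a simple rename of resource attributes. The sketch below is only an illustration of that idea, not the exporter's implementation. The function name and the hostname "riak-01" are hypothetical, and skipping attributes that are absent is an assumption made for the example.

```python
def map_labels(resource_attributes, label_mappings):
    """Rename resource attributes to monitored-resource labels."""
    labels = {}
    for mapping in label_mappings:
        source = mapping["source_key"]
        # Assumption for this sketch: attributes that are absent are skipped.
        if source in resource_attributes:
            labels[mapping["target_key"]] = resource_attributes[source]
    return labels


# Mappings copied from the googlecloud exporter section of the configuration.
label_mappings = [
    {"source_key": "host.name", "target_key": "node_id"},
    {"source_key": "location", "target_key": "location"},
    {"source_key": "namespace", "target_key": "namespace"},
]

# host.name comes from resourcedetection; "riak-01" is a made-up hostname.
# location and namespace come from the resource processor.
attributes = {"host.name": "riak-01", "location": "global", "namespace": "riak"}

print(map_labels(attributes, label_mappings))
# prints {'node_id': 'riak-01', 'location': 'global', 'namespace': 'riak'}
```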
Step 4: View Your Metrics
Below is a list of metrics that are collected by the OpenTelemetry Riak receiver. The metrics are sent to Google Cloud, or the destination designated during setup, for analysis.
|Metric|Description|
|---|---|
|riak.node.operation.count|The number of operations performed by the node|
|riak.node.operation.time.mean|The mean time between request and response for operations performed by the node over the last minute|
|riak.node.read_repair.count|The number of read repairs performed by the node|
|riak.memory.limit|The amount of memory allocated to the node|
|riak.vnode.operation.count|The number of operations performed by vnodes on the node|
|riak.vnode.index.operation.count|The number of index operations performed by vnodes on the node|
To view the metrics, go to the Google Cloud Console and open Metrics Explorer. Select generic node as the resource type. You can filter by namespace to view specific metrics.
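When searching in Metrics Explorer, it helps to know the full metric type to look for. The sketch below builds the names on the assumption that the exporter joins the configured prefix (workload.googleapis.com) and the metric name with a slash. Treat the result as a search hint and confirm the exact names in your project.

```python
# Prefix taken from the googlecloud exporter section of the configuration.
PREFIX = "workload.googleapis.com"

# Metric names from the table of Riak receiver metrics.
RIAK_METRICS = [
    "riak.node.operation.count",
    "riak.node.operation.time.mean",
    "riak.node.read_repair.count",
    "riak.memory.limit",
    "riak.vnode.operation.count",
    "riak.vnode.index.operation.count",
]


def metric_type(name, prefix=PREFIX):
    """Build the expected metric type (assumed format: prefix/name)."""
    return f"{prefix}/{name}"


for name in RIAK_METRICS:
    print(metric_type(name))
```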
observIQ’s OpenTelemetry distribution is an easy option for anyone looking to implement OpenTelemetry observability standards in their IT environment. The single-line installer and the seamlessly integrated pool of receivers, exporters, and processors make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, reach out to our support team at support@observIQ.com.