The Observability Blog

How to Monitor Aerospike with OpenTelemetry

by Nico Stewart on September 6, 2022

With observIQ’s latest contributions to OpenTelemetry, you can now monitor Aerospike with free, open source tools. The easiest way to use the latest OpenTelemetry tools is with observIQ’s distribution of the OpenTelemetry collector. You can find it here.

In this blog, the Aerospike receiver is configured to collect metrics and export them locally over OTLP. You can also use the Aerospike receiver to ship metrics to many popular analysis tools, including Google Cloud, New Relic, and more. For Google Cloud users, the Aerospike receiver is also available through the Google Ops Agent.

What signals matter?

Aerospike is a fast, distributed NoSQL database. It uses flash storage for predictable performance and can add new nodes without downtime. Aerospike operates in-memory, so memory-related metrics are important to monitor.

  • aerospike.node.memory.free
    • This metric monitors the percentage of memory that is free to the Aerospike node. If the value gets too low, the server is reaching its memory limit. If nodes frequently run low on free memory, your operations team should consider adding new nodes or increasing the memory allocated to each node.
  • aerospike.namespace.memory.free
    • This metric monitors the percentage of memory allocated to a specific namespace that is still available. If a namespace runs out of memory, or reaches its high watermark, writes to the namespace will fail.
  • aerospike.node.connection.count
    • This metric indicates the number of connections opened and closed to the Aerospike node. Anomalous values could indicate that client applications are unable to connect, or that peer nodes are unreachable or frequently crashing.

All of the above metrics and more are shipped when you install the Aerospike receiver.

Installing the Receiver

If you don’t already have an OpenTelemetry collector built with the latest Aerospike receiver, we suggest using the observIQ OpenTelemetry Collector distro, which includes the Aerospike receiver (and many others). Installation is simple with our one-line installer. Come back to this blog after running the install command on your source.

Configuring the Receiver

Navigate to your OpenTelemetry configuration file. The Aerospike receiver is Linux-only. If you’re using the observIQ Collector, you’ll find the file in the following location:

  • /opt/observiq-otel-collector/config.yaml (Linux)

Edit the configuration file to include the Aerospike receiver as shown below:

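receivers:
    aerospike:
        # Address (host:port) of the Aerospike node to collect from
        endpoint: localhost:9000
        # Set to true to also discover and collect from peer nodes
        collect_cluster_metrics: false
        # How often metrics are collected
        collection_interval: 30s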

Add Aerospike into your Service pipeline so it looks similar to the following. Note that your processors and exporters may be different.

exporters:
    otlp:
        endpoint: 0.0.0.0:9124
service:
    pipelines:
        metrics:
            receivers: [aerospike]
            exporters: [otlp]
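
As one possible variation on the pipeline above, the sketch below puts the receiver, a processor, and the exporter together in a single file. The batch processor and the tls block are assumptions added for illustration and are not part of the original configuration. Whether you need them depends on your collector build and on what is listening at the exporter endpoint.

```yaml
# Sketch only: the batch processor and the tls block are illustrative
# additions, not part of the configuration shown in this post.
receivers:
    aerospike:
        endpoint: localhost:9000
        collect_cluster_metrics: false
        collection_interval: 30s
processors:
    # Groups metrics into batches before they are exported
    batch:
exporters:
    otlp:
        endpoint: 0.0.0.0:9124
        tls:
            # Assumes the destination accepts plaintext gRPC
            insecure: true
service:
    pipelines:
        metrics:
            receivers: [aerospike]
            processors: [batch]
            exporters: [otlp]
```

A component only takes effect once it is listed in a pipeline, so a receiver, processor, or exporter that is defined but not referenced under service is ignored.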

Below are a few editable fields you can add or adjust in the config file.

| Field | Type | Description |
| --- | --- | --- |
| endpoint | string | Aerospike endpoint to collect from. |
| collect_cluster_metrics | bool | If enabled, the receiver will discover peer nodes of the original Aerospike node. |
| username | string | (Enterprise Edition) The username to authenticate with. |
| password | string | (Enterprise Edition) The password to authenticate with. |
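
To illustrate how these fields fit together, here is a minimal sketch of a receiver block for an Enterprise Edition cluster. The user name otel_monitor and the environment variable AEROSPIKE_PASSWORD are hypothetical placeholders. The ${env:...} expansion syntax is an assumption about your collector version, and older releases may expect ${AEROSPIKE_PASSWORD} instead.

```yaml
# Sketch only: the user name and environment variable are placeholders.
receivers:
    aerospike:
        endpoint: localhost:9000
        # Also collect from peer nodes discovered through this node
        collect_cluster_metrics: true
        collection_interval: 30s
        # Enterprise Edition only
        username: otel_monitor
        # Read from the environment so the password stays out of the file
        password: ${env:AEROSPIKE_PASSWORD}
```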

Viewing the metrics collected

If you followed the steps detailed above, the following Aerospike metrics will now be delivered to your OTel destination.

| Name | Description |
| --- | --- |
| aerospike.namespace.disk.available | Minimum percentage of contiguous disk space free to the namespace across all devices. |
| aerospike.namespace.geojson.region_query_cells | Number of cell coverings for query region queried. Aerospike metric geo_region_query_cells. |
| aerospike.namespace.geojson.region_query_false_positive | Number of points outside the region. Total query result points is geo_region_query_points + geo_region_query_falsepos. Aerospike metric geo_region_query_falsepos. |
| aerospike.namespace.geojson.region_query_points | Number of points within the region. Total query result points is geo_region_query_points + geo_region_query_falsepos. Aerospike metric geo_region_query_points. |
| aerospike.namespace.geojson.region_query_requests | Number of geojson queries on the system since the uptime of the node. Aerospike metric geo_region_query_reqs. |
| aerospike.namespace.memory.free | Percentage of the namespace's memory which is still free. Aerospike metric memory_free_pct. |
| aerospike.namespace.memory.usage | Memory currently used by each component of the namespace. Aggregate of Aerospike metrics memory_used_data_bytes, memory_used_index_bytes, memory_used_set_index_bytes, memory_used_sindex_bytes. |
| aerospike.namespace.query.count | Number of query operations performed on the namespace. Aggregate of Aerospike metrics query_aggr_abort, query_aggr_complete, query_aggr_error, query_basic_abort, query_basic_complete, query_basic_error, query_ops_bg_abort, query_ops_bg_complete, query_ops_bg_error, query_udf_bg_abort, query_udf_bg_complete, query_udf_bg_error, pi_query_aggr_abort, pi_query_aggr_complete, pi_query_aggr_error, pi_query_long_basic_abort, pi_query_long_basic_complete, pi_query_long_basic_error, pi_query_ops_bg_abort, pi_query_ops_bg_basic_complete, pi_query_ops_bg_basic_error, pi_query_short_basic_timeout, pi_query_short_basic_complete, pi_query_short_basic_error, pi_query_udf_bg_abort, pi_query_udf_bg_complete, pi_query_udf_bg_error, si_query_aggr_abort, si_query_aggr_complete, si_query_aggr_error, si_query_long_basic_abort, si_query_long_basic_complete, si_query_long_basic_error, si_query_ops_bg_abort, si_query_ops_bg_basic_complete, si_query_ops_bg_basic_error, si_query_short_basic_timeout, si_query_short_basic_complete, si_query_short_basic_error, si_query_udf_bg_abort, si_query_udf_bg_complete, si_query_udf_bg_error. |
| aerospike.namespace.scan.count | Number of scan operations performed on the namespace. Aggregate of Aerospike metrics scan_aggr_abort, scan_aggr_complete, scan_aggr_error, scan_basic_abort, scan_basic_complete, scan_basic_error, scan_ops_bg_abort, scan_ops_bg_complete, scan_ops_bg_error, scan_udf_bg_abort, scan_udf_bg_complete, scan_udf_bg_error. |
| aerospike.namespace.transaction.count | Number of transactions performed on the namespace. Aggregate of Aerospike metrics client_delete_error, client_delete_filtered_out, client_delete_not_found, client_delete_success, client_delete_timeout, client_read_error, client_read_filtered_out, client_read_not_found, client_read_success, client_read_timeout, client_udf_error, client_udf_filtered_out, client_udf_not_found, client_udf_success, client_udf_timeout, client_write_error, client_write_filtered_out, client_write_not_found, client_write_success, client_write_timeout. |
| aerospike.node.connection.count | Number of connections opened and closed to the node. Aggregate of Aerospike metrics client_connections_closed, client_connections_opened, fabric_connections_closed, fabric_connections_opened, heartbeat_connections_closed, heartbeat_connections_opened. |
| aerospike.node.connection.open | Current number of open connections to the node. Aggregate of Aerospike metrics client_connections, fabric_connections, heartbeat_connections. |
| aerospike.node.memory.free | Percentage of the node's memory which is still free. Aerospike metric system_free_mem_pct. |
| aerospike.node.query.tracked | Number of queries tracked by the system, that is, queries which ran longer than query untracked_time (default 1 sec). Aerospike metric query_tracked. |
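
If you only need some of these metrics, many OpenTelemetry receivers accept a per-metric enabled toggle under a metrics key. The sketch below assumes the Aerospike receiver in your collector build supports that convention, and it turns off the GeoJSON metrics as an arbitrary example. Check your collector version before relying on it.

```yaml
# Sketch only: assumes the receiver supports per-metric "enabled" toggles.
receivers:
    aerospike:
        endpoint: localhost:9000
        collection_interval: 30s
        metrics:
            # Skip the GeoJSON metrics, for example when the namespace
            # does not store geospatial data
            aerospike.namespace.geojson.region_query_cells:
                enabled: false
            aerospike.namespace.geojson.region_query_false_positive:
                enabled: false
            aerospike.namespace.geojson.region_query_points:
                enabled: false
            aerospike.namespace.geojson.region_query_requests:
                enabled: false
```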

observIQ’s monitoring technology is a game changer for organizations that care about performance and efficiency. If you’re using Aerospike, our solutions can make a significant difference in your infrastructure monitoring. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, reach out to our support team at support@observIQ.com.