With observIQ’s latest contributions to OpenTelemetry, you can now use free open source tools to easily monitor Aerospike. The easiest way to use the latest OpenTelemetry tools is with observIQ’s distribution of the OpenTelemetry collector. You can find it here.
In this blog, the Aerospike receiver is configured to send metrics locally over OTLP, but you can also use it to ship metrics to many popular analysis tools, including Google Cloud, New Relic, and more. For Google Cloud users, the Aerospike receiver is also available through the Google Ops Agent.
What signals matter?
Aerospike is a fast, distributed NoSQL database. It uses flash storage to deliver predictable performance, and it can add new nodes without downtime. Because Aerospike operates in memory, memory-related metrics are especially important to monitor.
- aerospike.node.memory.free
  - This metric reports the percentage of the node's memory that is still free. If the value gets too low, the server is approaching its memory limit. If nodes frequently run with high memory usage, operators should consider adding nodes or increasing the memory allocated to each node.
- aerospike.namespace.memory.free
  - This metric reports the percentage of a namespace's allocated memory that is still available. When a namespace crosses its high-water mark, Aerospike begins evicting data. If it reaches its stop-writes threshold or runs out of memory, writes to the namespace will fail.
- aerospike.node.connection.count
  - This metric counts the connections opened and closed on the Aerospike node. Anomalous values could indicate that client applications are unable to connect, or that peer nodes are unreachable or crashing frequently.
All of the above metrics, and more, are collected once you configure the Aerospike receiver.
Installing the Receiver
If you don’t already have an OpenTelemetry collector that includes the latest Aerospike receiver, we suggest using the observIQ OpenTelemetry Collector distro, which includes the Aerospike receiver (and many others). Installation is simple with our one-line installer. Come back to this blog after running the install command on your source.
Configuring the Receiver
Open your OpenTelemetry collector configuration file. The Aerospike receiver is Linux-only. If you’re using the observIQ Collector, you’ll find the file at the following location:
- /opt/observiq-otel-collector/config.yaml (Linux)
Edit the configuration file to include the Aerospike receiver as shown below:
receivers:
  aerospike:
    endpoint: localhost:3000
    collect_cluster_metrics: false
    collection_interval: 30s
Add the Aerospike receiver to your service pipeline so that it looks similar to the following. Note that your processors and exporters may be different.
exporters:
  otlp:
    endpoint: 0.0.0.0:9124
service:
  pipelines:
    metrics:
      receivers: [aerospike]
      exporters: [otlp]
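The two snippets can be combined into a single configuration file. The minimal sketch below shows one way to do that. It assumes Aerospike listens on its default service port (3000) on the same host. It also adds the collector's standard `batch` processor purely as an illustration, so substitute whatever processors and exporters your pipeline actually uses.

```yaml
receivers:
  aerospike:
    endpoint: localhost:3000        # assumed: Aerospike's default service port
    collect_cluster_metrics: false  # scrape only this node
    collection_interval: 30s

processors:
  batch:                            # illustrative: batches metrics before export

exporters:
  otlp:
    endpoint: 0.0.0.0:9124

service:
  pipelines:
    metrics:
      receivers: [aerospike]
      processors: [batch]
      exporters: [otlp]
```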
Below are a few editable fields you can add or adjust in the config file.
Field | Type | Description |
---|---|---|
endpoint | string | Aerospike endpoint to collect from |
collect_cluster_metrics | bool | If enabled, the receiver also discovers the peer nodes of the configured Aerospike node and collects metrics from them. |
username | string | (Enterprise Edition) The username to authenticate with. |
password | string | (Enterprise Edition) The password to authenticate with. |
collection_interval | duration | How often the receiver collects metrics, for example 30s. |
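The sketch below combines these fields: it authenticates against an Enterprise Edition cluster and collects metrics from discovered peer nodes. The username `otel_monitor` and the environment variable `AEROSPIKE_PASSWORD` are hypothetical placeholders. The `${env:...}` expansion syntax assumes a reasonably recent collector version; older builds may expect a different form, so check your distribution's documentation.

```yaml
receivers:
  aerospike:
    endpoint: localhost:3000                 # assumed: default service port
    username: otel_monitor                   # hypothetical Enterprise Edition user
    password: "${env:AEROSPIKE_PASSWORD}"    # hypothetical env var; keeps the secret out of the file
    collect_cluster_metrics: true            # also scrape peer nodes discovered from this node
    collection_interval: 30s
```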
Viewing the metrics collected
If you followed the steps detailed above, the following Aerospike metrics will now be delivered to your OTel destination.
Name | Description |
---|---|
aerospike.namespace.disk.available | Minimum percentage of contiguous disk space free to the namespace across all devices |
aerospike.namespace.geojson.region_query_cells | Number of cell coverings for the query regions queried. Aerospike metric geo_region_query_cells. |
aerospike.namespace.geojson.region_query_false_positive | Number of points outside the region. Total query result points is geo_region_query_points + geo_region_query_falsepos. Aerospike metric geo_region_query_falsepos. |
aerospike.namespace.geojson.region_query_points | Number of points within the region. Total query result points is geo_region_query_points + geo_region_query_falsepos. Aerospike metric geo_region_query_points. |
aerospike.namespace.geojson.region_query_requests | Number of GeoJSON queries on the system since the node started. Aerospike metric geo_region_query_reqs. |
aerospike.namespace.memory.free | Percentage of the namespace's memory that is still free. Aerospike metric memory_free_pct. |
aerospike.namespace.memory.usage | Memory currently used by each component of the namespace. Aggregate of Aerospike metrics memory_used_data_bytes, memory_used_index_bytes, memory_used_set_index_bytes, memory_used_sindex_bytes. |
aerospike.namespace.query.count | Number of query operations performed on the namespace. Aggregate of Aerospike metrics query_aggr_abort, query_aggr_complete, query_aggr_error, query_basic_abort, query_basic_complete, query_basic_error, query_ops_bg_abort, query_ops_bg_complete, query_ops_bg_error, query_udf_bg_abort, query_udf_bg_complete, query_udf_bg_error, pi_query_aggr_abort, pi_query_aggr_complete, pi_query_aggr_error, pi_query_long_basic_abort, pi_query_long_basic_complete, pi_query_long_basic_error, pi_query_ops_bg_abort, pi_query_ops_bg_basic_complete, pi_query_ops_bg_basic_error, pi_query_short_basic_timeout, pi_query_short_basic_complete, pi_query_short_basic_error, pi_query_udf_bg_abort, pi_query_udf_bg_complete, pi_query_udf_bg_error, si_query_aggr_abort, si_query_aggr_complete, si_query_aggr_error, si_query_long_basic_abort, si_query_long_basic_complete, si_query_long_basic_error, si_query_ops_bg_abort, si_query_ops_bg_basic_complete, si_query_ops_bg_basic_error, si_query_short_basic_timeout, si_query_short_basic_complete, si_query_short_basic_error, si_query_udf_bg_abort, si_query_udf_bg_complete, si_query_udf_bg_error. |
aerospike.namespace.scan.count | Number of scan operations performed on the namespace. Aggregate of Aerospike metrics scan_aggr_abort, scan_aggr_complete, scan_aggr_error, scan_basic_abort, scan_basic_complete, scan_basic_error, scan_ops_bg_abort, scan_ops_bg_complete, scan_ops_bg_error, scan_udf_bg_abort, scan_udf_bg_complete, scan_udf_bg_error. |
aerospike.namespace.transaction.count | Number of transactions performed on the namespace. Aggregate of Aerospike metrics client_delete_error, client_delete_filtered_out, client_delete_not_found, client_delete_success, client_delete_timeout, client_read_error, client_read_filtered_out, client_read_not_found, client_read_success, client_read_timeout, client_udf_error, client_udf_filtered_out, client_udf_not_found, client_udf_success, client_udf_timeout, client_write_error, client_write_filtered_out, client_write_not_found, client_write_success, client_write_timeout. |
aerospike.node.connection.count | Number of connections opened and closed to the node. Aggregate of Aerospike metrics client_connections_closed, client_connections_opened, fabric_connections_closed, fabric_connections_opened, heartbeat_connections_closed, heartbeat_connections_opened. |
aerospike.node.connection.open | Current number of open connections to the node. Aggregate of Aerospike metrics client_connections, fabric_connections, heartbeat_connections. |
aerospike.node.memory.free | Percentage of the node's memory that is still free. Aerospike metric system_free_mem_pct. |
aerospike.node.query.tracked | Number of queries tracked by the system, that is, queries that ran longer than the query untracked_time (default 1 sec). Aerospike metric query_tracked. |
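You may not need every metric above. Many OpenTelemetry receivers accept a per-metric `enabled` flag under a `metrics` key, particularly those whose metrics are generated from a metadata file. The sketch below assumes your collector version supports that setting for the Aerospike receiver. As an example, it turns off two of the GeoJSON metrics. Confirm the option against the receiver's documentation before relying on it.

```yaml
receivers:
  aerospike:
    endpoint: localhost:3000    # assumed: default service port
    collection_interval: 30s
    # Assumed: per-metric toggles; support depends on your collector version.
    metrics:
      aerospike.namespace.geojson.region_query_cells:
        enabled: false          # skip metrics you don't chart or alert on
      aerospike.namespace.geojson.region_query_false_positive:
        enabled: false
```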
observIQ’s monitoring technology is a game changer for organizations that care about performance and efficiency. If you’re using Aerospike, our solutions can make a significant difference in your infrastructure monitoring. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, reach out to our support team at support@observIQ.com.