How to monitor MongoDB with OpenTelemetry
MongoDB is a cross-platform, document-oriented database that stores its documents in a binary-encoded JSON (BSON) format. MongoDB's replication capabilities and horizontal scalability through database sharding make it highly available. An effective monitoring solution makes it easier to identify issues with MongoDB, such as resource availability, execution slowdowns, and scalability.
observIQ recently built and contributed a MongoDB metric receiver to the OpenTelemetry contrib repo. You can check it out here!
You can use this receiver with any OpenTelemetry Collector distribution, including the upstream OpenTelemetry Collector and observIQ's distribution of the collector.
Below are steps to get up and running quickly with observIQ’s distribution, shipping MongoDB metrics to any popular backend. You can find out more about it on observIQ’s GitHub page.
You can find OTel config examples for MongoDB and other applications shipping to Google Cloud here.
Let’s get started!
What signals matter?
The most critical MongoDB-related metrics to monitor are:
- The status of processes and memory utilization: Monitoring MongoDB's server processes helps identify slowness in its activity or health; unresponsive processes during command execution are one example of a scenario that needs further analysis. The mongodb.collection.count metric helps determine the stability, restart counts, and backup performance related to the collections in a MongoDB instance, and mongodb.data.size reports the amount of storage space consumed by the data in the current MongoDB instance.
- Operations and connections metrics: When an application has performance issues, it is necessary to rule out whether the problem stems from the database layer, so monitoring connection and operation patterns becomes critical. Metrics such as mongodb.cache.operations and mongodb.connection.count give insight into cache operations and connection counts. By monitoring operations over time, you can identify patterns and set thresholds and alerts against them.
- Query optimization: For a given query, the MongoDB query optimizer chooses and caches the most efficient query plan based on the available indexes. Efficiency is judged by the number of "work units" (works) a candidate plan performs while the query planner evaluates it. Metrics such as mongodb.global_lock.time show trends in lock time, which is useful context for query optimization.
Before creating your configuration, you should have observIQ’s distribution of the OpenTelemetry Collector installed. For installation instructions and the collector's latest version, check our GitHub repo.
Configuring the MongoDB receiver
After the installation, the config file for the collector can be found at:
- C:\Program Files\observIQ OpenTelemetry Collector\config.yaml (Windows)
- /opt/observiq-otel-collector/config.yaml (Linux)
Let’s begin with the configuration for the receiver.
- Here, we set up the host as the endpoint, essentially the IP address and port of the MongoDB system.
- For all configurations that export to Google Cloud Operations, the collection interval is set to 60s, which that backend requires.
- TLS is disabled for the connection to MongoDB. In this example, the local MongoDB instance does not use TLS, so the receiver is configured with insecure: true; if your instance requires TLS, supply the appropriate certificate settings instead.
receivers:
  mongodb:
    hosts:
      - endpoint: 127.0.0.1:27017
    collection_interval: 60s
    # disable TLS
    tls:
      insecure: true
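If your MongoDB deployment requires authentication or TLS, the receiver also accepts credentials and certificate settings. The snippet below is a minimal sketch, not part of the configuration above; the username, the password variable, and the certificate path are placeholders you would replace with your own values:

receivers:
  mongodb:
    hosts:
      - endpoint: 127.0.0.1:27017
    collection_interval: 60s
    username: otel                            # placeholder monitoring user
    password: ${MONGODB_PASSWORD}             # placeholder; supplied via an environment variable
    tls:
      ca_file: /etc/ssl/certs/mongodb-ca.pem  # placeholder CA certificate path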
Next up, the processors:
Please note that these processors are optional. You may choose to use any of the available processors documented here.
- The resourcedetection processor adds a unique identifier (host.name) to each metric resource, allowing you to distinguish between the MongoDB instances monitored with this configuration.
- The resourceattributetransposer processor maps the host.name resource attribute to an agent attribute on the metrics.
- The normalizesums processor normalizes sum metrics against their initial values for better visualization.
- The batch processor collates metrics from multiple receivers before sending them to the exporter destination. We recommend using this processor with all receiver configurations when applicable.
processors:
  # Resourcedetection is used to add a unique (host.name)
  # to the metric resource(s), allowing users to filter
  # between multiple agent systems.
  resourcedetection:
    detectors: ["system"]
    system:
      hostname_sources: ["os"]

  resourceattributetransposer:
    operations:
      - from: host.name
        to: agent

  normalizesums:

  batch:
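These are not the only processors you can apply. As one illustrative, entirely optional example, the resource processor from the contrib repository can attach a static attribute, such as a deployment environment label, to every metric. The deployment.environment key and production value below are placeholders, and any processor you add must also be listed in the pipeline later on:

processors:
  # Optional example: attach a static attribute to all metrics
  resource:
    attributes:
      - key: deployment.environment   # placeholder attribute name
        value: production             # placeholder value
        action: upsert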
In this example, we are showing a sample config for exporting metrics to Google Cloud; however, you may choose to export the metrics to any of the available destinations documented here.
exporters:
  googlecloud:
    retry_on_failure:
      enabled: false
    metric:
      prefix: workload.googleapis.com
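If you are shipping to a different backend, swap in the corresponding exporter instead. As a rough sketch, an OTLP exporter pointing at a hypothetical backend endpoint might look like the following; the endpoint value is a placeholder:

exporters:
  otlp:
    endpoint: otel-backend.example.com:4317   # placeholder OTLP gRPC endpoint
    tls:
      insecure: true                          # only for endpoints without TLS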
Finally, set up the pipeline.
service:
  pipelines:
    metrics:
      receivers:
        - mongodb
      processors:
        - resourcedetection
        - resourceattributetransposer
        - normalizesums
        - batch
      exporters:
        - googlecloud
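If you adopted any of the optional sketches above, reference them here as well. For instance, a pipeline that also applies the hypothetical resource processor and ships over OTLP would only change the processors and exporters lists:

service:
  pipelines:
    metrics:
      receivers:
        - mongodb
      processors:
        - resourcedetection
        - resourceattributetransposer
        - normalizesums
        - resource          # optional processor from the sketch above
        - batch
      exporters:
        - otlp              # placeholder exporter from the sketch above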
Viewing the metrics collected
The following metrics are fetched using the configuration above:
To view the metrics, follow the steps outlined below:
- In the Google Cloud Console, head to Metrics Explorer.
- Select the resource type as Generic Node.
- Use the metric's namespace equivalent from the table above to filter and view the chart, as in the example below.
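For example, assuming the workload.googleapis.com prefix configured in the exporter above, the mongodb.data.size metric appears in Metrics Explorer under a metric type of the form:

workload.googleapis.com/mongodb.data.size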
observIQ’s distribution is a game-changer for companies looking to implement the OpenTelemetry standards. The single-line installer and the seamlessly integrated pool of receivers, processors, and exporters make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, contact our support team at support@observIQ.com.