How to Monitor SQL Server with OpenTelemetry
At observIQ, we've seen steady interest in observing the health of Windows systems and applications using OpenTelemetry, and the SQL Server receiver is one of the most frequently requested components, so let's start there.
Below are the steps to get up and running quickly with the contrib distribution of the OpenTelemetry Collector. We'll collect SQL Server metrics and ship them to a popular backend, Google Cloud.
What is OpenTelemetry?
OpenTelemetry (OTel) is a robust and comprehensive telemetry framework designed to capture, process, and transmit telemetry data such as distributed traces, metrics, and logs from your systems to an observability or SIEM backend for analysis.
OpenTelemetry's Core Components
As a quick primer, here's a breakdown of the primary components that facilitate data collection, processing, and transmission of the above signals:
OpenTelemetry Collector: a lightweight data collector that can be deployed as an on-host agent or as a gateway for other collectors, shipping data to one or many configured destinations. The collector has a few primary components:
- Receivers: collect telemetry from a specific application or system (like SQL Server) or another OpenTelemetry collector via OTLP.
- Processors: transform telemetry in flight, providing levers to enrich, filter, and mask data (e.g., PII), among other refinement techniques. Advanced users can utilize OTTL (the OpenTelemetry Transformation Language) for more sophisticated transformations.
- Exporters: transmit telemetry to a destination: another collector, a file, or an observability/SIEM backend.
These components can be chained together as a logical pipeline in the collector’s configuration file, mirroring the end-to-end flow of a telemetry pipeline. Next, let’s jump into some of the key SQL Server signals.
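As a minimal sketch of how these pieces chain together, a collector configuration wires receivers, processors, and exporters into named pipelines. The components below (otlp, batch, debug) are standard collector components used here purely for illustration; the SQL Server specifics come later:

```yaml
receivers:
  otlp:                 # receive telemetry from apps or other collectors
    protocols:
      grpc:
processors:
  batch:                # group telemetry into batches before export
exporters:
  debug:                # print telemetry to the collector's console

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```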
OpenTelemetry Components to Monitor SQL Server
Here are a few of the key OpenTelemetry components you can use to monitor your instance:
- sqlserver receiver: collects SQL Server database/instance metrics
- hostmetrics receiver: collects operating-system and per-process metrics
- windowseventlog receiver: captures, parses, and ships Windows Events in a standardized way
Which Signals Matter for Monitoring SQL Server?
Here’s a short list of signals to consider when implementing SQL Server monitoring in your environment:
- Cache Hit Ratio
- Measures how often page requests are served from memory rather than disk. A low ratio suggests SQL Server may need more memory allocated.
- Transaction Write Rate
- Monitors the rate of write transactions in the database. This provides valuable context on overall database activity, bottlenecks, and over-utilization.
- User Connections
- Monitors active user connections in the database.
- Page Split Rates
- Monitors the rate of page splits, which occur when there’s insufficient space in an index page. Excess page splitting causes extra disk I/O and degrades performance over time, and it is especially costly for clustered indexes.
- Lock Wait Rates
- Monitors the rate of lock waits, which occur when a transaction must wait for a resource locked by another transaction. Monitoring lock waits helps identify blocking and deadlocking issues, which can severely impact transaction performance.
- Log File Size and Growth
- Monitoring log file size and growth can prevent disk-space issues, and provides insight into transaction volume and when the transaction log needs to be resized.
- OS/Process Metrics
- Monitor the SQL Server process's resource consumption on the Windows host.
- Monitor OS consumption metrics to understand the Windows host’s overall health.
- Windows Events
- Monitors application, system, and security events related to SQL Server, providing context for root-cause analysis.
Conveniently, all these signals (and more) can be gathered with OpenTelemetry.
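For reference, the signals above map to metrics emitted by the sqlserver receiver. The names below reflect the contrib receiver's documentation at the time of writing; verify them against the receiver docs for your collector version:

```yaml
# Illustrative signal-to-metric mapping for the sqlserver receiver:
#   Cache Hit Ratio        -> sqlserver.page.buffer_cache.hit_ratio
#   Transaction Write Rate -> sqlserver.transaction.write.rate
#   User Connections       -> sqlserver.user.connection.count
#   Page Split Rates       -> sqlserver.page.split.rate
#   Lock Wait Rates        -> sqlserver.lock.wait.rate
#   Log File Size/Growth   -> sqlserver.transaction_log.usage,
#                             sqlserver.transaction_log.growth.count
```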
Related Content: How to Monitor MySQL with OpenTelemetry
Setting Up the OTel Collector to Monitor SQL Server
Prerequisites
- Access to a Windows host running SQL Server (2012 or later)
- Download the most recent 'otelcol-contrib' tarball for Windows from the releases linked in the getting started docs. This package includes all the components needed for this example.
- Extract the tarball after you download it. I’d recommend 7-Zip, or you can use Windows’ relatively new tar command in PowerShell:

```shell
tar -xvzf .\whatever.tar.gz
```
- Have a backend ready to receive and analyze your monitoring data. For this example, I’m using Google Cloud Operations, a destination I’ve used frequently. To send data to Google Cloud Operations, you'll need a service account in your project with the following roles:
- Logs Writer
- Monitoring Admin
- Create a new JSON key for your service account and save it to a file the collector can access.
- Set the full path to your JSON key file in the GOOGLE_APPLICATION_CREDENTIALS environment variable.
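In PowerShell, setting that variable can look like the following; the key path is a placeholder, so use wherever you actually saved your key:

```
# Set for the current session only; use System properties or
# [System.Environment]::SetEnvironmentVariable(...) to persist it.
$env:GOOGLE_APPLICATION_CREDENTIALS = "C:\otelcol\gcp-service-account.json"
```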
Depending on your setup, there are other options available too!
Configuring the SQL Server receiver
In a Windows environment, you must first manually create a configuration file in the collector’s directory. This file tells the collector which components to run and how to wire them into pipelines.
First, let's start by adding a SQL Server receiver to your pipeline:
```yaml
receivers:
  sqlserver:
    collection_interval: 30s
```
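By default the receiver scrapes the local default instance via Windows performance counters. If you're monitoring a named instance, the contrib receiver also documents computer_name and instance_name options, which are set together; a sketch with placeholder values:

```yaml
receivers:
  sqlserver:
    collection_interval: 30s
    # For a named instance (these two settings go together):
    computer_name: MY-SQL-HOST    # placeholder: the machine name
    instance_name: MYINSTANCE     # placeholder: the named instance
```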
Configuring the Host Metrics receiver
Next, add the Host Metrics receiver to the configuration, configured here to gather CPU and memory metrics:
```yaml
receivers:
  sqlserver:
    collection_interval: 30s
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:
```
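If you want more host context, the hostmetrics receiver supports additional scrapers beyond cpu and memory. The extension below is optional (the rest of this walkthrough sticks with cpu and memory):

```yaml
hostmetrics:
  collection_interval: 30s
  scrapers:
    cpu:
    memory:
    disk:      # disk I/O metrics
    network:   # network throughput and errors
    paging:    # swap / page file activity
```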
Configuring the Windows Events receiver
After that, add the Windows Event Log receiver to collect application, system, and security events. Each windowseventlog instance reads a single channel, so we define one named instance per channel:

```yaml
receivers:
  sqlserver:
    collection_interval: 30s
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:
  windowseventlog/application:
    channel: application
  windowseventlog/system:
    channel: system
  windowseventlog/security:
    channel: security
```
Configuring the Google Cloud exporter
Lastly, let's add the Google Cloud exporter to our collector configuration. It uses the credentials referenced by the environment variable set in the prerequisite steps:

```yaml
receivers:
  sqlserver:
    collection_interval: 30s
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:
  windowseventlog/application:
    channel: application
  windowseventlog/system:
    channel: system
  windowseventlog/security:
    channel: security

exporters:
  googlecloud:
```
Configuring your Pipeline
Now we can assemble our pipelines in the collector configuration, referencing the components added above. This example defines both a metrics and a logs pipeline, since we're gathering both signal types:

```yaml
receivers:
  sqlserver:
    collection_interval: 30s
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:
  windowseventlog/application:
    channel: application
  windowseventlog/system:
    channel: system
  windowseventlog/security:
    channel: security

exporters:
  googlecloud:

service:
  pipelines:
    metrics:
      receivers: [sqlserver, hostmetrics]
      exporters: [googlecloud]
    logs:
      receivers: [windowseventlog/application, windowseventlog/system, windowseventlog/security]
      exporters: [googlecloud]
```
Related Content: Turning Logs into Metrics with OpenTelemetry and BindPlane OP
Running the OTel collector
Run the collector binary, pointing it at the configuration file you just created, as shown below. Be sure to run it from an elevated (Administrator) prompt so the collector can read all the performance counters the metrics require.

```shell
.\otelcol-contrib.exe --config .\config.yaml
```
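If the collector fails to start, a quick way to catch YAML mistakes is the collector's validate subcommand, available in recent releases:

```
# Checks that the configuration parses and the referenced components resolve
.\otelcol-contrib.exe validate --config .\config.yaml
```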
Viewing the metrics collected
If you followed the steps detailed above, your SQL Server metrics will be available in Google Cloud's Metrics Explorer.
By selecting the optimal backend, designing intuitive dashboards, and configuring intelligent alerts, you're not just envisioning a more efficient SQL Server environment — you're making it happen. Need assistance, have requests, or suggestions? Feel free to contact our support team at support@observIQ.com.