Technical “How-To’s”

How to Monitor SQL Server with OpenTelemetry

Daniel Kuiper
Daniel Kuiper
Share:

At observIQ, we've seen steady interest in observing the health of Windows systems and applications using OpenTelemetry. Requests on the SQL Server receiver continue to garner significant interest, so let's start there.

Below are steps to get up and running quickly with the contrib distribution of the OpenTelemetry collector. We'll be collecting and shipping SQL Server metrics to a popular backend, Google Cloud.

What is OpenTelemetry?

OpenTelemetry “OTel” is a robust and comprehensive telemetry framework designed to capture, process, and transmit telemetry data such as distributed traces, metrics, and logs from your systems to an observability or SIEM backend for analysis.

OpenTelemetry's Core Components

As a quick primer, the OTel collector has a few primary components that facilitate data collection, processing, and transmission of the above signals. Here’s a quick breakdown:

OpenTelemetry Collector: a lightweight data collector that can be deployed as an on-host agent or as a gateway for other collectors--shipping data to one or many configured destinations. The collector has a few primary components:

  • Receivers: collect telemetry from a specific application or system (like SQL Server) or another OpenTelemetry collector via OTLP.
  • Processors: transform the data by providing the levers to enrich, filter, mask (PII), and other data refinement techniques. Advanced users can utilize OTTL to do really interesting transformations.
  • Exporters: transmit telemetry to another destination: another collector, to file, to an observability/SIEM backend

These components can be chained together as a logical pipeline in the collector’s configuration file, mirroring the end-to-end flow of a telemetry pipeline. Next, let’s jump into some of the key SQL Server signals.

OpenTelemetry Components to Monitor SQL Server

Here are a few of the key OpenTelemetry components you can use to monitor your instance:

  • sqlserver receiver collects SQL Server database/instance metrics
  • hostmetrics receiver collects operating system and specific process metrics
  • windowseventlog captures, parses, and ships Windows Events in a standardized way

Which Signals Matter for Monitoring SQL Server?

Here’s a short list of signals to consider when implementing SQL Server monitoring in your environment:

  • Cache Hit Ratio
    • Monitors how quickly requests are being handled from memory. If the ratio is lower, SQL Server may need more memory allocated.
  • Transaction Write Rate
    • Monitors the rate of transactions in the database. This provides a valuable context for overall activity in the database, bottlenecks, and over-utilization.
  • User Connections
    • Monitors active user connections in the database.
  • Page Split Rates
    • Monitors the rate of page splits, which occur when there’s insufficient space in an index. Excess page splitting can cause excessive disk I/O and decrease performance over time, and it is incredibly impactful in clustered environments.
  • Lock Wait Rates
    • Monitors the rate lock waits, occurring when a transaction needs to access another transaction's resource. Monitoring lock waits can help identify blocking and deadlocking issues, which can severely impact transaction performance.
  • Log File Size and Growth
    • Monitoring the log file volume and growth can prevent space issues and provide more insight into transaction volume and indicators to increase transaction volume size.
  • OS/Process Metrics
    • Monitor SQL server process consumption on a Windows host.
    • Monitor OS consumption metrics to understand the Windows host’s overall health.
  • Windows Events
    • Monitors application, system, and security events related to SQL Server. It'd help provide context to help with root cause analysis.

Conveniently, all these signals (and more) can be gathered with OpenTelemetry.


Related Content: How to Monitor MySQL with OpenTelemetry

Setting Up the OTel Collector to Monitor SQL Server

Prerequisites

  • Access to Windows Host running SQL Server (2012 R2 or later)
  • Download the most recent 'otelcol-contrib' tarball for Windows from the releases linked in the getting started docs. This package includes all the components we need to step through this example.
  • Extract the tarball after you download it. I’d recommend downloading 7-zip, or you can use Windows's relatively new tar PowerShell command:
bash
1tar -xvzf .\whatever.tar.gz
  • Have a backend ready to ship and analyze your monitoring data. For this example, I’m using Google Cloud Operations, a destination I’ve frequently used. To send to Google Cloud Operations, you'll need a service account in your project with the following roles:
    • Logs Writer
    • Monitoring Admin
    • Create a new json key for your service account and copy it into a file for the collector to access
    • Set the full path to your JSON key file in the GOOGLE_APPLICATION_CREDENTIALS environment variable.

Depending on your setup, there are other options available too!

Configuring the SQL Server receiver

In a Windows environment, you must first manually create a configuration file in the collector’s directory. This file provides instructions to the collector, calling specific components you’ve identified in your pipeline.

First, let's start by adding a SQL Server receiver to your pipeline:

yaml
1receivers:
2  sqlserver:
3    collection_interval: 30s

Configuring the Host Metric receiver

Next, add the Host Metric Receiver to our configuration, which is configured to gather cpu and memory metrics:

yaml
1receivers:
2  sqlserver:
3    collection_interval: 30s
4  hostmetrics:
5    collection_interval: 30s
6    scrapers:
7      cpu:
8      memory:

Configuring the Windows Events receiver

After that, add the Windows Events Receiver to our configuration, configuring it to collect application, system, and security events.

yaml
1receivers:
2  sqlserver:
3    collection_interval: 30s
4  hostmetrics:
5    collection_interval: 30s
6    scrapers:
7      cpu:
8      memory:
9  windowseventlog:
10        channel: application,system,security

Configuring the Google Cloud exporter

Lastly, let's add the Google Cloud Exporter to our collector configuration. It will utilize the credentials/environment variable set in the prerequisite steps.

yaml
1receivers:
2  sqlserver:
3    collection_interval: 30s
4  hostmetrics:
5    collection_interval: 30s
6    scrapers:
7      cpu:
8      memory:
9  windowseventlog:
10        channel: application,system,security
11
12exporters:
13  googlecloud:

Configuring your Pipeline

Now, we can assemble our pipeline in the collector configuration, referencing the components we've added above. In this example, our pipeline will include metrics and logs as we're gathering both types of signals:

yaml
1receivers:
2  sqlserver:
3    collection_interval: 30s
4  hostmetrics:
5    collection_interval: 30s
6    scrapers:
7      cpu:
8      memory:
9  windowseventlog:
10        channel: application,system,security
11
12exporters:
13  googlecloud:
14service:
15  pipelines:
16    metrics:
17      receivers: [sqlserver,hostmetrics]
18      exporters: [googlecloud]
19    logs:
20      receivers: [windowseventlog]
21      exporters: [googlecloud]

Note that you might configure your receivers and exporters differently depending on your environment and monitoring needs. For example, a named instance of SQL Server requires a few more configuration details. Check out the receiver here for examples.

You can always configure more exporters if you'd like to send telemetry to multiple destinations.

Related Content: Turning Logs into Metrics with OpenTelemetry and BindPlane OP

Running the OTel collector

Run the collector binary by specifying the path to the configuration you just created, like below. Make sure to run as admin to collect all performance counters needed for metrics.

sh
1.\otelcol-contrib.exe --config .\config.yaml

Viewing the metrics collected

If you followed the steps detailed above, the following SQL Server metrics will be available in your google cloud metrics explorer.

MetricDescription
sqlserver.batch.request.rateNumber of batch requests received by SQL Server.
sqlserver.batch.sql_compilation.rateNumber of SQL compilations needed.
sqlserver.batch.sql_recompilation.rateNumber of SQL recompilations needed.
sqlserver.lock.wait.rateNumber of lock requests resulting in a wait.
sqlserver.lock.wait_time.avgAverage wait time for all lock requests that had to wait.
sqlserver.page.buffer_cache.hit_ratioPages found in the buffer pool without having to read from disk.
sqlserver.page.checkpoint.flush.rateNumber of pages flushed by operations requiring dirty pages to be flushed.
sqlserver.page.lazy_write.rateNumber of lazy writes moving dirty pages to disk.
sqlserver.page.life_expectancyTime a page will stay in the buffer pool.
sqlserver.page.operation.rateNumber of physical database page operations issued.
sqlserver.page.split.rateNumber of pages split as a result of overflowing index pages.
sqlserver.transaction.rateNumber of transactions started for the database (not including XTP-only transactions).
sqlserver.transaction.write.rateNumber of transactions that wrote to the database and committed.
sqlserver.transaction_log.flush.data.rateTotal number of log bytes flushed.
sqlserver.transaction_log.flush.rateNumber of log flushes.
sqlserver.transaction_log.flush.wait.rateNumber of commits waiting for a transaction log flush.
sqlserver.transaction_log.growth.countTotal number of transaction log expansions for a database.
sqlserver.transaction_log.shrink.countTotal number of transaction log shrinks for a database.
sqlserver.transaction_log.usagePercent of transaction log space used.
sqlserver.user.connection.countNumber of users connected to the SQL Server.

By selecting the optimal backend, designing intuitive dashboards, and configuring intelligent alerts, you're not just envisioning a more efficient SQL Server environment — you're making it happen. Need assistance, have requests, or suggestions? Feel free to contact our support team at support@observIQ.com.

Daniel Kuiper
Daniel Kuiper
Share:

Related posts

All posts

Get our latest content
in your inbox every week

By subscribing to our Newsletter, you agreed to our Privacy Notice

Community Engagement

Join the Community

Become a part of our thriving community, where you can connect with like-minded individuals, collaborate on projects, and grow together.

Ready to Get Started

Deploy in under 20 minutes with our one line installation script and start configuring your pipelines.

Try it now