The Observability Blog

Categories:
  • Google Cloud
  • Log Management
  • Metrics
  • OpenTelemetry

How to Monitor SQL Server with OpenTelemetry

by Keith Schmitt on
July 20, 2022

We’re excited to announce the latest OpenTelemetry receiver contribution: SQL Server monitoring. All of our logging and monitoring contributions are available through the open source OpenTelemetry collector. You can check it out here

You can utilize this receiver in conjunction with any OTel collector: including the OpenTelemetry Collector and observIQ’s distribution of the collector.

Below are steps to get up and running quickly with observIQ’s distribution, and shipping SQL Server metrics to a popular backend: Google Cloud Ops. You can find out more on observIQ’s GitHub page: https://github.com/observIQ/observiq-otel-collector

What signals matter?

SQL Server is a Windows-based system designed to store and manage data. It’s a general-purpose system with a wide range of uses, so which signals are most important to you may vary. However, some common signals to look for are:

  • User Connections
    • Monitor the user connections, and prepare to increase load management during high volume.
  • Page Split Rates
    • Excess page splitting can cause excessive disk I/O and decrease performance
  • Lock Wait Rates
    • Monitor if resource usage is blocked by high volume – lift locks to increase access, but slow down performance

All of the above categories can be gathered with the SQL Server receiver – so let’s get started.

Before you begin

If you don’t already have an OpenTelemetry collector built with the latest SQL Server receiver installed, you’ll need to do that first. We suggest using observIQ’s distribution of the OpenTelemetry Collector that includes the SQL Server receiver (and many others) and is simple to install with our one-line installer.

Configuring the SQL Server receiver

Navigate to your OpenTelemetry configuration file. If you’re using the observIQ Distribution, you’ll find it in the following location: 

  • C:\Program Files\observIQ OpenTelemetry Collector\config.yaml (Windows)

Edit the configuration file to include the SQL Server receiver as shown below:

receivers:
  sqlserver:
    collection_interval: 10s

Don’t forget to also add SQL Server into your Service pipeline so it looks similar to the following. Note that your processors and exporters will likely be different.

receivers:
  sqlserver:
    collection_interval: 10s

exporters:
  googlecloud:

service:
  pipelines:
    metrics:
      receivers: [sqlserver]
      exporters: [googlecloud]

You can further edit the config file to include specific labels under the “processors” field, and export metrics to specific or multiple destinations under the “exporters” field.

Viewing the metrics collected

If you followed the steps detailed above, the following SQL Server metrics will now be delivered to your preferred destination.

MetricDescription
sqlserver.batch.request.rate Number of batch requests received by SQL Server.
sqlserver.batch.request.rate Number of SQL compilations needed.
sqlserver.batch.sql_recompilation.rate Number of SQL recompilations needed.
sqlserver.lock.wait.rate Number of lock requests resulting in a wait.
sqlserver.lock.wait_time.avg Average wait time for all lock requests that had to wait.
sqlserver.page.buffer_cache.hit_ratio Pages found in the buffer pool without having to read from disk.
sqlserver.page.checkpoint.flush.rate Number of pages flushed by operations requiring dirty pages to be flushed.
sqlserver.page.lazy_write.rate Number of lazy writes moving dirty pages to disk.
sqlserver.page.life_expectancy Time a page will stay in the buffer pool.
sqlserver.page.operation.rate Number of physical database page operations issued.
sqlserver.page.split.rate Number of pages split as a result of overflowing index pages.
sqlserver.transaction.rate Number of transactions started for the database (not including XTP-only transactions).
sqlserver.transaction.write.rate Number of transactions that wrote to the database and committed.
sqlserver.transaction_log.flush.data.rate Total number of log bytes flushed.
sqlserver.transaction_log.flush.rate Number of log flushes.
sqlserver.transaction_log.flush.wait.rate Number of commits waiting for a transaction log flush.
sqlserver.transaction_log.growth.count Total number of transaction log expansions for a database.
sqlserver.transaction_log.shrink.count Total number of transaction log shrinks for a database.
sqlserver.transaction_log.usage Percent of transaction log space used.
sqlserver.user.connection.count Number of users connected to the SQL Server.

observIQ’s distribution of the OpenTelemetry collector is a game-changer for companies looking to implement OpenTelemetry standards. The single line installer, seamlessly integrated receivers, exporter, and processor pool make working with this collector simple. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, reach out to our support team at support@observIQ.com.