Technical “How-To’s”

How to Monitor SQL Server with OpenTelemetry

Daniel Kuiper
Daniel Kuiper
Share:

At observIQ, we've seen growing interest in observing the health of Windows systems and applications using OpenTelemetry. Requests on the SQL Server receiver continue to garner the most interest, so let's start there.

Below are steps to get up and running quickly with the contrib distribution of the OpenTelemetry collector. We'll be collecting and shipping SQL Server metrics to a popular backend, Google Cloud.

What signals matter?

SQL Server is a Windows-based system designed to store and manage data. It’s a general-purpose system with a wide range of uses, so which signals are most important to you may vary. However, some common signals to look for are:

  • User Connections
    • Monitor the user connections, and prepare to increase load management during high volume
  • Page Split Rates
    • Excess page splitting can cause excessive disk I/O and decrease performance
  • Lock Wait Rates
    • Monitor if resource usage is blocked by high volume – lift locks to increase access, but slow down performance

All of the above categories can be gathered with the SQL Server receiver – so let’s get started.

Before you begin

If you don’t already have an OpenTelemetry collector built with the latest SQL Server receiver, you’ll need that first.

  • Download the most recent 'otelcol-contrib' tarball for windows from the releases linked in the getting started docs.
  • Extract the tarball after you download, I used 7-Zip.

Configuring the SQL Server receiver

You'll need to create a configuration file to run the collector with. Call it something like config.yaml and edit to include the SQL Server receiver as shown below:

yaml
1receivers:
2  sqlserver:
3    collection_interval: 30s

Configuring the Google Cloud exporter

To send to google cloud, you'll need a service account in your project with the following roles:

  • Logs Writer
  • Monitoring Admin

Create a new json key for your service account and copy it into a file for the collector to access.

Set the full path to your json key file in the GOOGLE_APPLICATION_CREDENTIALS environment variable.

There are also other means of authenticating available depending on your setup.

Using the collector

Add the exporter configuration and then include the receiver and exporter in your service pipeline so it looks like the following:

yaml
1receivers:
2  sqlserver:
3    collection_interval: 30s
4
5exporters:
6  googlecloud:
7
8service:
9  pipelines:
10    metrics:
11      receivers: [sqlserver]
12      exporters: [googlecloud]

Note that you might configure your receivers, exporters, and any processors differently depending on your environment and monitoring needs. For example, a named instance of SQL Server requires a few more configuration details. Check out the receiver here for examples.

You can always configure more exporters if you'd like to send telemetry to multiple destinations.

Run the collector binary by specifying the path to the configuration you just created, like below. Make sure to run as admin to collect all performance counters needed for metrics.

sh
1.\otelcol-contrib.exe --config .\config.yaml

Viewing the metrics collected

If you followed the steps detailed above, the following SQL Server metrics will be available in your google cloud metrics explorer.

MetricDescription
sqlserver.batch.request.rateNumber of batch requests received by SQL Server.
sqlserver.batch.sql_compilation.rateNumber of SQL compilations needed.
sqlserver.batch.sql_recompilation.rateNumber of SQL recompilations needed.
sqlserver.lock.wait.rateNumber of lock requests resulting in a wait.
sqlserver.lock.wait_time.avgAverage wait time for all lock requests that had to wait.
sqlserver.page.buffer_cache.hit_ratioPages found in the buffer pool without having to read from disk.
sqlserver.page.checkpoint.flush.rateNumber of pages flushed by operations requiring dirty pages to be flushed.
sqlserver.page.lazy_write.rateNumber of lazy writes moving dirty pages to disk.
sqlserver.page.life_expectancyTime a page will stay in the buffer pool.
sqlserver.page.operation.rateNumber of physical database page operations issued.
sqlserver.page.split.rateNumber of pages split as a result of overflowing index pages.
sqlserver.transaction.rateNumber of transactions started for the database (not including XTP-only transactions).
sqlserver.transaction.write.rateNumber of transactions that wrote to the database and committed.
sqlserver.transaction_log.flush.data.rateTotal number of log bytes flushed.
sqlserver.transaction_log.flush.rateNumber of log flushes.
sqlserver.transaction_log.flush.wait.rateNumber of commits waiting for a transaction log flush.
sqlserver.transaction_log.growth.countTotal number of transaction log expansions for a database.
sqlserver.transaction_log.shrink.countTotal number of transaction log shrinks for a database.
sqlserver.transaction_log.usagePercent of transaction log space used.
sqlserver.user.connection.countNumber of users connected to the SQL Server.

Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, reach out to our support team at support@observIQ.com.

Daniel Kuiper
Daniel Kuiper
Share:

Related posts

All posts

Get our latest content
in your inbox every week

By subscribing to our Newsletter, you agreed to our Privacy Notice

Community Engagement

Join the Community

Become a part of our thriving community, where you can connect with like-minded individuals, collaborate on projects, and grow together.

Ready to Get Started

Deploy in under 20 minutes with our one line installation script and start configuring your pipelines.

Try it now