At observIQ, we've seen growing interest in observing the health of Windows systems and applications using OpenTelemetry. Requests for the SQL Server receiver continue to garner the most interest, so let's start there.
Below are steps to get up and running quickly with the contrib distribution of the OpenTelemetry collector. We'll be collecting and shipping SQL Server metrics to a popular backend, Google Cloud.
What signals matter?
SQL Server is Microsoft's Windows-based relational database system for storing and managing data. It's a general-purpose system with a wide range of uses, so the signals that matter most to you will vary. That said, some common signals to watch are:
- User Connections
  - Monitor user connections and be ready to manage load during high-volume periods
- Page Split Rates
  - Excessive page splitting causes extra disk I/O and degrades performance
- Lock Wait Rates
  - Watch for requests blocked waiting on locks during high volume; relaxing locking increases access but can slow overall performance
All of the above categories can be gathered with the SQL Server receiver – so let’s get started.
Before you begin
If you don’t already have an OpenTelemetry collector built with the latest SQL Server receiver, you’ll need that first.
- Download the most recent 'otelcol-contrib' tarball for Windows from the releases linked in the getting started docs.
- Extract the tarball after downloading; any archive tool that handles .tar.gz files works (I used 7-Zip).
Configuring the SQL Server receiver
You'll need a configuration file to run the collector. Name it something like config.yaml and edit it to include the SQL Server receiver as shown below:
```yaml
receivers:
  sqlserver:
    collection_interval: 30s
```
Configuring the Google Cloud exporter
To send to Google Cloud, you'll need a service account in your project with the following roles:
- Logs Writer
- Monitoring Admin
Create a new JSON key for your service account and copy it to a file the collector can access.
Set the full path to that JSON key file in the GOOGLE_APPLICATION_CREDENTIALS environment variable.
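In PowerShell, setting the variable for the current session might look like the sketch below; the path is a placeholder for wherever you saved your key file:

```powershell
# Placeholder path; point this at your actual service account key file.
$env:GOOGLE_APPLICATION_CREDENTIALS = "C:\otel\gcp-service-account.json"
```

Note that this only applies to the current session; set it at the system level if you plan to run the collector as a service.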
There are also other means of authenticating available depending on your setup.
Using the collector
Add the exporter configuration and then include the receiver and exporter in your service pipeline so it looks like the following:
```yaml
receivers:
  sqlserver:
    collection_interval: 30s

exporters:
  googlecloud:

service:
  pipelines:
    metrics:
      receivers: [sqlserver]
      exporters: [googlecloud]
```
Note that you might configure your receivers, processors, and exporters differently depending on your environment and monitoring needs. For example, a named instance of SQL Server requires a few more configuration details. Check out the receiver's documentation for examples.
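As an illustration, a named-instance configuration might look like the sketch below; the host and instance names are placeholders, and the receiver's documentation is the authoritative reference for these fields:

```yaml
receivers:
  sqlserver:
    collection_interval: 30s
    # Placeholder values; a named instance requires both fields to be set.
    computer_name: MY-WINDOWS-HOST
    instance_name: MY_NAMED_INSTANCE
```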
You can always configure more exporters if you'd like to send telemetry to multiple destinations.
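For instance, fanning out to a second destination only requires defining another exporter and listing it in the pipeline. The sketch below uses the contrib collector's logging exporter as an illustrative second destination:

```yaml
exporters:
  googlecloud:
  # Writes telemetry to the collector's own console output; useful for debugging.
  logging:

service:
  pipelines:
    metrics:
      receivers: [sqlserver]
      exporters: [googlecloud, logging]
```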
Run the collector binary, specifying the path to the configuration file you just created, as shown below. Make sure to run it as an administrator so it can collect all the performance counters needed for these metrics.
```powershell
.\otelcol-contrib.exe --config .\config.yaml
```
Viewing the metrics collected
If you followed the steps detailed above, the following SQL Server metrics will be available in Google Cloud's Metrics Explorer.
| Metric | Description |
| --- | --- |
| sqlserver.batch.request.rate | Number of batch requests received by SQL Server. |
| sqlserver.batch.sql_compilation.rate | Number of SQL compilations needed. |
| sqlserver.batch.sql_recompilation.rate | Number of SQL recompilations needed. |
| sqlserver.lock.wait.rate | Number of lock requests resulting in a wait. |
| sqlserver.lock.wait_time.avg | Average wait time for all lock requests that had to wait. |
| sqlserver.page.buffer_cache.hit_ratio | Pages found in the buffer pool without having to read from disk. |
| sqlserver.page.checkpoint.flush.rate | Number of pages flushed by operations requiring dirty pages to be flushed. |
| sqlserver.page.lazy_write.rate | Number of lazy writes moving dirty pages to disk. |
| sqlserver.page.life_expectancy | Time a page will stay in the buffer pool. |
| sqlserver.page.operation.rate | Number of physical database page operations issued. |
| sqlserver.page.split.rate | Number of pages split as a result of overflowing index pages. |
| sqlserver.transaction.rate | Number of transactions started for the database (not including XTP-only transactions). |
| sqlserver.transaction.write.rate | Number of transactions that wrote to the database and committed. |
| sqlserver.transaction_log.flush.data.rate | Total number of log bytes flushed. |
| sqlserver.transaction_log.flush.rate | Number of log flushes. |
| sqlserver.transaction_log.flush.wait.rate | Number of commits waiting for a transaction log flush. |
| sqlserver.transaction_log.growth.count | Total number of transaction log expansions for a database. |
| sqlserver.transaction_log.shrink.count | Total number of transaction log shrinks for a database. |
| sqlserver.transaction_log.usage | Percent of transaction log space used. |
| sqlserver.user.connection.count | Number of users connected to the SQL Server. |
Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, reach out to our support team at support@observIQ.com.