The Observability Blog

How to monitor Vault with Google Cloud Platform

by Nico Stewart on September 2, 2022

Monitor Vault in Google Cloud Platform with the Google Ops Agent. The Ops Agent is available on GitHub and makes it easy to collect and ship telemetry from dozens of sources directly to Google Cloud Platform.

Below are the steps to get up and running quickly with observIQ’s Google Cloud Platform integrations and to monitor Vault metrics and logs in Google Cloud Platform. You can check out Google’s documentation for the Ops Agent here: https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/install-index

What signals matter?

Vault is a secrets store that can be distributed across multiple instances and uses strong encryption to handle data securely. Our integration collects metrics about the operations executed against the store as well as metrics related to token interactions. It also collects audit logs for the operations executed.

  • vault.memory.usage
    • This metric shows how much RAM Vault is using. Lower memory usage usually correlates with higher performance. If memory usage gets too high, interruptions, crashes, and data loss are possible.
  • vault.token.lease.count
    • Use this metric to verify that leases are distributed as expected and that no more leases are attempting to access Vault than anticipated.
  • Operation counts
    • vault.storage.operation.get.count
    • vault.storage.operation.list.count
    • vault.storage.operation.put.count
    • vault.storage.operation.delete.count
    • Monitor operation counts to confirm that operations complete correctly and that no unexpected operations are being performed.

All of the above categories can be gathered with the Vault receiver – so let’s get started.

Before you begin

If you don’t already have an Ops Agent with the latest Vault receiver installed, you’ll need to do that first. Check out the Google Cloud Platform Ops Agent documentation for installation methods, including the one-line installer.

Configuring the Vault receiver for Metrics and Logs

Navigate to your Ops Agent configuration file. You’ll find it in the following location: 

  •  /etc/google-cloud-ops-agent/config.yaml (Linux)

Edit the configuration file for Vault metrics as shown below:

metrics:
  receivers:
    vault:
      type: vault
      token: <VAULT_TOKEN>
      endpoint: 127.0.0.1:8200
  service:
    pipelines:
      vault:
        receivers:
          - vault

For logging, add the following to the same YAML config file:

logging:
  receivers:
    vault_audit:
      type: vault_audit
      include_paths: [/var/log/vault_audit.log] 
  service:
    pipelines:
      vault:
        receivers:
          - vault_audit
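
Because both fragments live in the same file, it can help to see them side by side. The following is a minimal sketch of a combined config.yaml, assuming no other receivers are configured; it simply merges the two snippets above, and `<VAULT_TOKEN>` remains a placeholder for your own token.

```yaml
# Combined sketch: metrics and logging sections from above in one file.
metrics:
  receivers:
    vault:
      type: vault
      token: <VAULT_TOKEN>          # placeholder, replace with your token
      endpoint: 127.0.0.1:8200
  service:
    pipelines:
      vault:
        receivers:
          - vault
logging:
  receivers:
    vault_audit:
      type: vault_audit
      include_paths: [/var/log/vault_audit.log]
  service:
    pipelines:
      vault:
        receivers:
          - vault_audit
```

The `metrics` and `logging` sections are independent top-level keys, so each has its own pipeline named `vault` without conflict.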

Restart the Ops Agent with the following command:

sudo service google-cloud-ops-agent restart
sleep 30

You can edit the config file for more precise control over your agent’s behavior, but it is not necessary. Here are the most relevant fields you can adjust:

Metrics:

| Field | Required or Optional | Default | Description |
| --- | --- | --- | --- |
| `type` | required | | Must be `vault`. |
| `endpoint` | optional | `localhost:8200` | The hostname:port of the Vault instance to be monitored. |
| `metrics_path` | optional | `/v1/sys/metrics` | The path for metrics collection. |
| `token` | optional | | Token used for authentication. |
| `scheme` | optional | `http` | The scheme to use for the request. |
| `collection_interval` | optional | | A [time.Duration](https://pkg.go.dev/time#ParseDuration) value, such as `30s` or `5m`. |
| `insecure` | optional | `true` | Signals whether to use a secure TLS connection or not. If `insecure` is true, TLS will not be enabled. |
| `insecure_skip_verify` | optional | `false` | Whether to skip verifying the certificate or not. A false value of `insecure_skip_verify` will not be used if `insecure` is true, as the connection will not use TLS at all. |
| `cert_file` | optional | | Path to the TLS cert to use for mTLS required connections. |
| `key_file` | optional | | Path to the TLS key to use for mTLS required connections. |
| `ca_file` | optional | | Path to the CA cert. As a client this verifies the server certificate. If empty, use system root CA. |
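
To illustrate how the TLS-related fields fit together, here is a hedged sketch of a metrics receiver that scrapes Vault over TLS with a client certificate. The hostname and all file paths are hypothetical, and setting `scheme: https` alongside `insecure: false` is an assumption based on the field list above; check the behavior against your agent version before relying on it.

```yaml
metrics:
  receivers:
    vault:
      type: vault
      token: <VAULT_TOKEN>                     # placeholder, replace with your token
      endpoint: vault.example.internal:8200    # hypothetical hostname
      scheme: https                            # assumption: taken from the field list above
      insecure: false                          # enable TLS
      insecure_skip_verify: false              # keep certificate verification on
      ca_file: /etc/vault/tls/ca.pem           # hypothetical path; omit to use the system root CA
      cert_file: /etc/vault/tls/client.pem     # hypothetical path; only needed for mTLS
      key_file: /etc/vault/tls/client-key.pem  # hypothetical path; only needed for mTLS
      collection_interval: 30s
  service:
    pipelines:
      vault:
        receivers:
          - vault
```

If your Vault listener does not require client certificates, `cert_file` and `key_file` can be dropped and the rest left as is.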

Logs:

| Field | Required / Default | Description |
| --- | --- | --- |
| `type` | required | Must be `vault_audit`. |
| `include_paths` | required | The log files to read. |
| `exclude_paths` | `[]` | Log files to exclude (if `include_paths` contains a glob or directory). |
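
As a small illustration of how `include_paths` and `exclude_paths` interact, the sketch below reads every `.log` file in a directory except one. The directory and file names are hypothetical; substitute the location of your own audit device output.

```yaml
logging:
  receivers:
    vault_audit:
      type: vault_audit
      include_paths:
        - /var/log/vault/*.log          # hypothetical directory, matched by glob
      exclude_paths:
        - /var/log/vault/server.log     # hypothetical non-audit log to skip
  service:
    pipelines:
      vault:
        receivers:
          - vault_audit
```

Excluding non-audit files matters here because the `vault_audit` receiver is meant for audit logs, so other log files in the same directory are best kept out of it.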

Viewing the metrics collected

If you followed the steps detailed above, the following Vault metrics will now be delivered to your preferred destination.

List of metrics collected:
Prefix: workload

| Name | Type | Unit | Attributes | Description |
| --- | --- | --- | --- | --- |
| vault.core.request.count | gauge | {requests} | | The number of requests handled by the Vault core. |
| vault.core.leader.duration | gauge | ms | | The average amount of time a core was the leader in high availability mode. |
| vault.token.lease.count | gauge | {tokens} | | The number of tokens that are leased for eventual expiration. |
| vault.token.count | cumulative | {tokens} | namespace, cluster | The number of tokens created. |
| vault.token.revoke.time | gauge | ms | | The average time taken to revoke a token. |
| vault.token.renew.time | gauge | ms | | The average time taken to renew a token. |
| vault.audit.request.failed | gauge | {requests} | | The number of audit log requests that have failed. |
| vault.audit.response.failed | gauge | {responses} | | The number of audit log responses that have failed. |
| vault.memory.usage | gauge | bytes | | The amount of memory used by Vault. |
| vault.storage.operation.put.time | cumulative | ms | storage | The duration of put operations executed against the storage backend. |
| vault.storage.operation.delete.time | cumulative | ms | storage | The duration of delete operations executed against the storage backend. |
| vault.storage.operation.list.time | cumulative | ms | storage | The duration of list operations executed against the storage backend. |
| vault.storage.operation.get.time | cumulative | ms | storage | The duration of get operations executed against the storage backend. |
| vault.storage.operation.put.count | cumulative | operations | storage | The count of put operations executed against the storage backend. |
| vault.storage.operation.delete.count | cumulative | operations | storage | The count of delete operations executed against the storage backend. |
| vault.storage.operation.list.count | cumulative | operations | storage | The count of list operations executed against the storage backend. |
| vault.storage.operation.get.count | cumulative | operations | storage | The count of get operations executed against the storage backend. |

observIQ’s monitoring technology is a game changer for organizations that care about performance and efficiency. If you’re using Vault, our solutions can make a significant difference in your infrastructure monitoring. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, reach out to our support team at support@observIQ.com.