Monitor Vault in Google Cloud Platform with the Google Ops Agent. The Ops Agent is available on GitHub and makes it easy to collect and ship telemetry from dozens of sources directly to Google Cloud Platform.
Below are the steps to get up and running quickly with observIQ’s Google Cloud Platform integrations and to monitor Vault metrics and logs in Google Cloud Platform. Google’s documentation for using the Ops Agent with Vault is here: https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/install-index
What signals matter?
Vault is a secrets store that can be distributed across multiple instances and uses strong encryption to handle data securely. Our integration collects metrics on the operations executed against the store as well as metrics related to token interactions. It also collects audit logs for the operations executed.
- vault.memory.usage
  - This metric reports Vault's RAM usage. Lower memory usage usually correlates with higher performance. If memory usage gets too high, interruptions, crashes, and data loss are possible.
- vault.token.lease.count
  - This metric is used to verify that leases are properly distributed and that no more leases than expected are attempting to access the vault.
- Operation counts
  - vault.storage.operation.get.count
  - vault.storage.operation.list.count
  - vault.storage.operation.put.count
  - vault.storage.operation.delete.count
  - Operation counts are monitored to ensure that operations complete correctly and that no unexpected operations are being performed.
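The operation counts are cumulative, so a raw value says little on its own; what usually matters is how fast it grows. A minimal sketch of turning two samples into per-second rates is shown here. The helper name `operation_rates` and the sample values are hypothetical and only illustrate the arithmetic.

```python
from typing import Dict


def operation_rates(prev: Dict[str, float], curr: Dict[str, float],
                    elapsed_s: float) -> Dict[str, float]:
    """Convert two samples of cumulative counters into per-second rates.

    Hypothetical helper for illustration; not part of the Ops Agent.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    rates = {}
    for name, value in curr.items():
        delta = value - prev.get(name, 0.0)
        # A negative delta suggests the counter was reset (for example,
        # Vault restarted), so treat the current value as the new count.
        if delta < 0:
            delta = value
        rates[name] = delta / elapsed_s
    return rates


# Illustrative sample values taken 60 seconds apart.
prev = {
    "vault.storage.operation.get.count": 1200,
    "vault.storage.operation.delete.count": 10,
}
curr = {
    "vault.storage.operation.get.count": 1500,
    "vault.storage.operation.delete.count": 70,
}
rates = operation_rates(prev, curr, elapsed_s=60)
for name, rate in sorted(rates.items()):
    print(f"{name}: {rate:.2f} ops/s")
```

A sudden jump in the delete rate, for instance, is the kind of unexpected operation pattern worth alerting on.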
All of the above categories can be gathered with the Vault receiver – so let’s get started.
Before you begin
If you don’t already have an Ops Agent with the latest Vault receiver installed, you’ll need to do that first. Check out the Google Cloud Platform Ops Agent documentation for installation methods, including the one-line installer.
Configuring the Vault receiver for Metrics and Logs
Navigate to your Ops Agent configuration file. You’ll find it in the following location:
- /etc/google-cloud-ops-agent/config.yaml (Linux)
Edit the configuration file for Vault metrics as shown below:
metrics:
  receivers:
    vault:
      type: vault
      token: <VAULT_TOKEN>
      endpoint: 127.0.0.1:8200
  service:
    pipelines:
      vault:
        receivers:
          - vault
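Before relying on the receiver, it can help to confirm that the token and endpoint actually return metrics. The sketch below builds the equivalent HTTP request by hand. It assumes Vault's `/v1/sys/metrics` endpoint with the `format=prometheus` query parameter and the `X-Vault-Token` header; the helper name `build_metrics_request` is hypothetical, and the exact request the Ops Agent sends may differ.

```python
import urllib.request


def build_metrics_request(endpoint: str, token: str, scheme: str = "http",
                          metrics_path: str = "/v1/sys/metrics") -> urllib.request.Request:
    """Build a GET request against Vault's metrics endpoint.

    Hypothetical helper; the defaults mirror the receiver fields
    `scheme` and `metrics_path` described in this post.
    """
    url = f"{scheme}://{endpoint}{metrics_path}?format=prometheus"
    return urllib.request.Request(url, headers={"X-Vault-Token": token})


# Placeholder token, as in the config above; substitute a real one to test.
req = build_metrics_request("127.0.0.1:8200", "<VAULT_TOKEN>")
print(req.full_url)

# Sending the request needs a running Vault, so it is left commented out:
# with urllib.request.urlopen(req, timeout=5) as resp:
#     print(resp.status)
```

If the request fails with a permission error, the token most likely lacks read access to the metrics path.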
For logging, add the following to the same YAML config file:
logging:
  receivers:
    vault_audit:
      type: vault_audit
      include_paths: [/var/log/vault_audit.log]
  service:
    pipelines:
      vault:
        receivers:
          - vault_audit
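Vault's file audit device writes one JSON object per line, which is what the `vault_audit` receiver reads from the configured path. As a rough sketch of what those entries contain, the snippet below counts request entries by operation. The sample lines are simplified, hypothetical entries (real ones carry many more fields), and `count_operations` is an illustrative helper, not part of the agent.

```python
import json
from collections import Counter
from typing import Iterable


def count_operations(lines: Iterable[str]) -> Counter:
    """Count audit entries of type "request" by their operation."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict) or entry.get("type") != "request":
            continue  # each call is logged as a request and a response
        request = entry.get("request") or {}
        counts[request.get("operation", "unknown")] += 1
    return counts


# Simplified, hypothetical audit lines for illustration only.
sample = [
    '{"type": "request", "request": {"operation": "read", "path": "secret/data/app"}}',
    '{"type": "response", "request": {"operation": "read", "path": "secret/data/app"}}',
    '{"type": "request", "request": {"operation": "update", "path": "secret/data/app"}}',
    "not json",
]
counts = count_operations(sample)
print(dict(counts))
```

To run it against a real file, pass an open file handle for the path listed in `include_paths` instead of `sample`.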
Restart the Ops Agent with the following command:
sudo service google-cloud-ops-agent restart
sleep 30
You can edit the config file for more precise control over the agent's behavior, but it is not necessary. Here are the most relevant fields you can adjust:
Metrics:
Field | Required or Optional | Default | Description |
---|---|---|---|
`type` | required | | Must be `vault`. |
`endpoint` | optional | `localhost:8200` | The hostname:port of the Vault instance to be monitored. |
`metrics_path` | optional | `/v1/sys/metrics` | The path for metrics collection. |
`token` | optional | | Token used for authentication. |
`scheme` | optional | `http` | The scheme to use for the request. |
`collection_interval` | optional | | A [time.Duration](https://pkg.go.dev/time#ParseDuration) value, such as `30s` or `5m`. |
`insecure` | optional | `true` | Signals whether to use a secure TLS connection or not. If `insecure` is `true`, TLS will not be enabled. |
`insecure_skip_verify` | optional | `false` | Whether to skip verifying the certificate or not. A `false` value of `insecure_skip_verify` will not be used if `insecure` is `true`, as the connection will not use TLS at all. |
`cert_file` | optional | | Path to the TLS cert to use for mTLS required connections. |
`key_file` | optional | | Path to the TLS key to use for mTLS required connections. |
`ca_file` | optional | | Path to the CA cert. As a client this verifies the server certificate. If empty, use system root CA. |
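The interaction between `insecure` and `insecure_skip_verify` is easy to misread, so here is a small sketch that spells out the three outcomes described in the table. The function `describe_tls` is purely illustrative and uses the table's defaults; it is not how the agent itself is implemented.

```python
def describe_tls(insecure: bool = True, insecure_skip_verify: bool = False) -> str:
    """Describe the connection mode implied by the two TLS flags.

    Illustrative only; defaults follow the table above.
    """
    if insecure:
        # TLS is off entirely, so insecure_skip_verify has no effect.
        return "plaintext (no TLS)"
    if insecure_skip_verify:
        return "TLS without certificate verification"
    return "TLS with certificate verification"


print(describe_tls())                                   # defaults
print(describe_tls(insecure=False))                     # verified TLS
print(describe_tls(insecure=False, insecure_skip_verify=True))
```

In other words, the secure setup requires setting `insecure` to `false` explicitly; leaving the defaults in place means the connection is not encrypted.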
Logs:
Field | Required or Default | Description |
---|---|---|
`type` | required | Must be `vault_audit`. |
`include_paths` | required | The log files to read. |
`exclude_paths` | `[]` | Log files to exclude (if `include_paths` contains a glob or directory). |
Viewing the metrics collected
If you followed the steps above, the following Vault metrics will now be delivered to Google Cloud Monitoring.
List of metrics collected:
Prefix: workload
Name | Type | Unit | Attributes | Description |
---|---|---|---|---|
vault.core.request.count | gauge | {requests} | | The number of requests handled by the Vault core. |
vault.core.leader.duration | gauge | ms | | The average amount of time a core was the leader in high availability mode. |
vault.token.lease.count | gauge | {tokens} | | The number of tokens that are leased for eventual expiration. |
vault.token.count | cumulative | {tokens} | namespace, cluster | The number of tokens created. |
vault.token.revoke.time | gauge | ms | | The average time taken to revoke a token. |
vault.token.renew.time | gauge | ms | | The average time taken to renew a token. |
vault.audit.request.failed | gauge | {requests} | | The number of audit log requests that have failed. |
vault.audit.response.failed | gauge | {responses} | | The number of audit log responses that have failed. |
vault.memory.usage | gauge | bytes | | The amount of memory used by Vault. |
vault.storage.operation.put.time | cumulative | ms | storage | The duration of put operations executed against the storage backend. |
vault.storage.operation.delete.time | cumulative | ms | storage | The duration of delete operations executed against the storage backend. |
vault.storage.operation.list.time | cumulative | ms | storage | The duration of list operations executed against the storage backend. |
vault.storage.operation.get.time | cumulative | ms | storage | The duration of get operations executed against the storage backend. |
vault.storage.operation.put.count | cumulative | operations | storage | The count of put operations executed against the storage backend. |
vault.storage.operation.delete.count | cumulative | operations | storage | The count of delete operations executed against the storage backend. |
vault.storage.operation.list.count | cumulative | operations | storage | The count of list operations executed against the storage backend. |
vault.storage.operation.get.count | cumulative | operations | storage | The count of get operations executed against the storage backend. |
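To find one of these metrics in Cloud Monitoring, the metric name is combined with the prefix into a full metric type. The sketch below builds a Monitoring filter string for that purpose. It assumes the `workload` prefix expands to `workload.googleapis.com` and that the agent runs on a Compute Engine VM (resource type `gce_instance`); both are assumptions to verify against your own project, and `monitoring_filter` is a hypothetical helper.

```python
# Assumption: the "workload" prefix corresponds to this metric domain.
METRIC_PREFIX = "workload.googleapis.com"


def monitoring_filter(metric_name: str, resource_type: str = "gce_instance") -> str:
    """Build a Cloud Monitoring filter string for one Vault metric.

    Hypothetical helper; resource_type is an assumption about where
    the Ops Agent is running.
    """
    metric_type = f"{METRIC_PREFIX}/{metric_name}"
    return f'metric.type = "{metric_type}" AND resource.type = "{resource_type}"'


print(monitoring_filter("vault.memory.usage"))
print(monitoring_filter("vault.storage.operation.get.count"))
```

The resulting string can be pasted into a filter field or passed to a Monitoring API client when listing time series.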
observIQ’s monitoring technology is a game changer for organizations that care about performance and efficiency. If you’re using Vault, our solutions can make a significant difference in your infrastructure monitoring. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, reach out to our support team at support@observIQ.com.