Metrics

BindPlane OP can be configured to expose metrics using OpenTelemetry Protocol or Prometheus. See Monitoring for configuration details.

Key Performance Indicators

Metrics denoted with "KPI" are key performance indicators. BindPlane OP administrators should pay close attention to KPIs to ensure BindPlane is operating normally.

Event Bus

The BindPlane Event Bus is responsible for publishing and consuming messages between BindPlane components. When operating BindPlane in High Availability, the Event Bus is responsible for sharing messages between BindPlane servers. Event Bus metrics can be used to gain visibility into the health of the Event Bus. A misbehaving Event Bus will cause issues with configuration rollouts, Live Preview, and Recent Telemetry Snapshots.

NATS

nats.clientProducer (KPI)

Number of times the NATS client producer handler was called. It is important to watch for the error attribute. Consistent producer errors indicate that the event bus is not functioning normally.

  • Type: Counter
  • Attributes
    • error: The error returned by the producer handler. One of "none", "max_payload", "unknown".
    • client_name: NATS client name.
    • cluster_name: NATS cluster name.
    • subject: NATS subject name.

Example:

bindplane_nats_clientProducer_total{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus"} 26
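
One way to watch the error attribute is to compute the share of handler calls that carried an error other than "none". The Python sketch below parses Prometheus text-format lines by hand. The helper names parse_sample and error_ratio are illustrative and not part of BindPlane, and the max_payload sample is invented for the demonstration.

```python
import re

# Matches one Prometheus text-format sample: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

SAMPLES = [
    # Copied from the example above.
    'bindplane_nats_clientProducer_total{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus"} 26',
    # Hypothetical error sample, added only to exercise the calculation.
    'bindplane_nats_clientProducer_total{client_name="bindplane-ha-0",cluster_name="bindplane",error="max_payload",subject="bindplane-event-bus"} 2',
]


def parse_sample(line):
    """Split one exposition line into (name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def error_ratio(lines, metric_name):
    """Fraction of handler calls whose error label is not "none"."""
    total = 0.0
    errors = 0.0
    for line in lines:
        name, labels, value = parse_sample(line)
        if name != metric_name:
            continue
        total += value
        if labels.get("error", "none") != "none":
            errors += value
    return errors / total if total else 0.0
```

Calling error_ratio(SAMPLES, "bindplane_nats_clientProducer_total") returns the error share for the producer. The same function could be pointed at the consumer counter by changing the metric name. In practice a ratio computed over a recent time window is more useful than one over the lifetime totals.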

nats.clientConsumer (KPI)

Number of times the NATS client consumer handler was called. It is important to watch for the error attribute. Consistent consumer errors indicate that the event bus is not functioning normally.

  • Type: Counter
  • Attributes
    • error: The error returned by the consumer handler. One of "none", "max_payload", "unknown".
    • client_name: NATS client name.
    • cluster_name: NATS cluster name.
    • subject: NATS subject name.

Example:

bindplane_nats_clientConsumer_total{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus"} 41

nats.slowConsumer (KPI)

Number of slow messages processed by the NATS client. See Slow Consumers for more information. When the slow consumer count is greater than 0, event bus messages are being dropped.

  • Type: Counter
  • Attributes
    • client_name: NATS client name.
    • subject: NATS subject name.

Example:

bindplane_nats_slowconsumer_total{client_name="bindplane-ha-0",subject="bindplane-event-bus"} 2
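
Because this metric is a counter, a nonzero value only says that drops happened at some point since the process started. To detect new drops, compare two consecutive scrapes. The sketch below is a minimal illustration. The function names are hypothetical, and treating a decrease as a counter reset is a common convention assumed here, not something the BindPlane documentation specifies.

```python
def counter_increase(previous, current):
    """Increase of a monotonic counter between two scrapes.

    A drop in value is treated as a counter reset (for example after a
    server restart), in which case the current value is the increase.
    """
    if current < previous:
        return current
    return current - previous


def dropped_since_last_scrape(previous, current):
    """True when the slow consumer counter grew between the two scrapes."""
    return counter_increase(previous, current) > 0
```

For example, dropped_since_last_scrape(2, 2) is False because the counter did not move, while dropped_since_last_scrape(2, 5) is True.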

nats.clientProducerSize

Size of the payload sent by the NATS client.

  • Type: Histogram
  • Attributes
    • client_name
    • cluster_name
    • error: The error returned by the producer handler. One of "none", "max_payload", "unknown".
      • This error will match the error on nats.clientProducer.
    • subject: NATS subject name
  • Buckets:
    • 0
    • 5
    • 10
    • 25
    • 50
    • 75
    • 100
    • 250
    • 500
    • 750
    • 1000
    • 2500
    • 5000
    • 7500
    • 10000

Example:

bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="0"} 0
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="5"} 0
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="10"} 0
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="25"} 0
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="50"} 0
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="75"} 2
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="100"} 21
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="250"} 21
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="500"} 21
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="750"} 23
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="1000"} 23
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="2500"} 26
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="5000"} 26
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="7500"} 26
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="10000"} 26
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="+Inf"} 26
bindplane_nats_clientProducerSize_sum{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus"} 7777
bindplane_nats_clientProducerSize_count{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus"} 26
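
The bucket, sum, and count series in a histogram like this one can be turned into a mean and an estimated quantile. The sketch below uses the numbers from the example above. The function names are illustrative, and the quantile estimate uses linear interpolation inside a bucket, similar in spirit to what Prometheus' histogram_quantile does. It is an approximation, not an exact value.

```python
import math

# Cumulative bucket counts copied from the example above: (upper bound, count).
BUCKETS = [
    (0, 0), (5, 0), (10, 0), (25, 0), (50, 0), (75, 2), (100, 21),
    (250, 21), (500, 21), (750, 23), (1000, 23), (2500, 26),
    (5000, 26), (7500, 26), (10000, 26), (math.inf, 26),
]
SUM, COUNT = 7777, 26


def mean(total, count):
    """Average observation: the _sum series divided by the _count series."""
    return total / count if count else 0.0


def estimate_quantile(q, buckets):
    """Interpolate linearly inside the first bucket that reaches rank q * count."""
    count = buckets[-1][1]
    if count == 0:
        return math.nan
    rank = q * count
    prev_bound, prev_count = 0.0, 0
    for bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(bound):
                return prev_bound  # cannot interpolate into the +Inf bucket
            in_bucket = cumulative - prev_count
            if in_bucket == 0:
                return bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, cumulative
    return prev_bound
```

With these numbers, mean(SUM, COUNT) is about 299 and estimate_quantile(0.5, BUCKETS) falls between 75 and 100, which shows that a few large payloads pull the mean well above the median.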

nats.server.active_peers_count (KPI)

Number of active peers connected to the NATS server. When operating BindPlane in High Availability, all BindPlane instances should report the same number of active peers. For example, each server in a three-node deployment should report 2 active peers.

If the active peer count is not consistent between BindPlane servers, a configuration or network issue is the likely culprit.

  • Type: Gauge

Example:

bindplane_nats_server_active_peers_count{} 2
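
The consistency rule above can be sketched as a small check over the gauge values scraped from each server. The function name and the server names in the usage note are hypothetical. The expected value of "number of servers minus one" is an assumption generalized from the three-node example, so verify it against your own deployment.

```python
def check_active_peers(peers_by_server):
    """Return a list of problems; an empty list means the cluster looks consistent.

    peers_by_server maps a server name to the active peer count it reported.
    """
    # Assumption: each server should see every other server as a peer.
    expected = len(peers_by_server) - 1
    problems = []
    for server, peers in sorted(peers_by_server.items()):
        if peers != expected:
            problems.append(
                f"{server}: reports {peers} active peers, expected {expected}"
            )
    return problems
```

For a healthy three-node deployment, check_active_peers({"bindplane-ha-0": 2, "bindplane-ha-1": 2, "bindplane-ha-2": 2}) returns an empty list.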

PubSub

pubsub.messages

Number of PubSub messages sent and received.

  • Type: Counter
  • Attributes
    • direction: One of received, sent.

Example:

bindplane_pubsub_messages_total{direction="received"} 31
bindplane_pubsub_messages_total{direction="sent"} 22

pubsub.io

Amount of PubSub data sent and received.

  • Type: Counter
  • Attributes
    • direction: One of received, sent.

Example:

bindplane_pubsub_io_bytes_total{direction="received"} 87190
bindplane_pubsub_io_bytes_total{direction="sent"} 78696

pubsub.errors (KPI)

Number of PubSub errors. PubSub errors indicate an issue with publishing or consuming messages from Google PubSub.

  • Type: Counter

Example:

bindplane_pubsub_errors_total{} 2

OpAMP

agent.wait (KPI)

Time spent waiting due to the maxConcurrency configuration option. This option prevents too many agents from reconnecting at the same time. During a BindPlane server restart, it is expected that agent.wait will increase temporarily as agents reconnect. If you experience significant agent.wait time, BindPlane is likely having an issue with agent connections.

High agent.wait times can result in agents appearing "disconnected" and degraded overall performance.

  • Type: Histogram
  • Buckets:
    • 0
    • 5
    • 10
    • 25
    • 50
    • 75
    • 100
    • 250
    • 500
    • 750
    • 1000
    • 2500
    • 5000
    • 7500
    • 10000

Example:

bindplane_agent_wait_milliseconds_bucket{le="0"} 24
bindplane_agent_wait_milliseconds_bucket{le="5"} 24
bindplane_agent_wait_milliseconds_bucket{le="10"} 24
bindplane_agent_wait_milliseconds_bucket{le="25"} 24
bindplane_agent_wait_milliseconds_bucket{le="50"} 24
bindplane_agent_wait_milliseconds_bucket{le="75"} 24
bindplane_agent_wait_milliseconds_bucket{le="100"} 24
bindplane_agent_wait_milliseconds_bucket{le="250"} 24
bindplane_agent_wait_milliseconds_bucket{le="500"} 24
bindplane_agent_wait_milliseconds_bucket{le="750"} 24
bindplane_agent_wait_milliseconds_bucket{le="1000"} 24
bindplane_agent_wait_milliseconds_bucket{le="2500"} 24
bindplane_agent_wait_milliseconds_bucket{le="5000"} 24
bindplane_agent_wait_milliseconds_bucket{le="7500"} 24
bindplane_agent_wait_milliseconds_bucket{le="10000"} 24
bindplane_agent_wait_milliseconds_bucket{le="+Inf"} 24
bindplane_agent_wait_milliseconds_sum{} 0
bindplane_agent_wait_milliseconds_count{} 24

agent.connecting (KPI)

Number of times agents have attempted to connect. The result attribute is critical to understanding if agents are connecting properly.

  • Type: Counter
  • Attributes
    • result
      • conflict: The agent connection was rejected because an agent with the same agent_id is already connected.
      • connected: The agent connected successfully.
      • disconnected: The agent disconnected.
      • error: An error occurred during agent connection.
      • limited: The agent connection was rejected because the agent's account has reached the maximum allowed number of agents.
      • unauthorized: The agent connection was rejected because an account matching the agent's secret-key was not found.

Example:

bindplane_agent_connecting_total{result="connected"} 7
bindplane_agent_connecting_total{result="disconnected"} 3
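
To act on the result attribute, it can help to separate routine results (connected, disconnected) from the ones that point at a problem. The sketch below is one way to do that. The function name, the choice of which results count as problems, and the unauthorized samples in the usage note are illustrative assumptions.

```python
from collections import Counter

# Results other than a clean connect or disconnect, taken from the list above.
PROBLEM_RESULTS = ("conflict", "error", "limited", "unauthorized")


def connection_problems(samples):
    """Sum counter values per result and return only the problem results seen.

    samples is an iterable of (result, value) pairs, one per scraped series.
    """
    totals = Counter()
    for result, value in samples:
        totals[result] += value
    return {r: totals[r] for r in PROBLEM_RESULTS if totals[r] > 0}
```

With the example above, connection_problems([("connected", 7), ("disconnected", 3)]) returns an empty dict. If a scrape also contained unauthorized samples, they would be summed and returned under the "unauthorized" key.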

agent.configure

Number of times agents have been configured (push).

  • Type: Counter
  • Attributes
    • result
      • configuring: Successfully set agent status to configuring during agent configuration.
      • disconnected: Failed to push update to agent because it is disconnected.
      • error: An error occurred during agent configuration.
      • readonly: Agent does not support remote configuration.

Example:

bindplane_agent_configure_total{result="configuring"} 3
bindplane_agent_configure_total{result="error"} 3

agent.verify

Number of times agents have been verified (pull).

  • Type: Counter
  • Attributes
    • result:
      • configuring: Agent status was changed to configuring because it was not running the correct configuration.
      • error: An error occurred during agent configuration verification.
      • missing
      • readonly: Agent does not support remote configuration.
      • validated: Agent update skipped because agent already has the correct configuration or does not have a configuration assigned.
      • validated-hash: Agent configuration hash matches the hash previously pushed to the agent.
      • waiting: Agent configuration cannot be validated because the agent is applying the configuration.

Example:

bindplane_agent_verify_total{result="configuring"} 3
bindplane_agent_verify_total{result="validated"} 23

agent.upgrade

Number of times agents have been upgraded.

  • Type: Counter
  • Attributes
    • result
      • error: An error occurred while upgrading an agent.
      • upgrading: An upgrade request occurred.

Example:

bindplane_agent_upgrade_total{result="upgrading"} 1

agent.report

Number of agent snapshot requests.

  • Type: Counter
  • Attributes
    • result:
      • disconnected: Snapshot request failed because the agent is disconnected.
      • error: An error occurred while requesting a snapshot from an agent.
      • sent: Snapshot request was successfully sent to the agent.

Example:

bindplane_agent_report_total{result="sent"} 133
bindplane_agent_report_total{result="disconnected"} 1
bindplane_agent_report_total{result="error"} 2

agent_messages

Number of agent messages received.

  • Type: Counter
  • Attributes
    • components: A list of agent components.

Example:

bindplane_agent_messages_total{components="[\"AgentDescription\",\"EffectiveConfig\",\"RemoteConfigStatus\",\"PackageStatuses\"]"} 17
bindplane_agent_messages_total{components="[\"CustomMessage\"]"} 564
bindplane_agent_messages_total{components="[\"EffectiveConfig\",\"RemoteConfigStatus\"]"} 1
bindplane_agent_messages_total{components="[\"EffectiveConfig\"]"} 2
bindplane_agent_messages_total{components="[\"RemoteConfigStatus\"]"} 2

agent.heartbeat

Number of agent heartbeats received.

  • Type: Counter
  • Attributes

Example:

bindplane_agent_heartbeat_total{} 52

connected_agents (KPI)

Number of connected agents. If connected_agents is less than the total number of agents, it is possible some agents are experiencing connectivity issues. This metric should be tracked with the agent.connecting metric.

  • Type: Gauge

custom_messages.processed

Number of custom messages that have been processed.

  • Type: Counter
  • Attributes
    • message_capability: Message capability.
    • message_type: Message type.

Example:

bindplane_custom_messages_processed_total{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements"} 660

custom_messages.process_time

Time spent processing custom messages.

  • Type: Histogram
  • Attributes
    • message_capability: Message capability.
    • message_type: Message type.
  • Buckets:
    • 0
    • 5
    • 10
    • 25
    • 50
    • 75
    • 100
    • 250
    • 500
    • 750
    • 1000
    • 2500
    • 5000
    • 7500
    • 10000

Example:

bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="0"} 0
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="5"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="10"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="25"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="50"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="75"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="100"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="250"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="500"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="750"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="1000"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="2500"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="5000"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="7500"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="10000"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="+Inf"} 740
bindplane_custom_messages_process_time_milliseconds_sum{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements"} 52.201175000000056
bindplane_custom_messages_process_time_milliseconds_count{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements"} 740

throughput_metrics.processed

Total amount of throughput metric datapoints that have been processed and batched.

  • Type: Counter
  • Attributes
    • message_capability: Message capability.
    • message_type: Message type.

Example:

bindplane_custom_messages_processed_total{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements"} 764

measurements.process_time

Warning: This metric is deprecated. Use custom_messages.process_time instead.

Amount of time spent processing measurements.

  • Type: Histogram
  • Attributes
  • Buckets:
    • 0
    • 5
    • 10
    • 25
    • 50
    • 75
    • 100
    • 250
    • 500
    • 750
    • 1000
    • 2500
    • 5000
    • 7500
    • 10000

throughput_metrics.processed

Total amount of throughput metric datapoints that have been processed and batched.

  • Type: Counter

Example:

bindplane_throughput_metrics_processed_total{} 92

Web Server

requests (KPI)

Number of HTTP requests. The status attribute will include the HTTP status code. It is important to monitor for 4xx and 5xx status codes. An excessive number of 4xx status codes could indicate agent authentication issues. Any number of 5xx status codes is unexpected, and could indicate a configuration issue or bug within BindPlane.

  • Type: Counter
  • Attributes
    • method: Request method.
    • status: Request status.
    • url: Request URL path.

Example:

bindplane_requests_ratio_total{method="GET",status="200",url="/v1/opamp"} 7
bindplane_requests_ratio_total{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 212
bindplane_requests_ratio_total{method="POST",status="200",url="/v1/agents/:id/snapshots/logs"} 1
bindplane_requests_ratio_total{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics"} 2
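
Monitoring for 4xx and 5xx responses usually starts with grouping the counter by status class. The sketch below shows that grouping over (status, count) pairs taken from the status attribute. The function name is illustrative, and the 401 and 500 samples in the usage note are invented to show the non-2xx classes.

```python
from collections import defaultdict


def requests_by_status_class(samples):
    """Group request counts into classes such as "2xx", "4xx", and "5xx".

    samples is an iterable of (status, count) pairs, where status is the
    HTTP status code as a string, as it appears in the status attribute.
    """
    totals = defaultdict(int)
    for status, count in samples:
        totals[f"{status[0]}xx"] += count
    return dict(totals)
```

The four series in the example above all have status "200", so requests_by_status_class([("200", 7), ("200", 212), ("200", 1), ("200", 2)]) yields a single "2xx" class with 222 requests. Adding hypothetical ("401", 5) and ("500", 1) samples would produce separate "4xx" and "5xx" entries to alert on.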

request_duration

Time taken to process the request in seconds.

  • Type: Histogram
  • Attributes
    • method: Request method.
    • status: Request status.
    • url: Request URL path.
  • Unit: Seconds
  • Buckets:
    • 0.1
    • 0.5
    • 1
    • 2
    • 5
    • 10

Example:

bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="0.1"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="0.5"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="1"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="2"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="5"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="10"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="+Inf"} 7
bindplane_request_duration_seconds_sum{method="GET",status="200",url="/v1/opamp"} 0.047771097000000005
bindplane_request_duration_seconds_count{method="GET",status="200",url="/v1/opamp"} 7
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="0.1"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="0.5"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="1"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="2"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="5"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="10"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="+Inf"} 224
bindplane_request_duration_seconds_sum{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 0.027173258000000006
bindplane_request_duration_seconds_count{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="0.1"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="0.5"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="1"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="2"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="5"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="10"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="+Inf"} 1
bindplane_request_duration_seconds_sum{method="POST",status="200",url="/v1/agents/:id/snapshots/logs"} 0.000218934
bindplane_request_duration_seconds_count{method="POST",status="200",url="/v1/agents/:id/snapshots/logs"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="0.1"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="0.5"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="1"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="2"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="5"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="10"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="+Inf"} 2
bindplane_request_duration_seconds_sum{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics"} 0.017959873
bindplane_request_duration_seconds_count{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics"} 2
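
Because histogram buckets are cumulative, the share of requests that finished within a given bucket bound can be read directly from two series. The sketch below computes that share for one label set. The function name and the 0.5 second threshold are illustrative choices, and the second data set is made up to show a case where some requests are slow.

```python
INF = float("inf")

# Cumulative counts for GET /v1/opamp, copied from the example above.
OPAMP_BUCKETS = {0.1: 7, 0.5: 7, 1: 7, 2: 7, 5: 7, 10: 7, INF: 7}

# Hypothetical counts, used only to show a less uniform distribution.
SLOW_BUCKETS = {0.1: 90, 0.5: 98, 1: 99, 2: 100, 5: 100, 10: 100, INF: 100}


def fraction_within(buckets, threshold):
    """Share of requests at or below threshold, which must be a bucket bound.

    Returns None when no requests have been observed.
    """
    total = buckets[INF]
    if total == 0:
        return None
    return buckets[threshold] / total
```

Here fraction_within(OPAMP_BUCKETS, 0.5) is 1.0 because every request in the example finished within the first bucket, while the hypothetical data gives 0.98 for the same threshold.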

request_size

Size of the request received.

  • Type: Histogram
  • Attributes
    • method: Request method.
    • status: Request status.
    • url: Request URL path.
  • Unit: Bytes
  • Buckets:
    • 0
    • 5
    • 10
    • 25
    • 50
    • 75
    • 100
    • 250
    • 500
    • 750
    • 1000
    • 2500
    • 5000
    • 7500
    • 10000

Example:

bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="0"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="5"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="10"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="25"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="50"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="75"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="100"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="250"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="500"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="750"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="1000"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="2500"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="5000"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="7500"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="10000"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="+Inf"} 7
bindplane_request_size_sum{method="GET",status="200",url="/v1/opamp"} 2512
bindplane_request_size_count{method="GET",status="200",url="/v1/opamp"} 7
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="0"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="5"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="10"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="25"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="50"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="75"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="100"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="250"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="500"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="750"} 4
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="1000"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="2500"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="5000"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="7500"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="10000"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="+Inf"} 228
bindplane_request_size_sum{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 175092
bindplane_request_size_count{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="0"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="5"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="10"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="25"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="50"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="75"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="100"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="250"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="500"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="750"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="1000"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="2500"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="5000"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="7500"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="10000"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="+Inf"} 1
bindplane_request_size_sum{method="POST",status="200",url="/v1/agents/:id/snapshots/logs"} 410
bindplane_request_size_count{method="POST",status="200",url="/v1/agents/:id/snapshots/logs"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="0"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="5"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="10"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="25"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="50"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="75"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="100"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="250"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="500"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="750"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="1000"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="2500"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="5000"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="7500"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="10000"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="+Inf"} 2
bindplane_request_size_sum{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics"} 17732
bindplane_request_size_count{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics"} 2

Store

eventbus.latency

Time between sending an event to the event bus and the handler receiving it.

  • Type: Histogram
  • Attributes
    • event: One of received or handled.
    • handler: The component handling the event, one of manager or graphql.
    • type: The event type is always updates. Other values may exist in the future.
  • Buckets:
    • 0
    • 5
    • 10
    • 25
    • 50
    • 75
    • 100
    • 250
    • 500
    • 750
    • 1000
    • 2500
    • 5000
    • 7500
    • 10000

Example:

metric
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="0"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="5"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="10"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="25"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="50"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="75"} 4
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="100"} 9
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="250"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="500"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="750"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="1000"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="2500"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="5000"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="7500"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="10000"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="+Inf"} 15
bindplane_eventbus_latency_milliseconds_sum{event="handled",handler="graphql",type="updates"} 1508
bindplane_eventbus_latency_milliseconds_count{event="handled",handler="graphql",type="updates"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="0"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="5"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="10"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="25"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="50"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="75"} 5
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="100"} 12
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="250"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="500"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="750"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="1000"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="2500"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="5000"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="7500"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="10000"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="+Inf"} 22
bindplane_eventbus_latency_milliseconds_sum{event="received",handler="graphql",type="updates"} 2312
bindplane_eventbus_latency_milliseconds_count{event="received",handler="graphql",type="updates"} 22
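
The cumulative buckets in this example can be turned into an average and an approximate percentile. The following Python is a minimal sketch, not part of BindPlane: the function names are hypothetical, the input values are copied from the event="handled" series above, and the interpolation only approximates what a query engine such as Prometheus would compute.

```python
import math

# Cumulative bucket counts copied from the event="handled" example:
# (upper bound in milliseconds, cumulative count).
HANDLED_BUCKETS = [
    (0, 0), (5, 0), (10, 0), (25, 0), (50, 0), (75, 4), (100, 9),
    (250, 15), (500, 15), (750, 15), (1000, 15), (2500, 15),
    (5000, 15), (7500, 15), (10000, 15), (math.inf, 15),
]
HANDLED_SUM_MS = 1508
HANDLED_COUNT = 15


def mean_latency_ms(total_ms, count):
    """Average latency: the _sum series divided by the _count series."""
    return total_ms / count if count else 0.0


def estimate_quantile_ms(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) by linear interpolation
    inside the first bucket whose cumulative count reaches the rank."""
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # cannot interpolate into the +Inf bucket
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    print(mean_latency_ms(HANDLED_SUM_MS, HANDLED_COUNT))
    print(estimate_quantile_ms(0.5, HANDLED_BUCKETS))
    print(estimate_quantile_ms(0.95, HANDLED_BUCKETS))
```

The estimate is only as precise as the bucket boundaries, so a value that falls in the 100 to 250 bucket could be anywhere in that range.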

store.updateRollout

Number of times the store updateRollout method was called.

  • Type: Counter

Example:

metric
bindplane_store_updateRollout_total{} 5

Postgres

wait_count

Number of times a query had to wait for a connection.

  • Type: Counter

Example:

metric
bindplane_wait_count_total{} 3

wait_time (KPI)

Total time spent waiting for connections. If wait_time is consistently above 0, BindPlane's Postgres max connections configuration option may be set too low, or Postgres may be experiencing performance issues.

  • Type: Counter
  • Unit: milliseconds

Example:

metric
bindplane_wait_time_milliseconds_total{} 0
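
Because wait_time is a cumulative counter, one way to read "consistently above 0" is that the counter keeps growing from one scrape to the next. That reading is an assumption, as are the function names and the min_intervals threshold in this minimal Python sketch, which works on a list of successive scraped values.

```python
def counter_increases(samples):
    """Per-interval increases of a cumulative counter.

    A value lower than its predecessor is treated as a counter reset
    (for example after a restart), so the new value is the increase.
    """
    increases = []
    for prev, curr in zip(samples, samples[1:]):
        increases.append(curr if curr < prev else curr - prev)
    return increases


def is_consistently_waiting(wait_time_samples, min_intervals=3):
    """True when wait time grew in each of the last min_intervals intervals."""
    recent = counter_increases(wait_time_samples)[-min_intervals:]
    return len(recent) == min_intervals and all(delta > 0 for delta in recent)


if __name__ == "__main__":
    # Hypothetical scrape values in milliseconds, oldest first.
    print(is_consistently_waiting([0, 0, 0, 0]))
    print(is_consistently_waiting([0, 12, 30, 45]))
```

A single jump followed by flat readings does not trip the check, which matches the intent of watching for sustained waiting rather than a one-off spike.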

active_connections (KPI)

Number of open active connections. If active_connections is consistently at 100% of the configured max connections, BindPlane may be experiencing performance issues. Generally, BindPlane's max connections should not exceed the default of 100. Increasing max connections might mask an underlying issue and is not recommended.

  • Type: Gauge

Example:

metric
bindplane_active_connections{} 0
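
A saturation check for this gauge can be sketched as follows. This is an illustration only: the function names and the five-sample window are assumptions, and the limit of 100 is the default max connections value mentioned above.

```python
DEFAULT_MAX_CONNECTIONS = 100  # default max connections stated in this section


def pool_utilization(active, max_connections=DEFAULT_MAX_CONNECTIONS):
    """Fraction of the connection pool in use for one gauge reading."""
    return active / max_connections


def is_pool_saturated(active_samples, max_connections=DEFAULT_MAX_CONNECTIONS,
                      min_samples=5):
    """True when each of the last min_samples readings is at the limit."""
    recent = active_samples[-min_samples:]
    return len(recent) == min_samples and all(
        pool_utilization(active, max_connections) >= 1.0 for active in recent
    )


if __name__ == "__main__":
    # Hypothetical gauge readings, oldest first.
    print(is_pool_saturated([40, 100, 100, 100, 100, 100]))
    print(is_pool_saturated([100, 100, 80, 100, 100]))
```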

idle_connections

Number of open idle connections.

  • Type: Gauge

Example:

metric
bindplane_idle_connections{} 0

Prometheus

remote_write.time

Time taken to upload a batch to Prometheus via remote write.

  • Type: Histogram
  • Unit: milliseconds
  • Buckets:
    • 0
    • 100
    • 200
    • 300
    • 400
    • 500
    • 750
    • 1000
    • 1500
    • 2000
    • 3000
    • 5000

Example:

metric
bindplane_remote_write_time_milliseconds_bucket{le="0"} 0
bindplane_remote_write_time_milliseconds_bucket{le="100"} 558
bindplane_remote_write_time_milliseconds_bucket{le="200"} 558
bindplane_remote_write_time_milliseconds_bucket{le="300"} 558
bindplane_remote_write_time_milliseconds_bucket{le="400"} 558
bindplane_remote_write_time_milliseconds_bucket{le="500"} 558
bindplane_remote_write_time_milliseconds_bucket{le="750"} 558
bindplane_remote_write_time_milliseconds_bucket{le="1000"} 558
bindplane_remote_write_time_milliseconds_bucket{le="1500"} 558
bindplane_remote_write_time_milliseconds_bucket{le="2000"} 558
bindplane_remote_write_time_milliseconds_bucket{le="3000"} 558
bindplane_remote_write_time_milliseconds_bucket{le="5000"} 558
bindplane_remote_write_time_milliseconds_bucket{le="+Inf"} 558
bindplane_remote_write_time_milliseconds_sum{} 269.38973300000004
bindplane_remote_write_time_milliseconds_count{} 558

remote_write.samples.count

Number of samples pushed to the Prometheus instance.

  • Type: Counter

Example:

metric
bindplane_remote_write_samples_count_total{} 2112

Cache

cache.get

Number of cache requests.

  • Type: Counter
  • Attributes
    • cache: The cache instance.
    • result: One of hit or miss.

Example:

metric
bindplane_cache_get_total{cache="license",result="miss"} 3
bindplane_cache_get_total{cache="resourceFinder",result="hit"} 39
bindplane_cache_get_total{cache="resourceFinder",result="miss"} 3
bindplane_cache_get_total{cache="secretKey",result="hit"} 15
bindplane_cache_get_total{cache="secretKey",result="miss"} 1
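
A per-cache hit ratio can be derived from the hit and miss counters. The Python below is a rough sketch rather than an official tool: the function name is hypothetical, the sample text is copied from the example above, and the regular expression assumes the labels appear in the order shown there (cache, then result).

```python
import re
from collections import defaultdict

SAMPLE = """\
bindplane_cache_get_total{cache="license",result="miss"} 3
bindplane_cache_get_total{cache="resourceFinder",result="hit"} 39
bindplane_cache_get_total{cache="resourceFinder",result="miss"} 3
bindplane_cache_get_total{cache="secretKey",result="hit"} 15
bindplane_cache_get_total{cache="secretKey",result="miss"} 1
"""

# Assumes the label order used in the example: cache first, result second.
LINE_RE = re.compile(
    r'^bindplane_cache_get_total\{cache="([^"]+)",result="(hit|miss)"\}'
    r' (\d+(?:\.\d+)?)$'
)


def cache_hit_ratios(text):
    """Return {cache name: hits / (hits + misses)} for each cache found."""
    counts = defaultdict(lambda: {"hit": 0.0, "miss": 0.0})
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            cache, result, value = match.groups()
            counts[cache][result] += float(value)
    return {
        cache: c["hit"] / (c["hit"] + c["miss"])
        for cache, c in counts.items()
        if c["hit"] + c["miss"] > 0
    }


if __name__ == "__main__":
    for name, ratio in sorted(cache_hit_ratios(SAMPLE).items()):
        print(f"{name}: {ratio:.3f}")
```

A cache with only misses, such as license in this sample, reports a ratio of 0, which is expected shortly after startup before any entry has been requested twice.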