Prerequisites

Installation Prerequisites

BindPlane Instance Sizing

BindPlane OP's resource requirements depend on the number of managed agents. CPU, memory, disk throughput / IOPS, and network consumption all increase as the number of managed agents increases.

Follow this table for CPU, memory, and storage capacity sizing.

Agent Count        CPU Cores   Memory   Recommended Store
1 - 100            2           4GB      bbolt
100 - 1,000        2           8GB      bbolt
1,000 - 20,000     4           16GB     postgres
20,000 - 40,000    8           32GB     postgres
40,000 - 80,000*   8           32GB     postgres

Some agent levels require additional configuration. If you expect to manage over 20,000 agents, see the Advanced Configuration section.

* A remote Prometheus installation is recommended when exceeding 40,000 agents. See the Prometheus documentation for more information.
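As a rough illustration, the sizing table can be expressed as a simple lookup. This is a minimal sketch, not part of BindPlane: the function name `recommend_instance` is hypothetical, and treating the upper bound of each range as inclusive is an assumption, since the ranges in the table share their boundary values.

```python
# Sizing tiers transcribed from the instance sizing table.
# Each entry: (max agent count, CPU cores, memory in GB, recommended store).
# Assumption: the upper bound of each range is inclusive.
SIZING_TIERS = [
    (100, 2, 4, "bbolt"),
    (1_000, 2, 8, "bbolt"),
    (20_000, 4, 16, "postgres"),
    (40_000, 8, 32, "postgres"),
    (80_000, 8, 32, "postgres"),
]


def recommend_instance(agent_count):
    """Return (cpu_cores, memory_gb, store) for a single BindPlane instance."""
    if agent_count < 1:
        raise ValueError("agent_count must be at least 1")
    for max_agents, cores, memory_gb, store in SIZING_TIERS:
        if agent_count <= max_agents:
            return cores, memory_gb, store
    raise ValueError("agent count exceeds the published sizing table")


if __name__ == "__main__":
    print(recommend_instance(500))
    print(recommend_instance(15_000))
```

Counts above 80,000 raise an error rather than extrapolate, because the table does not cover them.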

High Availability Sizing

When operating BindPlane in high availability, you can use the instance sizing table to determine the instance size for each BindPlane node in your architecture. You must also consider your fault tolerance requirements and the implications of using a load balancer.

Fault Tolerance

When operating BindPlane in High Availability, you need to consider how many agents you expect a single BindPlane instance to handle.

Take the total number of BindPlane instances and subtract the maximum number of nodes you expect to become unavailable due to maintenance. The remaining nodes must be able to handle all of your agents.

For example, if you have five nodes and expect to bring one node down at a time for maintenance or upgrades, four nodes remain. With 80,000 agents, you need to size your instances so that four of the five nodes can handle 100% of the agents. Assuming the load balancer distributes connections evenly, each remaining node handles 20,000 agents, so each instance should be sized at 4 cores and 16GB of memory.
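The arithmetic in this example can be sketched as follows. The helper name `agents_per_surviving_node` is hypothetical, and the calculation assumes, as the text does, that the load balancer spreads agents evenly across the nodes that remain online.

```python
import math


def agents_per_surviving_node(total_agents, total_nodes, nodes_down):
    """Agents each remaining node must handle while `nodes_down` nodes are offline."""
    surviving = total_nodes - nodes_down
    if surviving < 1:
        raise ValueError("at least one node must remain online")
    # Round up so that no agent is left unaccounted for.
    return math.ceil(total_agents / surviving)


if __name__ == "__main__":
    # Five nodes, one down for maintenance, 80,000 agents in total.
    print(agents_per_surviving_node(80_000, 5, 1))
```

The result is the per-node agent count to look up in the instance sizing table.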

You can use the following table as a baseline for your node sizing.

Agent Count   BindPlane Nodes   Fault Tolerance   CPU Cores   Memory
2,000         3                 1                 2           8GB
10,000        3                 1                 4           16GB
100,000       5                 1                 8           32GB
500,000       25                5                 8           32GB

Load Balancer Connection Constraints

Most load balancers are limited to roughly 65,535 connections per backend instance. When sizing your BindPlane cluster, you must consider how many agents each node will be responsible for when the maximum tolerated number of nodes is unavailable. A good rule of thumb is to not exceed 30,000 agents per node, because each agent opens two connections to BindPlane: one for OpAMP remote management and one for publishing throughput metrics.

If you have 100,000 agents, a cluster size of three would be insufficient as each node would be responsible for roughly 33,000 agents. 33,000 agents * 2 results in 66,000 TCP connections to each BindPlane instance. This situation gets worse if you bring one node down for maintenance, as each BindPlane instance would become responsible for 50,000 agents, or 100,000 TCP connections.
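The connection math above can be checked with a short sketch. The function names are hypothetical. The two-connections-per-agent figure and the 30,000-agent rule of thumb come from the text, and even distribution across nodes is assumed.

```python
import math

CONNECTIONS_PER_AGENT = 2      # OpAMP plus throughput metrics
MAX_AGENTS_PER_NODE = 30_000   # rule of thumb from the text


def connections_per_node(total_agents, available_nodes):
    """TCP connections each node carries when agents are spread evenly."""
    agents = math.ceil(total_agents / available_nodes)
    return agents * CONNECTIONS_PER_AGENT


def minimum_nodes(total_agents, fault_tolerance):
    """Smallest cluster that stays under the rule of thumb with `fault_tolerance` nodes down."""
    needed_online = math.ceil(total_agents / MAX_AGENTS_PER_NODE)
    return needed_online + fault_tolerance


if __name__ == "__main__":
    print(connections_per_node(100_000, 3))  # all three nodes online
    print(connections_per_node(100_000, 2))  # one of three nodes down
    print(minimum_nodes(100_000, 1))
```

For 100,000 agents with a fault tolerance of one node, `minimum_nodes` returns 5, which agrees with the baseline table in the Fault Tolerance section.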

Storage Back-end Sizing

BindPlane OP has two back-end options: Bolt Store (default) and PostgreSQL. Bolt Store is recommended for single-instance deployments of BindPlane OP. If you expect to deploy BindPlane on multiple instances, you should use PostgreSQL to avoid migrating from Bolt Store in the future.

Bolt Store

When using the default storage back-end (bbolt), disk throughput and operations per second will increase linearly with the number of managed agents.

To prevent disk performance bottlenecks, ensure that the underlying storage solution can provide enough disk throughput and operations per second. Generally, cloud providers limit disk performance based on provisioned disk capacity.

Agent Count      Read / Write Throughput   Read / Write IOPS   Storage Capacity
1 - 100          15 MB/s                   500/s               60GB
100 - 1,000      150 MB/s                  5000/s              120GB
1,000 - 2,000*   300 MB/s                  10000/s             120GB

* It is recommended to use Postgres when exceeding 1,000 agents.

PostgreSQL

When using the PostgreSQL storage back-end, performance is generally limited by the number of CPU cores and the amount of memory available. It is recommended that the storage backing Postgres be low latency (SSD) and capable of high throughput.

Agent Count       CPU Cores   Memory
1 - 40,000        2           8GB
40,000 - 80,000   4           16GB

Network Requirements

Bandwidth

BindPlane OP maintains network connections for the following:

  • Agent Management
  • Agent Throughput Measurements
  • Command line and Web user interfaces

Maximum network throughput scales linearly with the number of connected agents. As a rule of thumb, expect to consume 265B/s for every connected agent, or 2.12Mbps per 1,000 agents.
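The rule of thumb converts to megabits per second as follows. This is a minimal sketch with a hypothetical function name. It uses only the 265 B/s per-agent figure from the text and decimal units (1 Mbps = 1,000,000 bits per second).

```python
BYTES_PER_SECOND_PER_AGENT = 265  # rule of thumb from the text


def estimated_mbps(agent_count):
    """Approximate network throughput in Mbps for `agent_count` connected agents."""
    bits_per_second = agent_count * BYTES_PER_SECOND_PER_AGENT * 8
    return bits_per_second / 1_000_000


if __name__ == "__main__":
    for agents in (1_000, 20_000, 80_000):
        print(f"{agents} agents: {estimated_mbps(agents):.2f} Mbps")
```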

Firewall

BindPlane OP can run on a local area network and behind a firewall.

BindPlane OP does not need to be reachable from the internet. However, if agents or users outside of your WAN require access, a VPN or inbound firewall rules must be configured to allow access.

Ports

BindPlane OP listens on port 3001 by default. This port is configurable. See the configuration documentation.

The BindPlane port is used for:

Browsers and API Clients

The firewall must allow HTTP traffic to reach BindPlane OP on the configured port.

Agents

Agents must be able to initiate connections to BindPlane OP for OpAMP (websocket) and throughput measurements (HTTP). BindPlane OP will never initiate connections to the agent. The firewall can be configured to prevent BindPlane OP from reaching the agent networks, but the agent networks must be able to reach BindPlane OP on the configured port.

Agent Updates

BindPlane OP will reach out to github.com/observIQ/bindplane-agent/releases to detect new agent releases. This feature is optional.

You can disable GitHub polling by setting agentVersions.syncInterval to 0 in your BindPlane configuration.

agentVersions:
  syncInterval: 0

Advanced Configuration

Max Open Files

BindPlane consumes roughly (2 * agent count) + 1,000 open files. By default, BindPlane's limit is 55,000 open files. When exceeding 20,000 agents per BindPlane instance, it is recommended to increase this value. See the Increase Max Open Files Limit documentation for more information.
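To see how the open-file estimate compares with the default limit, the formula can be sketched as below. The function names are hypothetical. The formula and the 55,000 default are taken from the text, and the result is only an approximation.

```python
DEFAULT_OPEN_FILE_LIMIT = 55_000  # BindPlane's default, per the text


def estimated_open_files(agent_count):
    """Rough open-file usage: (2 * agent count) + 1,000."""
    return 2 * agent_count + 1_000


def max_agents_for_limit(open_file_limit):
    """Largest agent count whose estimated usage fits within `open_file_limit`."""
    return max((open_file_limit - 1_000) // 2, 0)


if __name__ == "__main__":
    print(estimated_open_files(20_000))
    print(max_agents_for_limit(DEFAULT_OPEN_FILE_LIMIT))
```

By this estimate the default limit is reached at about 27,000 agents, so raising it once an instance passes 20,000 agents leaves some headroom.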

Measurement Worker Count

BindPlane publishes agent throughput metrics to Prometheus using a worker pool. By default, there is one worker. When exceeding 40,000 agents, performance can be improved by increasing the worker count.

Agent Count       Worker Count
1 - 40,000        1
40,000 - 80,000   2

See the Advanced Stats Worker Count documentation for more information.