Agent Sizing and Scaling

Agent

When the collector runs as an agent, be mindful of its resource consumption so it does not starve other services on the host. Agent collectors generally consume few resources because they handle the telemetry of a single system. If the system produces significant telemetry volume (usually in the form of logs or traces), use the following table as a starting point for agent system requirements.

Telemetry Throughput    Logs / second    Cores    Memory (GB)
13 MiB/m                500              0.25     2
25 MiB/m                1,000            0.5      2
125 MiB/m               5,000            1        4
250 MiB/m               10,000           2        8
500 MiB/m               20,000           4        16
1024 MiB/m              40,000           8        32
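The tiers above can be encoded as a simple lookup that returns the smallest tier covering an observed log rate. This is an illustrative sketch: the function name and the idea of picking the next tier up are assumptions, while the numbers come directly from the table.

```python
# Sizing tiers from the agent table above: (max logs per second, cores, memory in GB).
# These are starting points for capacity planning, not hard limits.
AGENT_TIERS = [
    (500, 0.25, 2),
    (1_000, 0.5, 2),
    (5_000, 1, 4),
    (10_000, 2, 8),
    (20_000, 4, 16),
    (40_000, 8, 32),
]

def agent_requirements(logs_per_second: int) -> tuple[float, int]:
    """Return (cores, memory_gb) for the smallest tier that covers the rate."""
    for max_rate, cores, memory_gb in AGENT_TIERS:
        if logs_per_second <= max_rate:
            return cores, memory_gb
    raise ValueError("rate exceeds the agent table; consider an aggregator tier")

print(agent_requirements(7_500))  # -> (2, 8), the 10,000 logs/second tier
```

A rate between two tiers rounds up to the larger tier, since undersizing the agent risks dropped or delayed telemetry.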

Aggregator

Aggregator collectors receive telemetry over the network. Pair them with a load balancer so the fleet can tolerate the loss of a collector and scale horizontally. Horizontal scaling is preferable to vertical scaling because adding collectors also adds exporter capacity, which can eliminate exporter bottlenecks.

Aggregator best practices:

  • Minimum two collectors behind a load balancer
  • Minimum 2 cores per collector
  • Minimum 8 GB of memory per collector
  • 60 GB of usable disk space per collector for the persistent queue
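If your aggregator is an OpenTelemetry Collector distribution that includes the file_storage extension, the persistent queue can be configured roughly as follows. This is a minimal sketch: the storage directory, endpoint, and pipeline layout are placeholders for your own deployment.

```yaml
# Illustrative collector config: file_storage backs the exporter's
# sending_queue so queued telemetry survives a restart.
extensions:
  file_storage:
    directory: /var/lib/otelcol/storage  # placeholder path on the 60 GB volume

receivers:
  otlp:
    protocols:
      grpc:

exporters:
  otlp:
    endpoint: backend.example.com:4317  # placeholder backend
    sending_queue:
      enabled: true
      storage: file_storage

service:
  extensions: [file_storage]
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlp]
```

With the queue persisted to disk, telemetry buffered during a backend outage is retained across collector restarts instead of being lost from memory.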

When deciding how many collectors your workload requires, take the expected throughput or log rate and use the table below as a starting point. The table assumes each collector has 2 CPU cores and 8 GB of memory.

Telemetry Throughput    Collectors
250 MiB/m               2
500 MiB/m               3
1 GiB/m                 5
2 GiB/m                 10
4 GiB/m                 20

It is important to overprovision your collector fleet in order to provide fault tolerance. If one or more collector systems fail or are brought offline for maintenance, the remaining collectors must have enough spare capacity to handle the full telemetry throughput.
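The table above works out to roughly 200 MiB/m per 2-core/8 GB collector at larger fleet sizes. A hedged sketch of the sizing arithmetic, treating that per-collector capacity as an assumption derived from the table and adding spare collectors for fault tolerance:

```python
import math

# Approximate per-collector capacity implied by the aggregator table
# (e.g. 1 GiB/m -> 5 collectors, 4 GiB/m -> 20). An assumption, not a
# guaranteed limit for every workload.
CAPACITY_MIB_PER_MIN = 205

def collectors_needed(throughput_mib_per_min: float, spares: int = 1) -> int:
    """Collectors required for the load, plus spares for fault tolerance.

    spares is the number of simultaneous collector failures (or maintenance
    outages) the fleet should absorb without losing capacity.
    """
    base = max(2, math.ceil(throughput_mib_per_min / CAPACITY_MIB_PER_MIN))
    return base + spares

print(collectors_needed(1024))  # -> 6: 5 from the table plus 1 spare
```

With `spares=0` the function reproduces the table's counts; raising `spares` buys tolerance for that many concurrent failures.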

When the number of collectors is fixed, you can scale their CPU and memory vertically to increase throughput. See the agent sizing table at the beginning of this page.