Complimentary Gartner® Report! 'A CTO's Guide to Open-Source Software: Answering the Top 10 FAQs.'Read more

Agent Sizing and Scaling

Agent

When the collector is running as an agent, you must be mindful of resource consumption in order to avoid starving other services. Generally, agent collectors consume very little resources because they are handling the telemetry of an individual system.

You can reference this table as a starting point for agent system requirements. The resource recommendations do not consider multiple exporters or processors. The addition of processors can impact performance significantly.

Telemetry ThroughputLogs / secondCoresProcess Memory (MB)*
200 MiB/m10,0000.25300
400 MB/m20,0000.5300
1 GB/m50,0001300
2 GB/m100,0002500

* Process memory is the amount of memory the agent is expected to consume. The host system should have enough memory to satisfy the agent and all other services.

Gateway

Gateway collectors receive telemetry over the network. Pairing them with a load balancer is recommended in order to provide fault tolerance and the ability to scale horizontally. Horizontal scaling is preferable because it provides fault tolerance and can eliminate exporter bottlenecks.

Gateway best practices:

  • Minimum two collectors behind a load balancer
  • Minimum 2 cores per collector
  • Minimum 8GB memory per collector
  • 60GB usable space for persistent queue per collector

When deciding how many collectors your workload requires, take the expected throughput or log rate and use this table as a starting point.

The table assumes that each collector has four CPU cores and 16GB of memory. The table does not account for processors. When adding processors, the compute requirements will increase.

Telemetry ThroughputLogs / secondCollectors
5 GB/m250,0002
10 GB/m500,0003
20 GB/m1,000,0005
100 GB/m5,000,00025

It is important to over provision your collector fleet in order to provide fault tolerance. If one or more collector systems fail or are brought offline for maintenance, the remaining collectors must have enough available capacity to handle the telemetry throughput.

When dealing with a fixed number of collectors, you can scale their CPU and memory vertically in order to increase throughput. See agent sizing table at the beginning of this page.