Agent Sizing and Scaling
Agent
When the collector runs as an agent, be mindful of its resource consumption so that it does not starve other services on the same host. Agent collectors generally consume few resources because they handle the telemetry of a single system. If the system produces significant telemetry volume (usually logs or traces), use the following table as a starting point for agent system requirements.
Telemetry Throughput | Logs / second | Cores | Memory (GB) |
---|---|---|---|
13 MiB/min | 500 | 0.25 | 2 |
25 MiB/min | 1,000 | 0.5 | 2 |
125 MiB/min | 5,000 | 1 | 4 |
250 MiB/min | 10,000 | 2 | 8 |
500 MiB/min | 20,000 | 4 | 16 |
1,024 MiB/min | 40,000 | 8 | 32 |
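One way to apply the table is to round up: take the first row whose throughput covers your expected rate. The following is a minimal sketch of that lookup, assuming you already know your throughput in MiB/min; the function name `agent_requirements` is hypothetical, and the rows are copied from the table above.

```python
# Rows copied from the agent sizing table: (MiB/min, logs/sec, cores, memory GB).
AGENT_SIZING = [
    (13, 500, 0.25, 2),
    (25, 1_000, 0.5, 2),
    (125, 5_000, 1, 4),
    (250, 10_000, 2, 8),
    (500, 20_000, 4, 16),
    (1024, 40_000, 8, 32),
]


def agent_requirements(mib_per_min: float) -> tuple[float, int]:
    """Return (cores, memory_gb) from the smallest row that covers the throughput."""
    for throughput, _logs_per_sec, cores, memory_gb in AGENT_SIZING:
        if mib_per_min <= throughput:
            return cores, memory_gb
    # The table gives no guidance past its last row, so refuse to guess.
    raise ValueError(f"{mib_per_min} MiB/min exceeds the sizing table")


print(agent_requirements(100))  # (1, 4)
```

Rounding up to the next row, rather than interpolating, errs on the side of leaving the agent some headroom.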
Gateway
Gateway collectors receive telemetry over the network. Placing them behind a load balancer is recommended because it provides fault tolerance and lets you scale horizontally. Horizontal scaling is preferable to vertical scaling because additional collectors add redundancy and can eliminate exporter bottlenecks.
Gateway best practices:
- Minimum of two collectors behind a load balancer
- Minimum of 2 cores per collector
- Minimum of 8 GB of memory per collector
- 60 GB of usable disk space for the persistent queue on each collector
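These minimums are easy to check mechanically before provisioning. The sketch below assumes a planned fleet is described by a hypothetical `GatewayCollector` record (cores, memory, and disk reserved for the persistent queue); neither the record nor `best_practice_violations` comes from any collector tooling.

```python
from dataclasses import dataclass


@dataclass
class GatewayCollector:
    """Hypothetical description of one planned gateway collector."""
    cores: int
    memory_gb: float
    queue_disk_gb: float  # usable disk reserved for the persistent queue


def best_practice_violations(fleet: list[GatewayCollector]) -> list[str]:
    """Compare a planned fleet against the gateway minimums listed above."""
    problems = []
    if len(fleet) < 2:
        problems.append("fewer than two collectors behind the load balancer")
    for i, c in enumerate(fleet):
        if c.cores < 2:
            problems.append(f"collector {i}: fewer than 2 cores")
        if c.memory_gb < 8:
            problems.append(f"collector {i}: less than 8 GB memory")
        if c.queue_disk_gb < 60:
            problems.append(f"collector {i}: less than 60 GB for the persistent queue")
    return problems


plan = [GatewayCollector(2, 8, 60), GatewayCollector(2, 4, 60)]
print(best_practice_violations(plan))
# ['collector 1: less than 8 GB memory']
```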
When deciding how many collectors your workload requires, start from the expected telemetry throughput and use this table as a starting point. If you only know your log rate, the agent table above relates logs per second to throughput. The table assumes that each collector has two CPU cores and 8 GB of memory.
Telemetry Throughput | Collectors |
---|---|
250 MiB/min | 2 |
500 MiB/min | 3 |
1 GiB/min | 5 |
2 GiB/min | 10 |
4 GiB/min | 20 |
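If your starting point is a log rate rather than a byte rate, you first need an average log size. The sketch below assumes roughly 440 bytes per log, which is approximately what the agent table implies (10,000 logs/sec at 440 bytes is about 252 MiB/min); measure your own average if you can. The helper names are hypothetical, and the collector counts are copied from the table above.

```python
# Rows copied from the gateway table: (MiB/min, collectors); each collector is 2 cores / 8 GB.
GATEWAY_SIZING = [
    (250, 2),
    (500, 3),
    (1024, 5),   # 1 GiB/min
    (2048, 10),  # 2 GiB/min
    (4096, 20),  # 4 GiB/min
]

# Assumption: ~440 bytes per log, roughly what the agent sizing table implies.
ASSUMED_BYTES_PER_LOG = 440


def logs_per_sec_to_mib_per_min(logs_per_sec: float,
                                bytes_per_log: float = ASSUMED_BYTES_PER_LOG) -> float:
    """Convert a log rate into MiB per minute for a given average log size."""
    return logs_per_sec * bytes_per_log * 60 / (1024 * 1024)


def gateway_collectors(mib_per_min: float) -> int:
    """Return the collector count from the first table row that covers the throughput."""
    for throughput, collectors in GATEWAY_SIZING:
        if mib_per_min <= throughput:
            return collectors
    raise ValueError(f"{mib_per_min} MiB/min exceeds the sizing table")


rate = logs_per_sec_to_mib_per_min(15_000)
print(round(rate), gateway_collectors(rate))  # 378 3
```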
It is important to overprovision your collector fleet to provide fault tolerance. If one or more collector systems fail or are taken offline for maintenance, the remaining collectors must have enough spare capacity to handle the full telemetry throughput.
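The overprovisioning rule can be written as simple N+k arithmetic: the collectors left after k failures must still cover the load. This sketch assumes one 2-core / 8 GB collector handles about 250 MiB/min, taken from the matching row of the agent sizing table; that per-collector figure is an assumption you should validate against your own workload.

```python
import math

# Assumption: one 2-core / 8 GB collector handles about 250 MiB/min,
# based on the matching row of the agent sizing table.
PER_COLLECTOR_MIB_PER_MIN = 250


def survives_failures(collectors: int, load_mib_per_min: float, failures: int) -> bool:
    """True if the collectors left after `failures` outages can still carry the load."""
    remaining = collectors - failures
    return remaining > 0 and remaining * PER_COLLECTOR_MIB_PER_MIN >= load_mib_per_min


def collectors_needed(load_mib_per_min: float, failures: int = 1) -> int:
    """Smallest fleet that still carries the load with `failures` collectors offline."""
    return math.ceil(load_mib_per_min / PER_COLLECTOR_MIB_PER_MIN) + failures


print(collectors_needed(500))                 # 3
print(survives_failures(2, 500, failures=1))  # False
print(survives_failures(3, 500, failures=1))  # True
```

At the highest rows this N+1 arithmetic yields fewer collectors than the table (18 versus 20 at 4 GiB/min), so treat it as a floor rather than a replacement for the table.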
When the number of collectors is fixed, you can scale each collector's CPU and memory vertically to increase throughput. See the agent sizing table at the beginning of this page.
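Combining vertical scaling with the fault-tolerance rule, one approach is to size each collector for the share of traffic it would carry with one member offline, then look that share up in the agent table. This is a sketch under that assumption; `per_collector_size` is a hypothetical helper, and the rows are copied from the agent sizing table.

```python
# Rows copied from the agent sizing table: (MiB/min, cores, memory GB).
AGENT_ROWS = [
    (13, 0.25, 2),
    (25, 0.5, 2),
    (125, 1, 4),
    (250, 2, 8),
    (500, 4, 16),
    (1024, 8, 32),
]


def per_collector_size(total_mib_per_min: float, collectors: int,
                       failures: int = 1) -> tuple[float, int]:
    """Size each collector of a fixed fleet so the survivors can carry the full load."""
    survivors = collectors - failures
    if survivors < 1:
        raise ValueError("fleet cannot tolerate that many failures")
    share = total_mib_per_min / survivors  # load on each surviving collector
    for throughput, cores, memory_gb in AGENT_ROWS:
        if share <= throughput:
            return cores, memory_gb
    raise ValueError(f"{share:.0f} MiB/min per collector exceeds the sizing table")


# Hypothetical: 900 MiB/min on a fixed fleet of three collectors.
print(per_collector_size(900, 3))  # (4, 16)
```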