The BindPlane Agent is a flexible tool that can be run as an agent, an aggregator, or both. As an agent, the collector runs on the same host it collects telemetry from. As an aggregator, it collects telemetry from other agents and forwards the data on to its final destination.
Here are a few of the reasons you might want to consider inserting aggregators into your pipelines:
- Collection nodes do not have access to the final destination
- Limiting credentials for the final destination to a small subset of systems, the aggregators, to reduce the risk of credential exposure
- If the aggregators run on cloud instances, the ability to use instance-level credentials, such as for Google Cloud destinations
- Offloading data processing (parsing) to the aggregators to prevent overloading of collection nodes
- Applying universal parsing to data streams coming from disparate devices
- Providing correlation across those data streams, such as for trace sampling
Today we will examine these reasons, and some possible architectures for implementing aggregators. We will also review how they appear in BindPlane.
An aggregation pipeline like the ones in this post involves the following components:
- BindPlane OP
- Several BindPlane agents used as edge collectors
- One or more BindPlane agents used as aggregator(s)
- A final destination, such as Google Cloud Logging/Monitoring/Trace
- (Optional) Load balancer for the aggregators if using more than one
As a starting point, I am using the OpenTelemetry microservices demo running on GKE. In addition to this, I’ve created both a deployment and a daemonset of the BindPlane agent in the same cluster. For configuration, I have the generic OTel collector from the demo forwarding all data to my BindPlane daemonset. This is a sort of aggregation in and of itself, but is only needed because I am not managing the embedded collector from the demo with BindPlane.
The daemonset configuration consists of a Kubernetes Container source, a Kubernetes Kubelet source, and an OTLP source. The OTLP source is the endpoint for the data from the embedded generic collector.
Moving To Single Node Aggregation Model
We’re going to add a BindPlane agent into the pipeline as an aggregator. Here’s what the final architecture will look like.
In order to convert this setup from a direct-to-destination model to an aggregation model, I start by copying the configuration.
Once I’ve created the duplicate configuration, I edit it to remove all of the processors and replace all of the destinations with a single OTLP destination. The processor removal isn’t required; however, I am doing it to illustrate the ability to offload such processing from the edge nodes to the aggregator(s). Typically, aggregation nodes are dedicated systems that are well provisioned and do nothing else. Because those resources are dedicated entirely to the aggregation agent, it is often desirable to perform all processing there. This has the added benefit of simplifying the configurations present on the edge nodes.
In the above screenshots, we are exporting to a single aggregator on the IP 10.128.15.205. This aggregator is configured with an OTLP source, the destinations previously configured on the pods, and also the processors that we removed from the pods.
Using this model, we have successfully offloaded both credentials and processors from the edge nodes. Keeping the credentials on only a single system reduces the risk of exposing them. The model also reduces the workload on the Kubernetes pods that act as our edge nodes.
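To make the split concrete, here is a minimal Python sketch of the idea. It is not BindPlane's configuration format: the dictionary keys, the source and processor type names, and the `split_for_aggregation` helper are all hypothetical, and the aggregator port 4317 is assumed because it is the standard gRPC OTLP port. The sketch only models which pieces stay on the edge and which move to the aggregator.

```python
from copy import deepcopy


def split_for_aggregation(direct_config, aggregator_endpoint):
    """Split a direct-to-destination config into edge and aggregator configs.

    The config layout here is a hypothetical stand-in, not BindPlane's schema.
    """
    # The edge keeps its sources, drops all processors, and sends
    # everything to a single OTLP destination: the aggregator.
    edge = {
        "sources": deepcopy(direct_config["sources"]),
        "processors": [],
        "destinations": [{"type": "otlp", "endpoint": aggregator_endpoint}],
    }
    # The aggregator listens for OTLP and inherits the processors and
    # the real destinations (and therefore the credentials).
    aggregator = {
        "sources": [{"type": "otlp", "listen": "0.0.0.0:4317"}],
        "processors": deepcopy(direct_config["processors"]),
        "destinations": deepcopy(direct_config["destinations"]),
    }
    return edge, aggregator


# Placeholder version of the daemonset configuration described earlier.
direct = {
    "sources": [
        {"type": "k8s_container"},
        {"type": "k8s_kubelet"},
        {"type": "otlp"},
    ],
    "processors": [{"type": "parse_json"}],      # hypothetical processor
    "destinations": [{"type": "google_cloud"}],  # placeholder destination
}

edge, aggregator = split_for_aggregation(direct, "10.128.15.205:4317")
```

After the split, `edge` carries no processors and no destination credentials, while `aggregator` holds both.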
For simplicity and brevity, I showed the configuration of a single node aggregator in this section. However, I did not actually apply these configurations and start the flow of data. I will show data flow at the end of the entire blog.
Moving To Multi-node Aggregation Model
The multi-node aggregation model is essentially the same as a single node, with the exception of adding a load balancer and more nodes running the aggregation configuration.
You can move to this model directly from an edge-only model or from a single-node aggregation model. Since I wanted to show both models in this blog, I am moving from the previously demonstrated aggregation model.
The aggregator configuration does not need any changes. It just needs to be applied to one or more additional nodes. In my case, I have a three-node set. In front of them sits a load balancer forwarding port 4317, the gRPC OTLP port.
A single minor change does need to be made to the edge configuration: replace the IP of the single node, 10.128.15.205, with the IP of the load balancer, 10.128.15.208.
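The endpoint swap described above can be sketched in a few lines of Python. As before, the dictionary layout and the `repoint_otlp_destinations` helper are hypothetical illustrations rather than BindPlane's actual configuration format; in BindPlane itself this is an edit to the OTLP destination in the UI.

```python
def repoint_otlp_destinations(config, old_host, new_host):
    """Return a copy of config with OTLP destinations moved to new_host.

    Only destinations of type "otlp" whose host equals old_host change;
    the port is preserved.
    """
    updated = dict(config)
    new_destinations = []
    for dest in config["destinations"]:
        dest = dict(dest)  # copy so the original config is left untouched
        if dest.get("type") == "otlp":
            host, _, port = dest["endpoint"].rpartition(":")
            if host == old_host:
                dest["endpoint"] = f"{new_host}:{port}"
        new_destinations.append(dest)
    updated["destinations"] = new_destinations
    return updated


edge = {"destinations": [{"type": "otlp", "endpoint": "10.128.15.205:4317"}]}
edge_lb = repoint_otlp_destinations(edge, "10.128.15.205", "10.128.15.208")
print(edge_lb["destinations"][0]["endpoint"])  # 10.128.15.208:4317
```

Because the edge nodes only know the load balancer address, aggregator nodes can later be added or removed without touching the edge configuration again.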
Verifying Data Flow
Now that we have everything set up and running, we can check one of our aggregator nodes to validate that data is actually flowing.
From the above screenshot, we can see that telemetry is flowing through our pipeline. We can toggle between logs, metrics, and traces to validate we’re seeing all three signals. For a final validation, we could also check our destinations. I’ll skip that today, as it has been covered in several previous posts.
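If data is not showing up, a quick first check from an edge node is whether the OTLP port is reachable at all. The following is a small sketch using only the Python standard library; the `otlp_port_open` helper is hypothetical and not part of BindPlane. A successful TCP connection only shows that something is listening on the port. It does not prove that telemetry is being accepted or forwarded.

```python
import socket


def otlp_port_open(host, port=4317, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `otlp_port_open("10.128.15.208")` would test the load balancer address used in this post, and the same call with an individual aggregator's IP would test that node directly.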
Now that we have shifted final destinations and our data processing to an aggregator set, we could also add additional processing to the aggregator configuration.
Any time a destination change is needed, it affects only these few nodes, so the new configuration can be rolled out quickly.
Additional data inputs could also be added to this configuration for direct-to-aggregator sources, such as syslog, raw TCP logs and metrics, and applications natively instrumented with OTLP traces. This sort of change would further offload work from your edge nodes.
Today we’ve examined two simple architectures, and discussed ways to enhance them. However, other architectures for aggregation exist. Touching on these briefly, with the two we examined today at the top of the list, we have:
- Single node aggregator - ideal for small environments with a limited number of edge nodes
- Multi-node load balanced aggregator set - scalable, and ideal for most enterprise environments
- Multi-layer, multi-node aggregator sets - for very large enterprise environments
  - Has an aggregator set per data center, region, or other division point
  - These initial aggregator sets perform data processing offloading
  - The destination for these aggregators is a final load-balanced aggregator set that performs the transmission to the final destination(s)
  - Ideally, in an environment this large, the final aggregator set is distributed across multiple locations, and an intelligent load balancer sits in front directing traffic to the closest healthy nodes
  - Offers the most redundancy and data safety
- Traffic director aggregator sets - for data segregation
  - This could be a multi-layer aggregation
  - An initial aggregator set figures out where traffic belongs, and forwards that traffic
  - Each directed traffic stream can go directly to the final destination, or to a destination aggregator set in the multi-layer style above, as appropriate for the volume of traffic
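The traffic director idea can be illustrated with a small routing function. This is only a conceptual sketch in Python: the record layout, the attribute names (`team`, `environment`), and the destination names are all hypothetical, and a real deployment would express this with routing rules in the collector configuration rather than application code.

```python
def route_record(record, routes, default):
    """Pick a downstream destination name for one telemetry record.

    routes is an ordered list of (attribute, value, destination) rules;
    the first matching rule wins, otherwise default is returned.
    """
    attributes = record.get("attributes", {})
    for attribute, value, destination in routes:
        if attributes.get(attribute) == value:
            return destination
    return default


# Hypothetical rules: segregate by owning team first, then by environment.
routes = [
    ("team", "payments", "payments-aggregator-set"),
    ("environment", "dev", "dev-destination"),
]

records = [
    {"body": "charge ok", "attributes": {"team": "payments", "environment": "prod"}},
    {"body": "debug trace", "attributes": {"team": "search", "environment": "dev"}},
    {"body": "no attributes"},
]

decisions = [route_record(r, routes, "default-aggregator-set") for r in records]
print(decisions)
# ['payments-aggregator-set', 'dev-destination', 'default-aggregator-set']
```

Each destination name here could stand for either a final destination or another aggregator set, matching the choice described in the last bullet above.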
There are likely other setups that I have not considered, but these are the ones that we see most frequently.
BindPlane provides users with a powerful data management environment, and aggregators are one of the most important tools in the arsenal. As seen today, there are many ways in which aggregators can help you protect your credentials and correlate, process, and route your data. Aggregators provide much-needed flexibility to your data pipeline, especially when combined with other tools, creativity, and intelligent deployment strategies.