Reducing Log Volume with Log-based Metrics

by Josh Williams, Head of Engineering, on April 29, 2023

As the amount of telemetry being collected continues to grow, businesses are seeking cost-effective ways to monitor and analyze their systems. Data collection and monitoring can be expensive, especially when dealing with large volumes of logs.

One approach to maintaining visibility while reducing the amount of data collected is to create log-based metrics. However, traditional platforms that offer this capability often perform the computation at the platform level, which still incurs storage costs for both logs and metrics.

To address this issue, BindPlane OP performs the metric computation at the edge, allowing users to reduce costs and gain greater control over their data. In this blog post, we’ll explore the concept of log-based metrics, the power of edge-based processing, and the benefits this approach brings to data collection and monitoring.

Understanding Log-based Metrics

Log Count

The first approach to creating log-based metrics involves counting the number of logs over a specified time interval and generating a metric dimensioned by attributes present in those logs. This method allows users to condense large volumes of logs into meaningful and actionable metrics.

Let’s use the example of access logs to illustrate this concept. By counting the logs over an interval, we can create a metric called “http.request.count”. We can then dimension this metric by the status codes present in the access logs, which lets users track how often HTTP requests return each status code. For instance, users could set up an alert that fires when “http.request.count” surpasses a certain threshold for 4xx and 5xx status codes, indicating an issue in the system that requires attention.
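To make the idea concrete, here is a minimal Python sketch of counting access logs per interval and per status code. The sample records, the 60-second interval, and the helper names `count_by_status` and `error_count` are assumptions made for illustration; this is not how BindPlane OP implements it.

```python
from collections import Counter

# Hypothetical parsed access logs: (unix timestamp in seconds, HTTP status code).
LOGS = [
    (0, 200), (5, 200), (12, 404), (30, 500),
    (61, 200), (75, 503), (90, 503), (119, 200),
]

def count_by_status(logs, interval_seconds=60):
    """Count logs per (interval start, status code) pair."""
    counts = Counter()
    for timestamp, status in logs:
        # Align each timestamp to the start of its interval.
        window_start = timestamp - (timestamp % interval_seconds)
        counts[(window_start, status)] += 1
    return counts

def error_count(counts, window_start):
    """Sum the counts for 4xx and 5xx status codes in one interval."""
    return sum(
        n for (start, status), n in counts.items()
        if start == window_start and 400 <= status <= 599
    )

counts = count_by_status(LOGS)
for (window_start, status), n in sorted(counts.items()):
    print(f"http.request.count window={window_start} status_code={status} value={n}")
```

Eight log records collapse into five data points here. An alert rule would then compare `error_count(counts, window_start)` against a threshold of the user's choosing.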

By utilizing this method, users can reduce the amount of data collected while still maintaining visibility into their systems, leading to more efficient monitoring and quicker issue identification.

Log Extraction

The second approach to creating log-based metrics involves extracting numerical values from logs and using these values to generate metrics. This method allows users to derive deeper insights from their logs by visualizing and analyzing the numerical data contained within them.

Using the access logs example again, we can extract the duration of each request and create a metric based on this value, such as the average request duration over an interval. This metric would give users an understanding of how their system performs in terms of request durations. By analyzing and visualizing this metric over time, users can identify patterns, trends, and potential bottlenecks within their system.
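As a rough sketch of extraction, the Python below pulls a duration out of raw access log lines and averages it. The log layout, the `duration_ms=` field, and the helper name `extract_durations` are hypothetical; real access logs will need a pattern that matches their own format.

```python
import re

# Hypothetical access log lines; the "duration_ms=<number>" field is an assumed format.
LINES = [
    "GET /api/users 200 duration_ms=120",
    "GET /api/orders 200 duration_ms=340",
    "POST /api/orders 500 duration_ms=980",
    "GET /healthz 200",  # no duration field, so this line is skipped
]

DURATION_PATTERN = re.compile(r"duration_ms=(\d+(?:\.\d+)?)")

def extract_durations(lines):
    """Return the numeric duration found in each line that has one."""
    durations = []
    for line in lines:
        match = DURATION_PATTERN.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations

durations = extract_durations(LINES)
# Guard against an empty interval to avoid dividing by zero.
average_ms = sum(durations) / len(durations) if durations else 0.0
print(f"average request duration: {average_ms:.1f} ms")
```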

The need for a cost-effective solution

As we discussed, platforms that offer log-based metrics typically perform these calculations at the platform level. This means that customers are still paying for the logs they send to the platform, as well as for the metrics they create from those logs. As a result, this approach can become quite costly, particularly when dealing with large-scale systems and high volumes of data.

To address this issue, a more cost-effective solution is required—one that enables users to maintain visibility into their systems while reducing data collection costs.

Our solution

BindPlane OP Enterprise includes two processors that can create log-based metrics at the edge, before the data is sent to your destinations. This allows our customers to perform costly calculations within their observability pipeline, rather than paying for this computation and storage at the platform level.

The first processor, called the Count Telemetry processor, provides the ability to count all three types of telemetry (logs, metrics, and traces). Typically used for counting logs, it can either count the number of logs passing through it, regardless of content, or create individual counts for dimensions specified by the user.

For example, with access logs, we would likely specify the “status_code” and “endpoint” attributes as our dimensions. In contrast, with health check logs, we would likely avoid dimensioning altogether, as these are typically repetitive.
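The effect of choosing dimensions can be sketched in a few lines of Python. This is only an illustration of the concept: `count_logs` and the sample records are hypothetical and do not reflect the Count Telemetry processor's actual configuration or internals.

```python
from collections import Counter

def count_logs(logs, dimensions=()):
    """Count logs, grouped by the values of the given attribute names."""
    counts = Counter()
    for log in logs:
        # With no dimensions the key is the empty tuple, giving one total count.
        key = tuple(log.get(name) for name in dimensions)
        counts[key] += 1
    return counts

# Hypothetical structured logs; the attribute names follow the example in the text.
access_logs = [
    {"status_code": 200, "endpoint": "/users"},
    {"status_code": 200, "endpoint": "/users"},
    {"status_code": 500, "endpoint": "/orders"},
]
health_logs = [{"body": "health check ok"} for _ in range(4)]

access_counts = count_logs(access_logs, dimensions=("status_code", "endpoint"))
health_counts = count_logs(health_logs)  # no dimensions: a single series

print(access_counts)
print(health_counts)
```

Each distinct combination of dimension values becomes its own series, so the dimensions you pick determine how many data points replace the original logs.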

The second processor, called the Extract Metric processor, enables users to extract a numerical value from any field on a log. The resulting metric is highly configurable, allowing users to specify the name, units, and extracted dimensions. 

In the case of access logs, this means we could extract the duration field from a log and convert it into a “request.duration” metric. We could then specify ms as the units and even dimension this metric based on the “endpoint” attribute of the log. Using this setup, we can now easily pinpoint which endpoints have the longest request durations.
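Here is a small Python sketch of that setup, under stated assumptions: the function `extract_metric`, the dictionary shape of a data point, and the sample logs are all hypothetical and are not the Extract Metric processor's API.

```python
from collections import defaultdict

def extract_metric(logs, field, name, unit, dimensions=()):
    """Turn a numeric log field into metric data points (plain dicts)."""
    points = []
    for log in logs:
        value = log.get(field)
        if not isinstance(value, (int, float)):
            continue  # skip logs without a numeric value in the field
        points.append({
            "name": name,
            "unit": unit,
            "value": float(value),
            "attributes": {d: log.get(d) for d in dimensions},
        })
    return points

# Hypothetical structured access logs with durations in milliseconds.
logs = [
    {"endpoint": "/users", "duration": 120},
    {"endpoint": "/orders", "duration": 900},
    {"endpoint": "/orders", "duration": 700},
    {"endpoint": "/users"},  # no duration, so no data point is produced
]

points = extract_metric(logs, "duration", "request.duration", "ms", ("endpoint",))

# Average duration per endpoint, then pick the slowest one.
by_endpoint = defaultdict(list)
for point in points:
    by_endpoint[point["attributes"]["endpoint"]].append(point["value"])
averages = {endpoint: sum(v) / len(v) for endpoint, v in by_endpoint.items()}
slowest = max(averages, key=averages.get)
print(slowest, averages[slowest])
```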

Implementing log-based metrics at the edge offers several advantages that can significantly improve the data collection and monitoring experience for users. To recap, here are the main benefits:

  1. Reduced costs: Users can drop unnecessary logs before they reach the platform and condense the value of that data into a metric.
  2. Enhanced flexibility: Users can route the bulk of their logs to more cost-effective locations, while sending their metrics to a more comprehensive solution with alerting and analytics.
  3. Increased visualization: Users can visualize and extract value from their logs, even in platforms without robust logging capabilities.
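The routing idea behind these benefits can be pictured with a short Python sketch. Everything in it is an assumption for illustration: the destination names, the record shape, and the rule that debug logs count as unnecessary are hypothetical, and in practice each team decides what to drop.

```python
def route(record):
    """Pick a destination name for one telemetry record, or None to drop it."""
    if record["type"] == "metric":
        return "metrics_platform"   # hypothetical platform with alerting and analytics
    if record["type"] == "log" and record.get("severity") == "debug":
        return None                 # assumed rule: debug logs are dropped at the edge
    return "low_cost_archive"       # hypothetical low-cost bulk log storage

records = [
    {"type": "metric", "name": "http.request.count"},
    {"type": "log", "severity": "debug"},
    {"type": "log", "severity": "error"},
]

destinations = [route(record) for record in records]
print(destinations)
```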

Get started today by installing BindPlane OP and joining our Slack community where we can help you start reducing your telemetry data. By leveraging log-based metrics, you can unlock cost-effective monitoring and make more informed decisions for your business.