How to monitor Apache Flink with OpenTelemetry


Apache Flink monitoring support is now available in the open source OpenTelemetry Collector, as the flinkmetrics receiver in the opentelemetry-collector-contrib repository. You can use this receiver with any OpenTelemetry collector build that includes it, such as the upstream OpenTelemetry Collector (contrib) and observIQ’s distribution of the collector.
Below are quick instructions for setting up observIQ’s OpenTelemetry distribution and shipping Apache Flink telemetry to a popular backend, Google Cloud Ops. You can find out more on observIQ’s GitHub page: https://github.com/observIQ/observiq-otel-collector
What signals matter?
Apache Flink is an open source, unified stream-processing and batch-processing framework. The Apache Flink receiver records 29 unique metrics, so there is a lot of data to pay attention to. Some metrics that users find especially valuable are:
- Uptime and restarts: two metrics record how long a job has been running without interruption and how many full restarts it has gone through.
- Checkpoints: several metrics track the number of active checkpoints, the number of completed and failed checkpoints, and the duration of ongoing and past checkpoints.
- Memory usage: memory metrics are often worth monitoring. The receiver reports total memory usage (both current and over time), minimums and maximums, and how memory is divided between different processes.
All of the above categories can be gathered with the Apache Flink receiver, so let’s get started.
Before you begin
If you don’t already have an OpenTelemetry collector with the latest Apache Flink receiver installed, you’ll need to install one first. We suggest the observIQ OpenTelemetry Collector distribution, which includes the Apache Flink receiver (and many others) and is simple to install with our one-line installer.
Configuring the Apache Flink receiver
Navigate to your OpenTelemetry configuration file. If you’re using the observIQ Collector, you’ll find it at the following location:
- /opt/observiq-otel-collector/config.yaml (Linux)
For the observIQ OpenTelemetry Collector, edit the configuration file to include the Apache Flink receiver as shown below:
receivers:
  flinkmetrics:
    # Flink REST API endpoint that the receiver scrapes
    endpoint: http://localhost:8081
    collection_interval: 10s

processors:
  # Resourcedetection is used to add a unique (host.name)
  # to the metric resource(s),... target_key: namespace
  # batch is a standard processor that groups metrics before export
  batch:

exporters:
  # nop is a placeholder that discards data.
  # Replace it with the exporter for your preferred destination(s).
  nop:

service:
  pipelines:
    metrics:
      receivers: [flinkmetrics]
      processors: [batch]
      exporters: [nop]
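For the Google Cloud Ops backend mentioned earlier, a filled-in pipeline might look like the sketch below. It assumes your collector build includes the resourcedetection processor and the googlecloud exporter, and that Google Cloud Application Default Credentials are available on the host. Treat it as a starting point and check your distribution’s documentation for the exact options it supports.

```yaml
receivers:
  flinkmetrics:
    endpoint: http://localhost:8081
    collection_interval: 10s

processors:
  # Adds host.name to each metric's resource
  # (assumes the "system" detector can read the hostname from the OS)
  resourcedetection:
    detectors: [system]
    system:
      hostname_sources: [os]
  # Groups metrics into batches before export
  batch:

exporters:
  # Assumes Application Default Credentials; the project is taken
  # from those credentials unless you set it explicitly
  googlecloud:

service:
  pipelines:
    metrics:
      receivers: [flinkmetrics]
      processors: [resourcedetection, batch]
      exporters: [googlecloud]
```

Processor order matters in a pipeline: resourcedetection runs first so that host.name is attached before batching and export.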
If you’re using the Google Ops Agent instead, the relevant configuration is described in the Ops Agent’s Apache Flink integration documentation.
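For comparison, a minimal Ops Agent configuration for Flink metrics might look like the sketch below. It assumes the Ops Agent’s third-party integration format (a metrics receiver of type flink wired into a pipeline) and the default config path /etc/google-cloud-ops-agent/config.yaml. The field names and values shown are assumptions, so confirm them against the Ops Agent’s Apache Flink documentation before use.

```yaml
# Assumed location: /etc/google-cloud-ops-agent/config.yaml
metrics:
  receivers:
    flink:
      type: flink
      # Assumed field: Flink REST API endpoint, same as in the collector config
      endpoint: http://localhost:8081
  service:
    pipelines:
      flink:
        receivers:
          - flink
```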
Viewing the metrics collected
If you followed the steps above, Apache Flink metrics will now be delivered to your preferred destination.
observIQ’s distribution of the OpenTelemetry collector is a game-changer for companies looking to implement OpenTelemetry standards. The one-line installer and the integrated pool of receivers, processors, and exporters make this collector simple to work with. Follow this space to keep up with all our future posts and simplified configurations for various sources. For questions, requests, and suggestions, reach out to our support team at support@observIQ.com.
