We launched the Stanza log agent just over one year ago. Stanza is the result of an uncompromising stance on performance, processing, and configurability for log telemetry. It took mere days for friends and colleagues in the space to raise the obvious objection – there are already so many logging agents, so why spend time on a *new* one?
We also heard from colleagues who had a different take…
“We have Fluentd, Fluentbit, Logstash, Vector, and ultimately OpenTelemetry should be future for sending log data to observability system. So why on earth do we need another open source project to solve this already solved problem? Well another was launched today, have fun :)”
Jonah Kowall, CTO of logz.io, via Twitter, July 21, 2020
I get it. I have had the same reaction to other ambitious entries into crowded spaces. Many open source projects choose to reinvent the wheel for little to no benefit, so folks are rightly skeptical when a new project pops up out of nowhere. And while I don’t agree with most of this quote, it’s right about OpenTelemetry. In fact, a few months after this tweet, observIQ and OpenTelemetry announced that Stanza was chosen as the log engine for the OpenTelemetry collector. We couldn’t be more proud of that contribution, and the collaboration benefits the entire industry.
So, a year later, I want to take a moment and share how we decided to start from scratch and build a log agent in a crowded space, one that ultimately became the agent of choice for some of the largest companies with some of the most demanding performance requirements in the world.
When we started building observIQ, we chose to launch a log analysis platform after years of data pipeline and agent development work for the largest tech companies in the world. We had recently sold a major business unit to VMware and were transitioning to our next phase. We had spent over a decade building proprietary agents, integrations, and some of the most advanced telemetry technology in the industry. We had used or benchmarked nearly every prominent open source agent on the market. We even tried all the proprietary agents. Despite that, we still weren’t able to solve the most pressing log ingestion challenges of our customers. It always required a compromise. Do we want speed or configurability? Configurability or platform support? Platform support or installation simplicity? Every agent had its strengths, but each came with its share of weaknesses.
We started with Fluentd. Fluentd was, and is, a revolutionary agent that showed what can be accomplished with a focus on parsing, community, and a library of plugins for every log source you could imagine. Fluentd has been the measuring stick for all the agents that came after. But the performance issues our customers experienced were well known and a major barrier to adoption, enough of a barrier that Fluent had launched a separate project to provide a higher-performance option. We found that the high-performance options had their own set of limitations.
In early 2020, just before the entire world went into lockdown, a few of us on the team sat together in our office (soon to be abandoned) and came up with all the things we wanted from a log agent. I don’t have the whiteboard photo anymore, but I still have the notes. We left that conference room agreeing that we were tired of compromises. We wanted everything, and everything meant… well, it meant this:
- High performance, low CPU, low memory usage
  - This was a blocker for too many of our customers with existing agents. They had massive deployments, and CPU and memory usage had a material impact on their costs. We needed this solved.
- Open source
  - No one wanted to use a closed source agent anymore, and open source was becoming a requirement across the industry. The team was excited to show what we could do after spending so much time building proprietary agents.
- Simple installation with wide-ranging platform support and no dependencies
  - Installation was often overlooked, but simplicity has always been a core value at observIQ. We never felt good about the complex installation and limited platform support of other agents.
- High throughput with multi-threaded support
  - To our surprise at the time, even high-performance agents had major limitations that prevented high log throughput.
- Easy-to-develop plugin framework (expressive, clear, and powerful) and a curated set of core integrations
  - Fluentd changed the game with its plugin framework. We wanted to take it a step further and simplify pipeline creation while maintaining the power, including advanced parsing and manipulation at extremely high performance.
  - We also recognized that an agent needs integrations. We didn’t want to launch our agent without a core set of high-quality integrations for all the most common log sources.
- Alignment with modern telemetry movements
  - At the time, we saw where OpenTelemetry was headed, and it was exciting to see the collaboration of so many industry veterans. Standardizing telemetry helps everyone trying to understand their systems, and we wanted to be a part of that. Throughout development, we wanted to be sure Stanza was compatible with OpenTelemetry and that we were including all the best parts of the developing industry standard.
It was an ambitious list. There was an internal debate about whether it was worth it. Maybe it would be easier to compromise and contribute a feature update here or there? We decided to move forward, and set out to build it. After a few long months of development, we launched Stanza.
It was, as intended, the core agent of our new platform at observIQ. To get a hands-on impression, you can take our platform for a spin for free.
Along the way, we continued working with the team at OpenTelemetry, and they shared our interest in a logging solution without compromise. So, in January 2021, observIQ and OpenTelemetry announced that Stanza would be the engine behind OpenTelemetry’s log parsing and analysis, contributing the final piece in the OTel trifecta of traces, metrics, and logs.
Today we’re continuing to innovate in telemetry and analysis with a commitment to bring our Stanza advancements to OpenTelemetry, including new functionality like automated discovery, remote agent management, and alerting at the edge. We have a lot in store for the community and hope you’re as excited as we are.