Monitoring vs Observability

Monitoring vs Observability: What is Reality?

Before we start, I have a confession: I absolutely love Digg (people are still Digging things, right?) errr...Reddit. It actually is my front page to the internet, where I research upgrades for my home lab/VR/other niche hobbies, watch silly videos, ingest low-effort memes, judge if people are ‘AHs’ or not on /r/amitheasshole, and occasionally talk trash to other Redditors about my Michigan-based sports teams. An aside: my 11th-grade AP English teacher wouldn’t have been happy with that run-on sentence. Sorry, Mr. Smith.

But I also love Reddit because it’s a great place to understand the community’s honest feelings about a topic—providing real mild, medium, and hot takes on various subjects when you’re really curious (yes, while acknowledging its echo-chamberiness). For this post, it felt like the perfect place to see how DevOps, SREs, and IT Ops folks think about the terms Monitoring and Observability.

Are Monitoring and Observability the Same Thing?

Often, the terms ‘monitoring’ and ‘observability’ are used interchangeably, and for good reason: both aim, more or less, at the same result of keeping your business-critical systems and applications running efficiently and securely. In fact, depending on your source, each word can be found in the other term’s definition. Not confusing at all.

Observe ~= Watch ~= Monitor ~= Observe ~= Watch ~=... well, you get it.

Monitoring vs Observability — Speaking of low-effort memes, I thought I'd make a few.

Googling for a clear comparison turns up many differing, jargon-rich, scientific descriptions. This isn’t a surprise, as the term ‘observability’ is thought to have been introduced in the 1960s by Rudolf E. Kálmán, a brilliant mathematician and engineer, in his famous work on control theory.

Even as someone working in this space for more than 10 years, I find these comparisons hard to digest at a glance. If I slightly unfocus my eyes (using the technique I picked up as a kid “reading” magic eye books) it almost looks like there may be no difference at all.

monitoring observability stereogram — Either I made this stereogram wrong, or you're not doing it right. Good luck!

Admittedly, as a non-brilliant-engineer, once a definition drifts into ‘internal states’ and ‘external states,’ my regular human brain tells me it’s time to head back over to Reddit. There I can return to my hobbies, memes, news about new firmware updates (sidebar: I could write a blog series on my general excitement for firmware updates), and new ways to tweak the performance of my Plex server that yield no real-world benefits for any of my users.

And now, since I’ve memed myself into browsing Reddit again, let’s see what some of my fellow Redditors think about Monitoring vs. Observability and, you know, actually proceed with this post.

Monitoring vs. Observability: some perspective from Redditors

Unsurprisingly, there’s some sentiment that observability is just a fancy marketing term for monitoring or just another case of semantics. User /u/SuperQue summarized this pretty well (with a bunch of upvotes, by the way):

/u/teivah had a similar-ish take; perhaps Observability is more of a fashion term wrapped around the three key signals/pillars of observability:

But while perusing, /u/Just_Defy described it in a way that I really liked:

If the digg button still existed, I absolutely would have smashed it. But since it doesn’t, I’ll add my upvote and co-opt this idea for the rest of this blog instead. Thanks /u/Just_Defy!

Now, let’s jump into the why and bring this post home.

Defining Monitoring and Observability

What is Monitoring?

Webster’s dictionary defines love as, err, I mean monitoring as: to watch, keep track of, or check usually for a special purpose.

This definition tracks with how I think about it in DevOps/IT Ops/SRE land:

Monitoring is the act of watching key signals to understand the state of a system or application.

You can monitor metrics.
You can monitor logs.
You can monitor traces.
You can monitor events.
You can monitor profiles.
You can monitor transactions.
You can monitor flim flams, jub jubs, or any new signal that paints a clearer picture of your system's overall state.

Each type of signal (well, maybe with the exception of jub jubs) offers useful context about the overall state of your system, but in most cases, doesn’t include enough information to paint a complete picture on its own.
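To make that concrete, here's a minimal sketch of what "watching a key signal" can look like. Everything in it is an assumption for illustration: the Check class, the signal name, and the 90% threshold are made up and don't come from any particular monitoring tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Check:
    """A hypothetical monitoring rule: alert when the window average exceeds a limit."""
    signal: str
    threshold: float


def evaluate(check: Check, samples: list[float]) -> str:
    """Return 'ok', 'alert', or 'no_data' for one window of samples."""
    if not samples:
        # Missing data is worth flagging on its own.
        return "no_data"
    return "alert" if mean(samples) > check.threshold else "ok"


# Illustrative values only; the signal name and threshold are assumptions.
cpu_check = Check(signal="cpu_utilization_percent", threshold=90.0)
status = evaluate(cpu_check, [72.0, 95.5, 97.0, 99.0])
print(cpu_check.signal, status)
```

The check can tell you that CPU is running hot, but not why. That's the "incomplete picture" problem in miniature.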

What is Observability?

Similar to /u/Just_Defy's definition, I like to think about observability this way:

Observability is the ability of a system or application to be easily understood.

This means that your system or application needs to expose enough information to understand what’s going on when it’s running, offline, or somewhere in between: enough to understand the unknown unknowns, enough to breach the 'easy' threshold.
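One way I picture "exposing enough information" is a system that emits events rich enough to be sliced by any field after the fact, including fields nobody thought to alert on. The sketch below is a toy under stated assumptions: the events, field names, and the error_rate_by helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical request events; every field name here is made up for illustration.
events = [
    {"route": "/checkout", "region": "us-east", "build": "v41", "error": False},
    {"route": "/checkout", "region": "us-east", "build": "v42", "error": True},
    {"route": "/search", "region": "eu-west", "build": "v42", "error": True},
    {"route": "/search", "region": "eu-west", "build": "v41", "error": False},
    {"route": "/checkout", "region": "eu-west", "build": "v42", "error": True},
    {"route": "/search", "region": "us-east", "build": "v41", "error": False},
]


def error_rate_by(field: str, rows: list[dict]) -> dict[str, float]:
    """Group events by any field and return the error rate per value."""
    totals: dict[str, int] = defaultdict(int)
    errors: dict[str, int] = defaultdict(int)
    for row in rows:
        key = row[field]
        totals[key] += 1
        errors[key] += int(row["error"])
    return {key: errors[key] / totals[key] for key in totals}


# Ask the same question along several dimensions and see which one stands out.
for field in ("route", "region", "build"):
    print(field, error_rate_by(field, events))
```

In this toy data, route and region look only mildly suspicious, while build splits cleanly: every v42 request fails and every v41 request succeeds. Nobody had to predict that question in advance. The data just had to be there.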

I’d argue that observability doesn’t require a pre-defined set of signals or pillars (googling around, it seems the number of ‘pillars’ of observability may be growing, or there’s an additional set of pillars to stack on the current ones), but rather just information for your team to move quickly and efficiently.

Observability Litmus Test

Much like software quality, it can be difficult to measure and judge whether a system has achieved the ‘observable’ gold star. I think of it more like Agile software development or DevOps. Observability is more of a methodology where you gather signals you think are important and continuously iterate.

It’s also a bit of a gut check—“we’re solving issues in production efficiently.”

I always trust my gut, of course, unless I decide to head to Arby's for lunch.

Observability Criteria

Here are a few criteria that help you figure out if you’re on the path to observability:

You don’t need to deploy any new tools or code to completely understand an incident that occurs.
You’re able to understand failures in a timely manner. If you’re saying “shit shit shit” for more than 2 hours, there’s probably work to do.
You’re able to reason about a system’s state from a centralized location (and not 5 different tools).

Challenges to Implementing Observability

Though observability reemerged more than 5 years ago, there have certainly been challenges preventing teams from realizing its benefits. In the 2024 Observability Pulse Report, the average MTTR actually increased despite the promises of benefits and clarity that an observable system can provide. Non-scientifically, I see a few items contributing to this:

Telemetry data is still split across multiple SIEM and observability tools and backends. This slows down correlation/causation analysis and adds time to incident resolution.
Telemetry data is still collected with different agents/collectors in different structures/formats. This makes it more difficult to reason about and derive meaningful insights when it arrives for analysis.
The volume of telemetry data continues to grow, forcing organizations to make hard choices about their data, slowing query times, and generally making it more difficult to manipulate and analyze.

Observability: Brought to you by OpenTelemetry

Though we’re already starting to see references to observability 2.0 or the next generation of observability, personally, I’m not sure I’m quite ready for it. I’d make the case that we’re just now starting to arrive at a point where organizations can implement observable systems and applications with the help of OpenTelemetry.

Related Content: What is OpenTelemetry?

OTel standardizes how data is collected, formatted, and exported - and allows for connecting these signals together with context, all with a single set of tools. This is a critical piece of creating an observable system that didn’t exist before the project's inception.
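To show what "connecting signals together with context" means, here's a rough standard-library sketch. To be clear, this is not the OpenTelemetry SDK, and the helper names are my own. It only borrows the shape of the W3C Trace Context IDs (a 16-byte trace ID and an 8-byte span ID) that OTel propagates between services.

```python
import json
import secrets


def new_trace_context() -> dict[str, str]:
    """Create IDs shaped like W3C Trace Context: 16-byte trace ID, 8-byte span ID."""
    return {"trace_id": secrets.token_hex(16), "span_id": secrets.token_hex(8)}


def traceparent(ctx: dict[str, str]) -> str:
    """Format the context as a traceparent header value (version 00, sampled flag 01)."""
    return f"00-{ctx['trace_id']}-{ctx['span_id']}-01"


def log_record(ctx: dict[str, str], message: str) -> str:
    """Build a JSON log line carrying the same IDs as the active span."""
    return json.dumps({"message": message, **ctx})


ctx = new_trace_context()
# A hypothetical span and a log line produced while that span was active.
span = {"name": "GET /checkout", "duration_ms": 412, **ctx}
log_line = log_record(ctx, "payment provider timed out")

# Both signals share trace_id, so a backend can join them.
assert json.loads(log_line)["trace_id"] == span["trace_id"]
print(traceparent(ctx))
print(log_line)
```

The point isn't the ID generation. It's that the trace and the log carry the same identifiers, so you can hop from a slow span to the log line that explains it without guessing at timestamps.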

We're also seeing platforms further their efforts to natively support OpenTelemetry. Honeycomb is a leader here, but Splunk, Google, Grafana, and many more have GA'd native support for OTLP and continue to expand it while consolidating their tools.

OTel: The Building Block of Telemetry Pipelines

OpenTelemetry also has the added benefit of being the perfect building block for telemetry pipelines. Telemetry pipelines, like BindPlane OP, allow organizations to gather, reduce, and refine all the telemetry required to build an observable system or application, and give them the controls to make it meaningful.
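If "gather, reduce, and refine" feels abstract, here's a toy version of the idea. This is not BindPlane's API or the OpenTelemetry Collector's configuration. The stage names, record fields, and the "production" attribute are all assumptions I made up to show the shape of a pipeline.

```python
from typing import Callable, Iterable, Iterator

Record = dict[str, str]
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def drop_debug(records: Iterable[Record]) -> Iterator[Record]:
    """Reduce: discard low-value debug records."""
    for record in records:
        if record.get("severity") != "DEBUG":
            yield record


def dedupe(records: Iterable[Record]) -> Iterator[Record]:
    """Reduce: keep only the first copy of identical records."""
    seen: set[tuple] = set()
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            yield record


def add_environment(records: Iterable[Record]) -> Iterator[Record]:
    """Refine: enrich each record with a hypothetical environment attribute."""
    for record in records:
        yield {**record, "environment": "production"}


def run_pipeline(records: Iterable[Record], stages: list[Stage]) -> list[Record]:
    """Apply each stage in order and collect the result."""
    for stage in stages:
        records = stage(records)
    return list(records)


# "Gathered" telemetry: made-up log records standing in for real input.
gathered = [
    {"severity": "DEBUG", "body": "cache warm"},
    {"severity": "ERROR", "body": "db timeout"},
    {"severity": "ERROR", "body": "db timeout"},
    {"severity": "INFO", "body": "deploy finished"},
]
refined = run_pipeline(gathered, [drop_debug, dedupe, add_environment])
print(refined)
```

Four records go in and two come out, each a little more useful than when it arrived. Real pipelines do this at far larger volume, but the stages compose the same way.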

In fact, there's probably a case to be made that a telemetry pipeline itself may be a pillar of observability, perhaps the most important. More on that thought in a different post, though.

Monitoring vs. Observability: Are they different? Does it matter?

The terms mean different things, but honestly, I don’t think precision in the vernacular really matters all that much, day to day. What matters is that your team understands the terms at a high level, why they're important, and has enough information to keep your systems running and figure out “why” if not.

I suppose I could have opened with this.

If you have any questions about observability, monitoring, OpenTelemetry, or BindPlane, contact our team at info@observiq.com.
