Monitoring Directly in Kubernetes
No matter what you’re using Kubernetes for, visibility into your applications’ performance and activity is a beneficial and often essential undertaking – essential, but colossal, requiring entire teams dedicated to nothing but maintaining deployments, auditing, debugging, and keeping up with compliance. Kubernetes has robust support documentation dedicated exclusively to assisting customers with Monitoring, Logging, and Debugging. Kubernetes is a powerful resource and their instructions on how to monitor, log, audit, and debug are detailed enough for almost anyone, even a kubenewbie, to get the job done. Their documentation also illustrates the inescapable overhead of handling these needs directly in Kubernetes. For many developers, there simply isn’t time.
There are seven lengthy sections on logging and using logs to debug issues in Kubernetes. The step-by-step instructions show you how to set up and monitor logs on various pods with specific audit criteria. The intention is to provide teams with the tools and knowledge they need to debug and maintain their applications directly in Kubernetes. It’s true that logs will often contain all the information necessary to home in on and correct a bug, but skimming through text lines in search of valuable, actionable information is a headache. It’s a working solution, but not a reasonable one for small teams that need to move fast.
Meeting Observability Needs
In most cases, from solo contractors to massive enterprise deployments, a log management system is a better solution. observIQ supports over 60 logging agents with copy-and-paste installations on Kubernetes. You can version control your agents alongside your applications with Git, so maintaining, updating, or reverting doesn’t stop the flow of information. It makes collecting and accessing the information you need super easy. That’s enough to raise a lot of eyebrows, but what really matters is what you can do with the data once you have it, and that is where log management platforms like observIQ thrive.
Building your own tools, implementing open-source tools such as OpenTelemetry, or committing to the cost and overhead of an enterprise log management system as a means to analyze and gain insights from your log data all present massive challenges and trade-offs. The objective is to spend less time and fewer resources solving issues than the issues cost in the first place. observIQ is built to give developers insight into their applications and bring them the information they need to solve problems without breaking the bank. In many cases, it’s even free.
The observIQ Solution
Free log management is way trickier with open source tools. It often involves implementing and tweaking multiple tools to fit your specific needs, and then managing an analytics tool that can handle your data in a valuable way. observIQ is built precisely to address the pains that cause developers to sometimes give up on observability altogether. We give you dashboards full of insightful analytics tools that can be easily adjusted to target your needs. Logs from all of your deployments, applications, and pods are ingested in one place and organized to generate insight. Compliance needs are fully automated, alerts keep you apprised of any issues that need immediate attention in real time, and your team gets unlimited accounts on a single plan. It’s the best way to create peace of mind and set the stage to focus on what matters.
Development cycles are complicated. If you’re on a development team, whether you’re building out a custom application, maintaining and iterating on a growing microservice, or breaking ground on a new platform for a startup, you have your hands full. Log management, though seldom celebrated outside hardcore DevOps and IT circles, is still a well-known instrument among seasoned developers. It is insight into the internal workings of your processes as they are used. If your code were a hopscotch outline, the logs would build a map of where the players’ feet actually touched down. That can be valuable in many ways, but most development teams have never made any investment in log management because it seems too costly an endeavor.
Log management comes with trade-offs. It requires overhead, maintenance, and of course, money. The question for many developers is: when does the value of a log management solution exceed the cost of implementing and maintaining it? That answer is different for every team, and constantly changing as log management solutions evolve.
So, when is the right time to invest in a log management solution?
Old Barriers Into Log Management
The average software development team in the United States has no more than nine people. Usually, team size is closer to three. Big log management solutions are more often built for tech giants, ingesting logs from applications with massive deployments and heavy, international traffic like Netflix, Facebook, or even Amazon. Netflix, the smallest of the bunch, has over eighty developers. They are all looking to collect petabytes of information that will fuel machine learning, generate sweeping market insights, or allow them to optimize some minuscule process that has a significant impact at massive scale.
The old log management solutions like Splunk and Datadog built deep, complex infrastructures to service these titanic use cases. It makes sense that the firstcomers went after the whales. As with many enterprise-level early movers, it takes teams of trained professionals to maintain these systems and generate continuous value from them. The worst part for anyone trying to generate value from the data, though, was that before OpenTelemetry there was no standardized protocol for measuring and collecting data. It’s like building the Eiffel Tower from blueprints with no units of measurement. Ingesting the logs wasn’t the only barrier to insight – making the data workable, and then making it valuable, also added to the overhead.
Smaller log management solutions that targeted more typical businesses cater to IT and DevOps professionals. They’re most useful for security and compliance, which is certainly a benefit that can serve developers as well, but not often something that small development teams are thinking about.
What does all of that mean for the rest of the tech world? The more common three-to-nine person teams might as well shrug it off and move on. It makes sense, given the traditional cost and overhead required to make log management a valuable endeavor, that most developers don’t think about it at all. If they do think about it, it’s as a future luxury that they can use to generate data some day when they can afford to hire a dedicated DevOps team. Very few developers peek over that perceived barrier to entry and consider what value log management could add to their development cycle today. More ought to.
The Value Log Management Can Add Today
Our applications don’t always function as intended. Ever had code with no errors still run poorly? Spent hours scrolling through rainbow stacks of someone else’s syntax looking for the misplaced variable? Or, the most common bane of the development cycle – ever spent an afternoon sifting through a stack of bug reports after releasing a deployment that tested perfectly before rolling out? Log management might make all the difference.
In every example, one common theme rings true – hunting down and solving unexpected problems in any development cycle is a massive pain, and by nature one that often goes unaccounted for during planning. The goal is simple – find the cause, get in, fix it, get out, and deploy the patch so that you can continue working on the new stuff that’s piled up in your team’s sprint manager. Without log management, solving these common problems can feel like going on a treasure hunt without a map. You might have a general idea of where to look, and you will eventually find what you’re looking for, but it’s going to be a challenge.
Log management is valuable during development because it gives your team the insight needed to solve problems efficiently. In the past, that efficiency was outweighed by the relative inefficiency of existing log management systems. With low-overhead, full-service log management solutions like observIQ, generating that critical insight is a matter of a few clicks.
When To Invest in a Log Management Solution
It’s a truism in the development world that developers do not commit to a product or solution until it can solve a problem that they are experiencing at that moment. There are those in the industry that look down on such an imprecise strategy, but in the case of the average development team it is better to move as fast and as light as possible rather than bloat your dev environment with costly checks and guardrails that may solve some unknown future issue. Log management is no different.
Log management lives in the back of the mind for most developers, and that’s okay. The beauty of solutions like observIQ, compared to the old standard, is that they are always ready to go when needed. Build your team and break ground. Build out your stack as efficiently as you can, tying in outside solutions as problems arise. When it happens that you have a problem that you need more visibility to solve, that’s when it’s time to integrate a log management solution. The most important thing is to recognize that problem when it hits. It might not be obvious, but it will happen.
In most cases, that problem occurs sometime shortly after someone from outside the development team interacts with your application, system, or website for the first time. End-users are frustratingly efficient at finding the hiccups in otherwise immaculate code. The potential of log management is to turn that frustration into insight. Every user that stumbled on a problem draws a map to the solution.
The simplest answer to the question of when is – invest in log management when you’re faced with a persistent problem and don’t know where to start. If you think you know where to start, and eventually find that you were on the wrong track, don’t waste another second of your time. Deploy a log management system.
The better answer, and the strategy of the observIQ team, is to keep your dev environment poised to solve problems as they arise – or even before. With user-focused log management platforms like observIQ, deployment takes minutes and is self-maintaining. The system will continue ingesting and analyzing logs in the background, so the insight is already there when you need it.
Use observIQ For Free in the Development Cycle
observIQ only takes minutes to deploy. It’s an obvious solution to come to when visibility is needed. For those who like to be prepared, it offers an even better strategy – free log management. observIQ is completely free to teams shipping fewer than 3GB of logs per day. No payment information is even required. If you go over 3GB in a day, you aren’t charged, and nothing happens to your account. The service simply pauses, and if you don’t want to upgrade, it resumes the next day with a new 3GB cap. observIQ is the perfect log management solution to install early, keep free, and upgrade if needed when visibility becomes crucial.
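To make the free-tier behavior described above concrete, here is a minimal sketch of a daily ingestion cap that pauses when exceeded and resets the next day. It is only an illustration of the described behavior, not observIQ's implementation: the `DailyQuota` class, the decimal definition of a gigabyte, and the choice to accept the batch that crosses the cap before pausing are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

GB = 10**9  # assumption: decimal gigabytes; the article does not say which unit is used


@dataclass
class DailyQuota:
    """Hypothetical model of a free tier with a daily ingestion cap."""
    cap_bytes: int = 3 * GB
    used_bytes: int = 0
    day: Optional[date] = None
    paused: bool = False

    def accept(self, size_bytes: int, today: date) -> bool:
        """Return True if the batch is ingested, False if ingestion is paused."""
        if self.day != today:
            # A new day starts with a fresh cap and an unpaused service.
            self.day, self.used_bytes, self.paused = today, 0, False
        if self.paused:
            return False
        self.used_bytes += size_bytes
        if self.used_bytes > self.cap_bytes:
            # Assumption: the batch that crosses the cap is kept, then ingestion pauses.
            self.paused = True
        return True
```

In this model nothing is billed and no state is lost when the cap is hit; calls to `accept` simply return `False` until the date changes.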
Purchase decisions often begin with a price check. Log management is no different. Evaluate your budget and narrow down the options that fit to choose the tool that gives you the most for what you pay. As always, cheaper is better as long as the platform doesn’t cut any corners. But with log management, there is a catch – not all tools are transparent with their pricing model. There are often hidden fees, exponential increases in price with usage, or essential features that get locked behind ‘premium’ walls. It helps to know ahead of time what features are essential to your needs, what kind of technical limitations you face, and how much you can afford to spend.
- Access to key features like extended retention, live tail, alerts, and automated parsing comes at a price. If you don’t perfectly understand your needs, look for an application that provides all these within your price range.
- You are charged based on how much log data you ingest and how long you retain them. The more you retain and the longer you retain, the more your prices go up.
- The number of users accessing your log management system matters. Some tools have a pay-per-user model and others have an organization-based user limit.
- If you need to manage logs for multiple organizations, consider the versatility vs pricing of different solutions. Pricing often varies based on how many companies or individual sources you want to manage within a single account.
- There is a data limit on the logs that are ingested into your account based on your plan/subscription model. If your daily log ingestion volume is high, you may want to invest in a plan that is priced higher, with a higher data limit.
Simple onboarding and superior user experience
As far as cloud apps go, usability is everything! Any application that is not intuitive should not be on your radar for log management since you might as well use open source tools over a convoluted paid option. Here are the UI elements that make a system intuitive:
- Simple navigation options
- Uncluttered user interface that presents critical information up front and enhances the user experience
- Options to perform important actions such as save, search, and filter your logs
- Visually appealing, useful dashboards
- Context sensitive help and support options for in-product support
Helpful customer support
All good cloud application platforms offer helpful support, but what makes the difference is going the extra mile to help you get started using their system, setting everything up just as you envisioned. Send a pre-sales enquiry to your log management system of choice and see how they respond. Test the waters before you make a financial and data commitment to any platform. Always remember, although systems are getting more intuitive by the day, there are times when you may need to speak with the people behind the system with a request, and that experience is a make or break factor.
Your log data could contain confidential information about your network, users, and company information. To protect your knowledge assets, it is critical that you approach log management with a company that has all the necessary compliance formats, regulations, and excellent security in place. If not, when there are audits or compliance concerns, you may have to go through additional hoops to attain the certification. A good log management platform will come with features meeting the following compliance standards:
The observIQ advantage
If you are reading this post, the log management solution that observIQ provides will be of interest to you. observIQ covers all the requisites detailed above and more. Explore our services, and if you have any questions, reach out to our friendly customer support team. They are around to assist you with all your requests and questions.
This is a personal story from before I worked at observIQ. I am not a technical person in any professional sense. I have no direct training and my coding experience is limited to front-end web design and some indie game development. Before observIQ, all I knew about log management was that it has something to do with tracking computer performance and behavior, and I associated it mostly with DevOps and the cloud. I never imagined it would play any valuable role in my professional endeavors.
I was working on a chess-based indie game called Chess Heroes. Only three of us developed the game, so time and resources were spread thin. Early on we tested the game’s multiplayer functionality, and noticed that some moves took too long to reach the opposing player – seconds instead of milliseconds, and sometimes minutes. We assumed the problem was somewhere in our Kubernetes cluster.
The front-end of the game is built in Unity, which acts mainly as a visualizer for the actual logic of the game. The rules and moves run on a C# chess engine that we installed on a Kubernetes cluster through DigitalOcean. The Unity instance on each player’s phone is only responsible for sending and receiving moves, and translating the board position it receives from the server on the visual interface, which is just a normal chess board with some power cards and a Hero character image.
Diagnosing an issue like this can be challenging, even seem impossible. There are no error codes to track down because everything thinks it’s working as intended, and dozens of interconnecting components – many built from open-source code that none of us were familiar with. Anything could be causing the delay. Digging through the code was a non-starter. We needed a way to get more visibility into the problem. That’s where, to my surprise, log management came in.
I had heard of the big, enterprise level log management companies like Splunk and Datadog, but our project was small, and we were really only looking to diagnose a small issue, not jump into a log management endeavor that could easily spiral into a larger time sink than the problem itself. A free(ish) option, New Relic, popped up in my search, but they limit free accounts to one user. The Chess Heroes team lives in California, Michigan, and Washington. One account was no good.
observIQ had the answer. Free accounts with unlimited users, up to 3 gigabytes of logs per day, 3-day retention, and no hidden charges whatsoever. It didn’t even ask for a credit card. This sounds like an infomercial now, but trust me, I was not sold at the time. I had never installed a logging agent on anything, let alone in Kubernetes, which I had only just learned how to use. To my sincere relief, installing observIQ was just a matter of copying and pasting an agent into Kubernetes, creating a .yml component and voilà – logs started appearing in observIQ’s dashboard.
It was time to play a game and look for the cause of our turn delay. We were looking for anything that could lead us to the source. We made the first move in our dummy game. All the appropriate logs flew across the Live Tail real-time log feed. It looked normal, but there was no move delay in the game either. We tried again, hoping to reproduce the delay. It took about seven moves until we encountered the issue. One player moved, and the new board position didn’t appear on the other player’s board for several seconds. We jumped back into the logs and began searching for an issue, but there was nothing wrong. Every move, including the delayed one, was sent, processed, recorded, and sent back out to the opponent in less than a hundredth of a second every time.
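The check we did by eye can be sketched in a few lines of code: pair the log entry where the server receives a move with the entry where it sends the new board state, and measure the gap. The log format, the `SAMPLE_LOGS` data, and the `server_latency_ms` helper below are all made up for illustration; our real logs looked different.

```python
from datetime import datetime

# Hypothetical log lines: "<ISO timestamp> move=<id> event=<received|sent>"
SAMPLE_LOGS = [
    "2021-03-01T18:00:00.000 move=7 event=received",
    "2021-03-01T18:00:00.004 move=7 event=sent",
    "2021-03-01T18:00:12.500 move=8 event=received",
    "2021-03-01T18:00:12.507 move=8 event=sent",
]


def server_latency_ms(lines):
    """Map each move id to the milliseconds between its 'received' and 'sent' entries."""
    received = {}
    latencies = {}
    for line in lines:
        stamp, move_field, event_field = line.split()
        timestamp = datetime.fromisoformat(stamp)
        move_id = int(move_field.split("=")[1])
        event = event_field.split("=")[1]
        if event == "received":
            received[move_id] = timestamp
        elif event == "sent" and move_id in received:
            delta = timestamp - received[move_id]
            latencies[move_id] = delta.total_seconds() * 1000
    return latencies


# Moves that spent more than a hundredth of a second (10 ms) on the server.
slow_moves = {m: ms for m, ms in server_latency_ms(SAMPLE_LOGS).items() if ms > 10}
```

With the sample data, `slow_moves` comes back empty, which is the same result we got: whatever was holding up the move, it was not the server.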
Log management didn’t show us where the problem was, but it did the next most valuable thing. It showed us where the problem wasn’t. We were previously convinced that the problem had to be on the server. We would have rebuilt the Kubernetes cluster from the ground up before checking the Unity project. Once we knew the server was good to go, we turned to Unity. Thanks to the new insight from observIQ, we knew the server sent new game states flawlessly, so how were they received? Since there were no errors in the code and the moves always displayed (eventually), we knew the board positions arrived and processed properly. No one could think of a good reason why the game took so long to process and display a new board state, so we popped open the hood and looked for something that could be holding the board state back from appearing on the players’ screens.
Early in the game’s construction we had pasted in a free turn-based-multiplayer API from the Unity asset store that was responsible for sending and receiving messages from our server. It had a line buried somewhere in the stack that restricted the game from checking for new board states more than once every two minutes. It made sense, once we found it. The API was intended for a game where turns happened simultaneously and were restricted to 2 minutes, so it was a resource saving technique by the designers of the API. Smart for them, bad for us. In chess, every millisecond matters. We flipped it so that the board state updated any time the server sent a new position, and the problem finally resolved.
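A simplified model shows why a throttle like this produces delays ranging from seconds to minutes. It assumes the client checks on a fixed two-minute grid and that a pushed update is displayed immediately; the function names are hypothetical and this is not the asset's actual code.

```python
import math

POLL_INTERVAL_S = 120.0  # the two-minute limit described above


def polled_display_delay(arrival_s, interval_s=POLL_INTERVAL_S):
    """Seconds a new board state waits when the client only checks every interval_s.

    Assumes checks happen at t = 0, interval_s, 2 * interval_s, and so on.
    """
    next_poll_s = math.ceil(arrival_s / interval_s) * interval_s
    return next_poll_s - arrival_s


def pushed_display_delay(arrival_s):
    """Seconds a new board state waits when the client reacts to every server message."""
    return 0.0
```

A move that reaches the client 115 seconds into a polling window waits only 5 seconds, while one that arrives 1 second in waits 119. That matches the erratic delays we saw, and it is why reacting to each server message fixed the problem.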
As a small development team focussed on a pet project, log management was not on our radar. Before finding observIQ, we all assumed that the cost and overhead of dealing with logs would be more effort than benefit. Yet, when we encountered an issue during development that we couldn’t solve, it was log management that played the most critical role in the diagnosis. It isn’t just for big enterprise organizations. I can honestly say I am a believer in the value of observability for small teams and independent developers.
At observIQ, we pride ourselves on delivering simple log management solutions with powerful functionalities. We’re excited to announce the addition of Live Tail to the observIQ feature suite. Live Tail emulates the terminal experience, giving you the ability to analyze, visualize and debug live – all in a single place. Never worry about the outcome of your deployment. Live Tail lets you troubleshoot, react, and assess issues across all of your deployments in real-time. Watch your logs stream as they are ingested; easily narrow down the results with a simple search and dynamic filter options. Read on for a deep dive into this cool new addition and see how you can make the most of it.
How does Live Tail Work?
- You can view logs as they are ingested. With the ability to stream, pause, or stop streaming logs at the click of a button, you don’t have to be at the edge to respond to an event of concern. No separate terminals and no toggling between interfaces – you do it all within your observIQ account on a single interface.
- You can search and isolate logs of events that interest you such as errors, processing failures, access denials, etc. Use simple Lucene queries or the various filter options to find specific logs. Alternatively, you can have search terms highlighted on logs as they are ingested to isolate and identify logs.
- Scroll up or down and have the play/pause options automatically mirror your actions, with scrolling up pausing the stream and scrolling down continuing to play the stream
- View the rate at which your logs are being ingested and streamed
- Collaborate with your engineering team, allowing them to troubleshoot application, deployment, and production issues without disrupting performance.
- Navigate to Live Tail based on your search and filter options from “Discover” or choose to live tail logs from a specific agent under Fleet.
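The filter-and-highlight behavior described above can be pictured with a small sketch. It is an illustration of the general idea only, not observIQ's API or Lucene query syntax: the record fields (`pod`, `message`), the `live_tail` and `highlight` functions, and the bracket-style highlighting are all assumptions.

```python
import re


def highlight(message, term):
    """Wrap each case-insensitive occurrence of term in square brackets."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda match: f"[{match.group(0)}]", message)


def live_tail(records, term, **filters):
    """Yield highlighted messages from records that match every field filter and contain term."""
    for record in records:
        if any(record.get(key) != value for key, value in filters.items()):
            continue
        message = record.get("message", "")
        if term.lower() in message.lower():
            yield highlight(message, term)


# Hypothetical records standing in for a live stream of parsed logs.
stream = [
    {"pod": "checkout-1", "message": "Readiness probe failed"},
    {"pod": "ad-1", "message": "Liveness probe failed"},
    {"pod": "checkout-1", "message": "Started container"},
]

for line in live_tail(stream, "failed", pod="checkout-1"):
    print(line)
```

Because `live_tail` is a generator, it processes records one at a time as they arrive, which is the property that makes tailing a live stream possible.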
What can you do with Live Tail?
- Collaborate: As a DevOps engineer managing a deployment, you’ll feel better with the power and autonomy to control your deployment as it happens instead of reacting after a breakdown occurs. Give your engineering, IT, and DevOps teams access to live tail to see how deployments pan out in real time across all your machines.
- Gauge the efficiency of hardware additions to your network through Live Tail.
- If you notice an event that is out of the ordinary occurring at specific times, use live tail to see what’s causing it.
- Get an aggregated view of all the events in your Kubernetes applications, making it easier to narrow down to the root causes for errors in your applications, which could impact performance for the rest of your cluster.
- Collaboration for troubleshooting in real time between an application engineer and a DevOps engineer has never been easier. Both users access a single console and view a single instance of the live stream of logs from the application, without any communication delays or disconnects.
- Isolate application requests from a complete application stack or the entire pipeline
Live Tailing Kubernetes, a quick shoutout
Open Source Tools
At observIQ, we love and run on Kubernetes ourselves. Prior to implementing Live Tail, we utilized several open-source tools to help simplify tailing in Kubernetes. Tailing logs from multiple pods, containers, and deployments can be challenging without one of these tools. Here’s a quick list of tools that we would recommend checking out:
Live Tailing Kubernetes with observIQ
observIQ makes streaming logs from single or multiple deployments, namespaces, containers, pods (and more) incredibly simple with dynamic filters. For example, if you update the adservice deployment and trigger a rolling update, Live Tail lets you see that:
- The new replica set is created indicating a successful deployment and the time at which the deployment was completed.
- The new replicaset’s pods are successfully created and started
- The healthchecks (liveness and readiness probes) failed before the pod became “healthy”
- The old replicaset is removed
Doing all this tracking on the command-line would have been cumbersome.
In the screen capture below you can see that while updating checkoutservice, the healthchecks failed enough times to trigger a restart of the replicaset. This means that the old replicaset was not deleted, because the new deployment failed. This is captured very vividly in Live Tail.
In observIQ, Live Tail is for Everyone
Every observIQ user has access to Live Tail. Just as with all other observIQ features, such as built-in Dashboards, Alerts, and Sources, Live Tail is available in all of our plans, free and paid alike. If you’re an existing observIQ user, Live Tail is available now. Head over to the ‘Live Tail’ page – you can start streaming your logs immediately. Don’t have an observIQ account yet? Head over to our signup page and sign up for a free trial. If you have any questions along the way, reach out to our well-informed support team. They have answers for all of your log management questions.