Leveraging AIOps: Debugging Via Visualizations

Last week, we announced new AIOps capabilities we’ve added to our intelligent application and service monitoring platform. In this post, I discuss the rationale behind them and provide deeper context on the tangible benefits organizations can realize.

The intimate relationship that monitoring has with observability creates a paradigm shift that operators of the pre-cloud generations of IT are quickly appreciating. As DevOps becomes mainstream in the enterprise context, the ability to troubleshoot a problem using the same tools that detected the problem is an efficiency demanded by developers, operators and site reliability engineers (SREs). With more roles responsible for operations of key business services, granting access to a larger pool of users is more of a requirement now than it was in traditional environments. Simultaneously satisfying the diverse needs of cloud administrators and business owners is not as straightforward as projecting some charts in the NOC. Furthermore, providing complete service context for all of these consumers means the data must be collected in a scalable, query-friendly location. Simply put, more users need access to a much larger, more complex data set to keep customer-facing functions healthy.

From Data Collection to Monitoring to Troubleshooting

One thing all observability interpretations have in common is the need to emit tremendous amounts of raw machine data from today’s distributed applications. While it’s important to note that observability best practices advocate instrumentation at all layers of the application, this data becomes much more valuable once there’s a common place to collect and analyze it in an intelligent manner. The monitoring strategy is then defined by choosing what indicators will automatically generate alerts — but as any operator or software engineer can attest, the biggest challenge is often determining the root cause that triggered the alert. This debugging/troubleshooting phase often consumes the largest portion of mean time to resolution (MTTR), and it is one of the top reasons CIOs and business owners strive to consolidate tools — beyond the sheer costs of the tools themselves.

Here’s an example of how the AIOps capabilities of Zenoss Cloud help you intelligently filter and analyze event and metric data, in order to get closer to root cause.

Visualizations Provide Insight

While developers, operators and definitely SREs need access to the white box and black box information emanating from a combined application-infrastructure stack, reducing mean time to innocence (MTTI) demands the ability to sort through gobs of high-cardinality data. Contrary to what some vendors will suggest, advanced analysis of one or even a couple of types of data (such as metrics, model, events, logs or traces) will not provide the insights necessary to adequately debug a hybrid cloud application. In reality, service context can only be fully realized with machine data from all sources and types. Making sense of this data mishmash requires a combination of subject matter expert (SME) input and machine learning. To consume and then use this information, users want visual simplifications where aggregate views lead to drilldowns into areas of investigation. Building these visualizations is far from trivial, and only monitoring systems that join an inherent perspective of the application-infrastructure model with cloud-native scale can provide these insights.
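To make the aggregate-then-drilldown idea concrete, here is a minimal Python sketch. It is not Zenoss Cloud code and does not use any Zenoss API; the event records, the field names (`service`, `host`, `severity`) and the helper functions `aggregate` and `drill_down` are all hypothetical, chosen only to illustrate the pattern on a tiny in-memory data set.

```python
from collections import Counter

# Hypothetical event records. The field names are illustrative only
# and do not represent a Zenoss Cloud schema.
EVENTS = [
    {"service": "checkout", "host": "web-1", "severity": "critical"},
    {"service": "checkout", "host": "web-1", "severity": "warning"},
    {"service": "checkout", "host": "web-2", "severity": "critical"},
    {"service": "search", "host": "idx-1", "severity": "warning"},
    {"service": "checkout", "host": "web-1", "severity": "critical"},
]


def aggregate(events, key):
    """Count events per distinct value of `key` (the aggregate view)."""
    return Counter(event[key] for event in events)


def drill_down(events, **filters):
    """Keep only the events that match every given field (the drilldown)."""
    return [
        event
        for event in events
        if all(event.get(field) == value for field, value in filters.items())
    ]


# Start with the aggregate view: which service is producing the most events?
by_service = aggregate(EVENTS, "service")
noisiest_service, _ = by_service.most_common(1)[0]

# Drill into that service and re-aggregate along a second dimension.
by_host = aggregate(drill_down(EVENTS, service=noisiest_service), "host")

print(by_service)
print(noisiest_service, dict(by_host))
```

A real system would run the same two steps against a scalable, query-friendly data store rather than a Python list, and would combine several data types instead of events alone, but the shape of the investigation is the same: summarize, pick the outlier, then narrow the filter and summarize again.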

Zenoss Cloud specializes in providing deep intelligence for applications running in hybrid cloud environments as well as those running in legacy on-premises infrastructure. Combining the power of AIOps with our full-stack monitoring provides the flexibility that organizations need to monitor, visualize and debug across the full spectrum of today’s enterprise ecosystems. For more information on how you can utilize Zenoss Cloud to simplify your service operability, contact Zenoss to set up a demo.