Exploring AIOps? A Couple of Things to Think About

If you haven’t been in a cave for the past couple of years, then you’ve heard by now that artificial intelligence (AI) and machine learning (ML) are poised to completely transform … everything. Banking, shopping, medicine, supply chain, fish farming: you name it, and chances are you can find a blog post right now explaining how that field is about to be transformed by the rising tide of AI.

The IT practice is hardly immune to this, which makes sense because new technologies in IT are in many ways fueling this trend. We even have a term for it: AIOps. Many IT shops are looking seriously at AIOps as a way to reduce workload on overextended staff as IT monitoring environments get more and more complex. Maybe you work for one of them! If you do, you need to know that AIOps absolutely has the potential to make you a smarter, more effective IT practitioner. But there are a couple of things you need to keep in mind as you walk down this path.

First, though, here’s a quick primer on AIOps for the uninitiated. Gartner defines it as follows:

“AIOps platforms utilize big data, modern machine learning and other advanced analytics technologies to directly and indirectly enhance IT operations (monitoring, automation and service desk) functions with proactive, personal and dynamic insight.”

OK then!

If you’ve done any reading about AI or ML lately, then you know that the core technology is a model trained on existing data, which can then be used to draw conclusions about new data. So if you’re training a model to recognize cats in photos, you show it a bunch of cats as well as a bunch of not-cats and hope that by the time you’re done you can show it a new photo and it’ll draw the right conclusion. The effectiveness of this model depends (obviously) on the quality of the data used to train it.

Most IT infrastructure monitoring (ITIM) platforms have relied on two key sources of data: performance metric data and infrastructure metadata. The metadata gives context to the signals coming from your infrastructure (growing latencies, failed network links, etc.). Lately, monitoring platforms have also begun relying on data from application monitoring tools, real-time events and notifications, and ad hoc resource discovery. Combined with all the new tools needed to absorb these data sources, this can make an IT practitioner’s job even more complicated. The more data that pours in, the more important it is to have context around major events.

AIOps tools claim to tame this wave of data by sorting through the noise to highlight the events that matter. The trouble with most AIOps platforms today is that they don’t look at much besides raw events coming in from other tools. Raw event feeds lack infrastructure context and can include a lot of extraneous data if they aren’t filtered or classified. It can also be a challenge to ensure the data isn’t already stale when it arrives. If the data coming into an AIOps tool is low quality, you can bet that the correlations it finds and the conclusions it draws will also suffer: garbage in, garbage out.
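To make the point about context and staleness concrete, here is a minimal Python sketch of the kind of preparation raw events might need before they reach an AIOps tool. Everything in it is hypothetical and for illustration only: the `RawEvent` shape, the `INFRA_METADATA` lookup table, the host names and the five-minute freshness window are assumptions, not part of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RawEvent:
    source: str          # host or device that emitted the event
    message: str         # free-text event payload
    timestamp: datetime  # when the event was generated


# Hypothetical infrastructure metadata, as an ITIM platform might hold it.
INFRA_METADATA = {
    "db-01": {"role": "database", "service": "billing"},
    "sw-07": {"role": "network switch", "service": "core-network"},
}

# Assumed freshness window; events older than this are treated as stale.
MAX_AGE = timedelta(minutes=5)


def prepare_events(events, now, metadata=INFRA_METADATA, max_age=MAX_AGE):
    """Drop stale events and attach infrastructure context to the rest."""
    prepared = []
    for event in events:
        if now - event.timestamp > max_age:
            continue  # stale on arrival, so skip it
        base = {
            "source": event.source,
            "message": event.message,
            "timestamp": event.timestamp,
        }
        context = metadata.get(event.source)
        if context is None:
            # No known context: keep the event, but flag it as unclassified.
            base.update(role="unknown", service="unknown", context_known=False)
        else:
            base.update(context, context_known=True)
        prepared.append(base)
    return prepared


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        RawEvent("db-01", "query latency rising", now - timedelta(minutes=1)),
        RawEvent("sw-07", "link down", now - timedelta(hours=2)),
        RawEvent("mystery-host", "disk full", now),
    ]
    for item in prepare_events(sample, now):
        print(item["source"], item["service"], item["context_known"])
```

The design choice worth noting is that events without known context are flagged rather than dropped, so a downstream tool can decide how much weight to give them.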

The other thing to remember about AI models is that they are best at recognizing situations they’ve seen before. A model trained on cats can’t recognize dogs or airplanes. This means that even when operating at their best, AIOps tools are mostly helpful with incidents that have the same signature as ones that have cropped up in the past. The reality is that this is actually pretty rare. If you’re like most IT practitioners, you spend most of your firefighting time troubleshooting new issues, not repeat occurrences. In fact, EMA found that on average only 27 percent of incoming IT incidents are caused by recurring events. How many outages can you endure before your AIOps tool learns to predict when the next one is coming?
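A toy Python sketch shows why signature matching only helps with repeats. It is deliberately simplistic and rests on assumptions: real AIOps tools use far richer models, and the `signature` function, the incident fields and the sample data here are all hypothetical.

```python
def signature(incident):
    """Reduce an incident to a simple, hypothetical signature."""
    return (incident["component"], incident["symptom"])


def split_known_and_novel(history, incoming):
    """Separate incoming incidents into ones seen before and new ones."""
    known_signatures = {signature(incident) for incident in history}
    known, novel = [], []
    for incident in incoming:
        if signature(incident) in known_signatures:
            known.append(incident)
        else:
            novel.append(incident)
    return known, novel


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    history = [
        {"component": "db-01", "symptom": "high latency"},
        {"component": "sw-07", "symptom": "link down"},
    ]
    incoming = [
        {"component": "db-01", "symptom": "high latency"},    # repeat
        {"component": "db-01", "symptom": "disk full"},       # new symptom
        {"component": "app-03", "symptom": "high latency"},   # new component
    ]
    known, novel = split_known_and_novel(history, incoming)
    print(f"recognized: {len(known)}, never seen before: {len(novel)}")
```

Anything that lands in the `novel` list is an incident the history offers no help with, which is where domain knowledge from the monitoring platform has to take over.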

AIOps tools are destined to be important weapons in any IT practitioner’s arsenal as they work to reduce complexity and increase efficiency in their jobs, but they will always be an augmentation to more robust platforms with domain knowledge. You’ll need a strong ITIM platform as the foundation that supplies this knowledge in order to predict and eliminate IT outages. The best ITIM solutions will integrate cleanly with AIOps tools to give you the best chance at a smooth day at the office and a work-free weekend.

If you’re interested in learning how advanced IT shops are leveraging the latest ITIM platforms and AIOps, consider attending GalaxZ18.