One of the breakout sessions at GalaxZ, presented by Ed Wang, senior manager of automation and monitoring tools at NetApp, focused on how organizations can improve their IT monitoring ecosystems to keep up with new technology and growing business demands. For many large enterprises, the biggest challenge is matching infrastructure monitoring capabilities to ever-growing, increasingly complex business requirements. That is not an easy task when your managed IT resources include thousands of servers in multiple data centers distributed around the globe. Ed took us through his journey, addressed some of the common pitfalls, and shared how NetApp overcame those challenges with proactive, integrated monitoring through automation.
From Reactive to Proactive IT Monitoring
As distributed hybrid IT infrastructure becomes more complex to manage, monitoring processes and tools must keep pace with technology growth. When monitoring budgets are tight, teams often end up with fragmented tools that provide only limited visibility into the infrastructure. One common challenge IT teams face is the sheer number of alerts they receive. In a large IT environment, traditional alerting mechanisms do not help: every minute your customers are impacted, the cost goes up. The number of alerts to handle also grows as the environment evolves, for example from virtualization or converged infrastructure to hybrid IT.
Reactive troubleshooting does little to improve productivity. Moving from a “detect and respond” method to a “predict and prevent” approach helps you manage issues intelligently and resolve them before the user experience is affected. Start by automating events and alerts so that they deliver actionable information to IT teams. Ed pointed out that monitoring is more about service availability than infrastructure availability, yet many IT organizations struggle with traditional tools and processes that cannot provide complete service assurance to customers. Automatically turning events into actionable alerts lets you skip a number of steps before you start troubleshooting. The flurry of alerts from hybrid infrastructure can then be managed by building service impact, integration, and response automation.
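To make the idea of an "actionable" alert concrete, here is a minimal Python sketch that attaches service impact to a raw infrastructure event. It is not taken from the session: the `SERVICE_MAP` table, the host and service names, and the rule that an event is actionable only when it maps to at least one service are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical mapping of infrastructure components to the business
# services that depend on them. In practice this would come from a CMDB
# or service model, not a hard-coded dictionary.
SERVICE_MAP: Dict[str, List[str]] = {
    "db-host-01": ["order-processing", "billing"],
    "web-host-07": ["customer-portal"],
}


@dataclass
class ActionableAlert:
    device: str
    summary: str
    impacted_services: List[str]
    actionable: bool


def enrich(device: str, summary: str) -> ActionableAlert:
    """Attach service impact to a raw event.

    Assumption: an event counts as actionable only if the device
    supports at least one known service.
    """
    services = SERVICE_MAP.get(device, [])
    return ActionableAlert(device, summary, services, actionable=bool(services))
```

With a table like this, an event from `db-host-01` arrives at the IT team already labeled with the services it affects, while an event from an unmapped device is flagged as not actionable and can be handled at lower priority.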
Automated & Integrated IT Monitoring Ecosystem
An event monitoring ecosystem that feeds alerts into your incident management system lets you sort, track, and accurately route alerts from your IT systems through auto-ticketing. For instance, Ed pointed out that NetApp managed storage events specifically by integrating multiple tools, including Zenoss and NetApp OnCommand Unified Manager (OCUM), into the ServiceNow incident management platform. With the mapping table in the incident management platform, you can use auto-response execution to monitor critical triggers and resolve issues proactively.
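One way to picture a mapping table that drives auto-response is as a lookup from trigger name to a response script. The Python sketch below assumes exactly that. The trigger names, the handler functions, and the event fields are hypothetical; it does not use or represent any ServiceNow or Zenoss API.

```python
from typing import Callable, Dict, Optional


# Hypothetical response scripts. Real handlers would call out to
# automation tooling; these only describe the action they would take.
def restart_service(event: dict) -> str:
    return f"restart requested for {event['component']}"


def expand_volume(event: dict) -> str:
    return f"expansion requested for {event['component']}"


# Hypothetical mapping table: trigger name -> response script.
RESPONSE_TABLE: Dict[str, Callable[[dict], str]] = {
    "service_down": restart_service,
    "volume_nearly_full": expand_volume,
}


def auto_respond(event: dict) -> Optional[str]:
    """Run the mapped response for an event, if one is registered."""
    handler = RESPONSE_TABLE.get(event["trigger"])
    if handler is None:
        return None  # no automated response; leave the ticket for a human
    return handler(event)
```

Keeping the table separate from the handlers means that adding a new trigger is a one-line change, and any event without a mapping falls through to normal incident handling.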
The auto-response platform leverages Zenoss and allows your application team to create scripts and add them to the repository, which helps IT customize and create new triggers. The NetApp ZenPack is a plug-in module that outlines the business rules for OCUM to pass its monitoring events to Zenoss. Zenoss screens the alerts, deduplicates them, and identifies the critical alerts for auto-ticketing. Along with similar alerting configurations for server virtualization, network, and security components in the data center, this integration brings storage alerts into the ecosystem. Regardless of the incident management or event monitoring software being used, any IT organization can benefit from rationalizing the number of actionable alerts and adopting an integrated event monitoring ecosystem.
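The screen, deduplicate, and select-critical flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Zenoss's actual implementation: the event fields (`device`, `component`, `event_class`, `severity`), the deduplication key, and the numeric severity scale with 5 as critical are all hypothetical.

```python
from typing import Dict, List, Tuple

CRITICAL = 5  # hypothetical severity scale; 5 is assumed to mean critical


def dedupe(events: List[dict]) -> List[dict]:
    """Collapse events that share (device, component, event_class).

    Each merged event keeps a repeat count and the highest severity seen.
    """
    merged: Dict[Tuple[str, str, str], dict] = {}
    for e in events:
        key = (e["device"], e["component"], e["event_class"])
        if key in merged:
            merged[key]["count"] += 1
            merged[key]["severity"] = max(merged[key]["severity"], e["severity"])
        else:
            merged[key] = {**e, "count": 1}
    return list(merged.values())


def tickets_to_open(events: List[dict], threshold: int = CRITICAL) -> List[dict]:
    """Return deduplicated events severe enough to warrant a ticket."""
    return [e for e in dedupe(events) if e["severity"] >= threshold]
```

For example, two identical critical events from the same storage volume plus one low-severity event from another device would yield a single ticket candidate with a repeat count of 2.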