The Pain Behind NFV Service Chain Monitoring

In our last article, we talked about managing the infrastructure supporting network function virtualization (NFV). Summary: It’s easy with Zenoss. Now let’s move up a level and focus on monitoring the network service chains themselves.

Nearly all of our readers have experience working in large data centers. Now, imagine you’re responsible for keeping dozens of geographically distributed data centers running, your team keeps adding new customers, and those customers spin applications up and down constantly. Whenever there’s a problem, there’s practically no chance you’ll know anything about the customer or the application. Eeek.

That’s the reality of service chain monitoring in an NFV environment.

We’re lucky to have a great partner in the Cisco Network Services Orchestrator (NSO) team, because they’ve made this simple.

The NFV Service Chain Infrastructure

First, I want to make sure we all understand what a network service chain is. There’s an explanatory article at SDxCentral that I like, and it has a nice picture of one chain.

Figure 1: Service chain image copyright SDxCentral.com

All of the blue boxes above the NFV platform are elements of a service chain, from the general broadband network gateway function on the left to the rather specific parental control function on the upper right. In an NFV network, these all run as OpenStack virtual machines connected by separate private networks, instead of as stand-alone boxes linked with fixed cables.

There are a ton of advantages to this.

  • You can let your end customers rent a service without having to truck any boxes and physically plug them together. Less expense!
  • If a service is successful, you can spin up lots of copies of it instantly. Sell more!
  • If it’s not successful, you can turn it off and reuse the compute resources for something that is. Less sunk cost!
  • Don’t like the firewall? Just replace the element running in that VM. No vendor lock-in!
  • Video optimizer host has a fan failure? Start it up again on another compute host. Recover fast!

For those of you with experience with Zenoss Control Center, this probably sounds very familiar. Our customers already know they can add more data collection capacity just by starting another copy of the right container. It’s a couple of minutes of work to add a new container, which is much easier than adding a new server.

What’s different with NFV is the sheer scale of the problem. Depending on the application, there might be thousands of tenants, each spinning up hundreds of service chains. We can only succeed in monitoring at that scale and rate of change with automation, and our partner Cisco has a great tool for that: Network Services Orchestrator (NSO).

Cisco Network Services Orchestrator (NSO)

Cisco’s NSO is a provisioning system for service chains. It uses standardized models (defined in the ETSI MANO standards) to define service chains, and it executes them with common communications components and a generalized device driver, so that multiple vendors’ virtual network devices can be deployed and configured. This collection of features makes it very easy to instantiate a multi-vendor service chain, even across multiple networking domains like VLAN, VXLAN, and MPLS.

The Cisco NSO deployment team created a Zenoss integration driver.  It calls the Zenoss JSON API to create/delete devices, retrieve device information, organize devices into groups, and create impact services based on devices, interfaces, and sub-services.
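The NSO driver’s own code isn’t shown here, but to illustrate the kind of call involved, here is a minimal Python sketch that builds (without sending) a Zenoss JSON API request to add one device. It assumes the Ext.Direct-style router convention Zenoss uses: a POST to a router URL such as `/zport/dmd/device_router` with `action`, `method`, `data`, `type`, and `tid` fields. The host name, device name, and device class below are hypothetical placeholders, and authentication is left out.

```python
import json
import urllib.request


def build_router_request(base_url, router, action, method, data, tid=1):
    """Build (but do not send) an Ext.Direct-style call to a Zenoss JSON API router.

    Assumption: routers live under /zport/dmd/<router> and take a JSON body
    with action/method/data/type/tid fields.
    """
    payload = {
        "action": action,   # router class name, e.g. "DeviceRouter"
        "method": method,   # router method, e.g. "addDevice"
        "data": [data],     # keyword arguments are wrapped in a one-element list
        "type": "rpc",
        "tid": tid,         # transaction id, echoed back in the response
    }
    url = f"{base_url.rstrip('/')}/zport/dmd/{router}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def add_device_request(base_url, device_name, device_class, tid=1):
    """Request that asks Zenoss to add and model one device (e.g. a new VNF VM)."""
    return build_router_request(
        base_url,
        "device_router",
        "DeviceRouter",
        "addDevice",
        {"deviceName": device_name, "deviceClass": device_class},
        tid,
    )
```

For example, `add_device_request("https://zenoss.example.com", "vfw-chain42-01", "/Server/Linux")` produces a request you could pass to `urllib.request.urlopen` together with credentials your Zenoss instance accepts. Check your version’s JSON API documentation for the exact `addDevice` parameters before relying on this.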

In practice, that means that every time a new service chain is set up with NSO, Zenoss immediately begins to monitor the chain and all its elements. This is the automation-based IT management that Zenoss’ Mick Nolen wrote about; recently, I’ve also heard it called Automated Service Assurance.

With NSO-created impact models, an engineer can effectively diagnose whether infrastructure issues are affecting service chains. Zenoss automatically maintains a model of the infrastructure elements supporting the components of each service chain.

The screenshot shows how Zenoss graphically represents the Cisco NFV infrastructure supporting a single-element service chain.

I’ve added annotations showing the compute server and storage server supporting this virtual machine, and we can follow all the way down to the control servers running the OpenStack services.

The Zenoss model helps us by automatically correlating the performance and availability events affecting each service chain and pointing to the most likely root cause.
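Zenoss’ actual impact and root-cause engine isn’t described in this article, so treat the following strictly as a rough illustration of the idea. The Python sketch models a service chain as a hypothetical dependency graph (chain → VNF VMs → compute and storage hosts → control server) and applies one simple heuristic: a failing element is a root-cause candidate if nothing it transitively depends on is also failing, and any element that depends on a failing element is marked as impacted. All element names are made up.

```python
from typing import Dict, List, Set

# Hypothetical model: each element lists the elements it runs on.
DEPENDS_ON: Dict[str, List[str]] = {
    "chain-42": ["vfw-vm", "vlb-vm"],
    "vfw-vm": ["compute-1", "storage-1"],
    "vlb-vm": ["compute-2", "storage-1"],
    "compute-1": ["control-1"],
    "compute-2": ["control-1"],
    "storage-1": [],
    "control-1": [],
}


def all_dependencies(node: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Everything `node` depends on, directly or transitively (iterative DFS)."""
    seen: Set[str] = set()
    stack = list(graph.get(node, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


def likely_root_causes(failing: Set[str], graph: Dict[str, List[str]]) -> Set[str]:
    """Failing elements whose own dependencies are all healthy."""
    return {n for n in failing if not (all_dependencies(n, graph) & failing)}


def impacted(failing: Set[str], graph: Dict[str, List[str]]) -> Set[str]:
    """Elements that are failing or depend (transitively) on something failing."""
    return {n for n in graph if n in failing or all_dependencies(n, graph) & failing}
```

With only `compute-1` failing, this flags `compute-1` as the likely root cause and marks `vfw-vm` and `chain-42` as impacted, while `vlb-vm` stays healthy. A root-cause set like this is the kind of signal that could be fed back to an orchestrator.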

We can even feed those events back to the orchestrator so it can automate the replacement of a failing chain element with a working one.

Driven by NSO and the Cisco VIM integrations, Zenoss ensures that you’ll know immediately if an NFV infrastructure issue is affecting any of the service chains running in any of the pod locations. Device monitoring, service chain awareness, pod location, and infrastructure modeling are all automated.

This is powerful technology!

What Next for Your NFV Service Chain Monitoring?

We’re not done. Just because a set of service chain elements is running doesn’t mean everything is actually working. When you spin up a new service, the first thing you need to do is test that it’s actually delivering service. Does the firewall pass traffic to the load balancer? Is the load balancer routing traffic to the web server? It takes an end-to-end test to validate that, and you need to test both at first setup and periodically after that, too.
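Full service activation testing exercises the whole chain with real traffic and measures its quality. As a much simpler, hypothetical illustration of the idea, the Python sketch below runs a list of reachability checks in chain order and reports the first hop that fails. It only verifies that a TCP connection opens, which is far weaker than a real end-to-end test, and each check would have to run from a vantage point whose traffic actually traverses that segment of the chain.

```python
import socket
from typing import Callable, List, Optional, Tuple

# A named check for one segment of the chain, e.g. "firewall -> load balancer".
Check = Tuple[str, Callable[[], bool]]


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def first_failing_hop(checks: List[Check]) -> Optional[str]:
    """Run checks in chain order; return the first failing hop's name, or None if all pass."""
    for name, check in checks:
        if not check():
            return name
    return None
```

For example, `first_failing_hop([("firewall -> load balancer", lambda: tcp_reachable("10.0.1.10", 443)), ("load balancer -> web server", lambda: tcp_reachable("10.0.2.20", 80))])` would name the first broken segment; the addresses and ports are placeholders. Running the same list once at setup and again on a timer covers both cases the paragraph describes.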

Cisco selected Netrounds as their automated service activation testing and end-to-end active service quality monitoring SolutionsPlus Partner. We’re excited to announce that Zenoss and Netrounds are now working together as technology partners!

Our next article will look at how Netrounds and Zenoss work side-by-side to tie service chain validation to service chain infrastructure awareness.  

NFV has been an exciting challenge for Zenoss, and one we’re thrilled to work on. We’d love to address your gnarliest infrastructure monitoring problems, too. Talk to one of our experts and learn how Zenoss can help you!