During the ONUG event I met with Dimitri Stiliadis, the Co-Founder & CTO of Nuage Networks, who was excited to tell us about the latest product release, the Virtualized Services Assurance Platform (VSAP).
Virtualized Services Assurance Platform
The history of Nuage products has been fairly straightforward; they began with a virtualized networking solution targeted at data centers. More recently, Nuage Networks announced an expansion of that product into the branch office space. What was missing, though, was a good way to monitor and manage the complex environment that was built, from underlay to overlay, from the WAN all the way to the virtual switch. When failures occur, they can be difficult to track down or, worse, you are flooded with alerts and left to figure out which ones are actually important, and which ones are the true root cause. To that end, Nuage Networks’ VSAP aims to provide visibility of the network and event correlation so that you can see what might be affected by a given network event.
VSAP is composed of two main components:
The Nuage Route Monitor uses routing protocols to peer with the production network in the data center, backbone, and anywhere else it can. By participating (passively) in the routing protocols, Route Monitor has visibility of the network topology and, more importantly, sees prefix additions, changes, and deletions. From this information, Route Monitor can reconstruct the layer 3 topology, at least within the limits of the visibility of the protocols; BGP, for example, will not give visibility within an AS, but the AS-Paths can be tracked.
The Route Monitor also uses the Nuage virtual switch to do layer 2 to layer 3 discovery and create an association between Virtual Machines, vPorts and IPs as they relate to the layer 3 network topology. Information is also extracted from routers and switches, regardless of vendor, so long as they support standard access mechanisms like SNMP (and hey, maybe NETCONF or something like that in the future?).
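To make the "prefix additions, changes, and deletions" idea a bit more concrete, here is a minimal sketch of how a passive listener might turn a stream of route announcements and withdrawals into a change log. This is purely my own illustration and not how Nuage implements Route Monitor; the `RouteTable` class, its `apply` method, and the event tuple layout are all hypothetical names I made up for the example.

```python
from dataclasses import dataclass, field
from ipaddress import ip_network


@dataclass
class RouteTable:
    """Hypothetical passive view of the prefixes learned from routing peers."""
    routes: dict = field(default_factory=dict)  # network -> AS path (tuple)
    log: list = field(default_factory=list)     # history of observed changes

    def apply(self, timestamp, action, prefix, as_path=()):
        """Record one routing update and return the resulting event, if any."""
        net = ip_network(prefix)
        path = tuple(as_path)
        if action == "announce":
            if net in self.routes:
                if self.routes[net] == path:
                    return None          # duplicate announcement, nothing changed
                kind = "change"          # same prefix, different AS path
            else:
                kind = "add"
            self.routes[net] = path
        elif action == "withdraw":
            if net not in self.routes:
                return None              # withdrawal for a prefix we never saw
            del self.routes[net]
            kind = "delete"
        else:
            raise ValueError(f"unknown action: {action}")
        event = (timestamp, kind, str(net), path)
        self.log.append(event)
        return event


table = RouteTable()
table.apply(1, "announce", "10.1.0.0/16", (65001, 65010))  # prefix appears
table.apply(2, "announce", "10.1.0.0/16", (65002, 65010))  # AS path changes
table.apply(3, "withdraw", "10.1.0.0/16")                  # prefix disappears
```

Because every change is appended to `log` with a timestamp, the same structure could in principle answer "what did the routing table look like at 3AM?" by replaying the log up to that point.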
The Correlation Engine is well named; it tries to do intelligent analysis of the events it sees and show the logical inheritance of identified network problems. For example, if a site goes down, you’d expect to receive events as all the VMs behind the site’s edge router are suddenly unavailable. The Correlation Engine (CE) is smart enough, we are told, to realize that the interface down event on the PE router caused the loss of that routing prefix for that site, and can suggest a probable cause for the group of arriving events so that the root cause can be more quickly identified. It doesn’t seem to actually suppress the other events, but at least you can see all the events with the same probable cause and deal with them as one.
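I obviously can't see inside the Correlation Engine, but the grouping behavior described above can be sketched with a simple dependency map: each VM depends on a prefix, each prefix depends on an interface, and an event's probable cause is the highest failed thing in its dependency chain. Everything here (the `depends_on` map, the `probable_cause` and `correlate` functions, the entity names) is a hypothetical illustration of the idea, not Nuage's algorithm.

```python
from collections import defaultdict


def probable_cause(entity, depends_on, failed):
    """Walk up the dependency chain; return the highest ancestor that also failed."""
    cause = entity
    seen = {entity}
    current = entity
    while current in depends_on:
        current = depends_on[current]
        if current in seen:      # guard against a loop in the dependency map
            break
        seen.add(current)
        if current in failed:
            cause = current
    return cause


def correlate(events, depends_on):
    """Group events by probable cause. Nothing is suppressed, only grouped."""
    failed = {event["entity"] for event in events}
    groups = defaultdict(list)
    for event in events:
        cause = probable_cause(event["entity"], depends_on, failed)
        groups[cause].append(event["entity"])
    return dict(groups)


# Hypothetical topology: two VMs behind one site prefix, reached via one PE interface.
depends_on = {
    "vm-a": "10.1.0.0/16",
    "vm-b": "10.1.0.0/16",
    "10.1.0.0/16": "pe1:ge-0/0/1",
    "vm-c": "10.2.0.0/16",
}
events = [
    {"entity": "pe1:ge-0/0/1", "type": "interface-down"},
    {"entity": "10.1.0.0/16", "type": "prefix-withdrawn"},
    {"entity": "vm-a", "type": "unreachable"},
    {"entity": "vm-b", "type": "unreachable"},
    {"entity": "vm-c", "type": "unreachable"},
]
groups = correlate(events, depends_on)
```

In this toy example the interface, the prefix, and both VMs behind it land in one group keyed by the PE interface, while `vm-c` (whose prefix never reported a problem) stays on its own. A real engine would presumably also need time windows and a topology that changes underneath it, which this sketch ignores.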
The CE also keeps historical data, which makes it useful in cases where a report comes in that a VM had network problems, say, between 3 and 4AM yesterday. I’m a big fan of historical data being able to provide a snapshot of the usual, or previous, state, and that’s what this software does. The downside to this sea of unicorn tears? It seems that you need to buy it in conjunction with Nuage’s virtualization solution as well, as that’s where it extracts the useful VM endpoint information. It’s also not drilling down to layer 2 end points, but relies on the idea that since we’re likely using an overlay like VXLAN, everything will be routed down to the host level in any case.
While discussing VXLAN, I should add that the Nuage solution is also able to identify loss of host availability within the VXLAN overlay as well as in the underlay routing.
Warnings are really based on availability right now; there isn’t any interface utilization monitoring – yet, at least.
Does It Work?
Good question. Based on the demos we saw, it seemed to, but I’d certainly want to run it up in a lab environment and do further testing to see how effective it really is, and how good the root cause analysis is for a real network. If it does everything that’s claimed, in many ways VSAP is a powerful product that could actually be sold on its own. It seems to me that the topology awareness and monitoring of both overlay and underlay in addition to smart event correlation could be a real help to many companies out there.
In short, I like this. I’m a little excited by it, in fact. Naturally, my thought process leads me to want to see a product that does not rely on the Nuage VSP product and becomes a more general network monitoring tool, because dang, this is hooking together some neat technologies. One to watch, for sure.
I attended ONUG Spring 2015 as a delegate at the invitation of Gestalt IT, who ran this Tech Field Day Extra! event. The events are funded by the sponsoring vendors who “buy” time to talk to the delegates, and that money in turn funds my travel, accommodation and food while I am there. I am not paid to attend this event (I take vacation from work and do it on my own time), and I am not obliged to blog, tweet, or otherwise publicize any sponsor of the event. Further, when I do choose to publish content related to the event, I do so at my own discretion, and the content and opinions expressed are mine alone without any interference from any other parties.
Please see my general disclosures page for more information.