Recently I had a chance to try using Cisco’s Data Center Network Manager (DCNM) software in anger. I must confess that sometimes anger was the right word, but at other times it definitely made me smile. Based on the state of the documentation it’s clear that there are a couple of areas where very few people have spent time digging in (if they had, the same errors wouldn’t have survived in the documentation for at least 5 releases of DCNM), so on that basis I’m using this post, and more to follow, to document some of the fun things I have discovered along the way. For reference, I am running DCNM version 10, so there have been nine previous versions of DCNM in which the behavior could have been perfected, and I gather that version 10 is a big step up from version 9.
To put my testing in context, I have a specific FabricPath Leaf-Spine topology already designed, and I am only using the aspects of DCNM pertinent to my particular needs for an Ethernet LAN fabric. I say this up front because I know that I am not using all of DCNM’s functionality, and perhaps I’m missing out on some of the fabric automation features and cable plan capabilities which, for some, will make DCNM more powerful. My use case has me looking for Zero Touch Provisioning (ZTP) and configuration management of a FabricPath Ethernet fabric.
DCNM
Zero Touch Provisioning
Cisco’s DCNM is a tool to help with the deployment and management of Cisco Ethernet fabrics on Nexus switches (at least, the non-ACI Nexus switches). DCNM can enable Power On Auto Provisioning (POAP) for the switches by providing a DHCP server and a download location for the POAP script, software images and configurations. POAP occurs when a Nexus switch boots without a stored startup config; the switch goes through steps along these lines:
- Get DHCP address. The DHCP response should include a DHCP option to point to a download location for the POAP script.
- Run the POAP script, which is provided by Cisco and written in Python.
- The POAP script will download a software image and a configuration file as configured by the user in DCNM based on the device’s serial number.
- The POAP script triggers the software install and, on completion, stages the configuration to be applied on next boot.
- The switch reloads using the new code install and the downloaded configuration is applied.
In short, the idea is to make deploying a Cisco fabric—and the devices that make it up—as painless as possible.
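The serial-number lookup at the heart of those steps can be sketched roughly as follows. This is only an illustration of the idea and not Cisco's POAP script: the `PROVISIONING` table, the `PoapPlan` structure, the `plan_for` function, the serial numbers and the file names are all made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PoapPlan:
    """What one switch should download during POAP (hypothetical structure)."""
    image: str
    config: str


# Hypothetical serial number -> files table. In DCNM this association is
# built by the user; every value below is invented for illustration.
PROVISIONING = {
    "SERIAL-LEAF-01": PoapPlan(image="nxos-standard.bin", config="leaf-01.cfg"),
    "SERIAL-SPINE-01": PoapPlan(image="nxos-standard.bin", config="spine-01.cfg"),
}


def plan_for(serial: str) -> Optional[PoapPlan]:
    """Return the image/config pair for a serial number, or None if unknown."""
    return PROVISIONING.get(serial.strip().upper())


if __name__ == "__main__":
    print(plan_for("serial-leaf-01"))   # known device: gets its own config
    print(plan_for("SERIAL-UNKNOWN"))   # unknown device: nothing is offered
```

The point of keying everything on the serial number is that the switch needs no prior configuration at all: whatever boots with a known serial gets the right image and the right identity.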
Did POAP Work?
It did! I had the list of serial numbers matched to devices in my leaf/spine fabric architecture, uploaded the relevant code images to DCNM, uploaded initial configurations (including mgmt0 addressing and default route) to DCNM, then told DCNM which image and configuration files should be offered to each device with a specific serial number. Incredibly, It Just Worked.
I have 16 devices deployed so far: two Fabric Edge (leaf) switches, four spine switches and ten access (leaf) switches, and I did not have to manually install software on any of them. I cannot fully express how much time and heartache this has saved me.
Here’s the cunning part: If any device should fail in the future and we have to RMA (i.e. replace) the device under our support contract, I can simply add the serial number of the replacement device into DCNM, point it to our current standard image and give it the failed device’s last known configuration (which, handily, is available within DCNM). When the new device is cabled in and powered up, it will automatically load the correct software image and configure itself as a direct replacement for the failed unit. So now we have not only ZTP but also Zero Touch Replacement (which I just made up, but I’m sure you get the idea).
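Conceptually, the replacement boils down to moving one entry in a serial-number table. A minimal sketch, assuming the assignments are held in a plain dictionary (DCNM's real data model is not shown in this post, and `replace_serial` and all the values are hypothetical):

```python
from typing import Dict


def replace_serial(
    assignments: Dict[str, Dict[str, str]],
    failed: str,
    replacement: str,
    standard_image: str,
) -> Dict[str, Dict[str, str]]:
    """Return a new table where the replacement unit inherits the failed
    unit's last known config and is pointed at the current standard image."""
    if failed not in assignments:
        raise KeyError(f"unknown serial: {failed}")
    if replacement in assignments:
        raise ValueError(f"serial already assigned: {replacement}")
    # Copy everything except the failed unit, then add the new serial.
    updated = {s: dict(files) for s, files in assignments.items() if s != failed}
    updated[replacement] = {
        "image": standard_image,
        "config": assignments[failed]["config"],
    }
    return updated


if __name__ == "__main__":
    table = {"OLD123": {"image": "old.bin", "config": "leaf-07.cfg"}}
    print(replace_serial(table, "OLD123", "NEW456", "standard.bin"))
```

The function returns a new table rather than editing the old one in place, so the record of what the failed device was running is still around if the swap needs to be reversed.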
Topology
DCNM automatically generates a topology diagram based on the discovered devices, and connects them using neighbor relationships generated from discovered CDP neighbors.
At this point an important lesson is learned about having Cisco devices running CDP on their management ports when those are connected to a non-Cisco management network. As you likely know, CDP packets will be passed on at layer 2 by a non-Cisco switch. As a consequence, DCNM was convinced that all the switches connected to one another directly via the management port. This is, without question, a little silly. My solution:
- Enable LLDP (which the non-Cisco network can understand)
- Disable CDP on the management port
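For anyone wanting to push those two steps out to many switches, here is a small sketch that builds the relevant CLI lines. The helper name `mgmt_discovery_config` is made up, and the commands are the NX-OS ones as I understand them (`feature lldp` globally, `no cdp enable` under the interface); check them against the documentation for your platform and release before using them.

```python
def mgmt_discovery_config(mgmt_interface: str = "mgmt0") -> str:
    """Build CLI lines that enable LLDP and disable CDP on the management port."""
    lines = [
        "feature lldp",                  # turn on LLDP for the switch
        f"interface {mgmt_interface}",
        "  no cdp enable",               # stop CDP on this interface only
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(mgmt_discovery_config())
```

CDP stays enabled on the fabric ports, so DCNM can still discover the real neighbor relationships there.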
It seems awfully odd that DCNM has reached version 10 without anybody either tweaking the code to ignore management ports or offering a toggle to ignore or include them, because topologically speaking there’s no value in drawing a connection based on layer 2 management adjacencies on the same diagram as all the revenue port fabric links.
Licensing
Well of course you need a license. DCNM will function for a grace period after which it stops collecting statistics from the devices. Each device will need its own DCNM-LAN or DCNM-LAN-SAN license to be applied and assigned within DCNM, not on the device itself. A pain, yes, but it’s a one-time charge and it costs somewhere in the region of $500 or so per device, depending on your discounts and the exact model numbers. This will probably also be a pain after an RMA, as the license will undoubtedly have to be unassigned so that it can be reassigned. Still, that’s minor beans in the scale of things.
Monitoring
One of the features of DCNM—indeed, one of the things that the license enables—is monitoring of device status and statistics. Here’s an example of the somewhat underwhelming CPU monitoring screen (device names fuzzed out to protect the innocent):
Clicking on a device name brings up a somewhat more useful chart below the table:
The same basic interface is used for Memory and Traffic (i.e. Tx, Rx throughput). If you were hoping to replace your usual network performance management system with DCNM, well, don’t.
DCNM also backs up switch configurations, which is handy if you don’t have it done another way already.
Event Management
DCNM collects events from the switches (e.g. port up/down events) and keeps them available for viewing. There’s not much more to say than that. It’s nice to have if you have absolutely nothing else, but otherwise I’d be very happy to see that functionality removed and the programming effort put into other areas that would improve the manageability of the switches.
Automated Configuration Deployment
One of the things that interested me about DCNM was the ability to deploy changes to the devices in the fabric automatically (and not as part of the POAP process). We are all about automation, right? Right. To that end I’ve been investigating how to configure and manage deployments within DCNM, and evaluating how flexible and powerful the feature is for use in a production network. I will publish a second (separate) post focusing specifically on examples and syntax issues, but in summary I would describe the functionality as being limited and somewhat braindead.
DCNM deployment requires four basic steps:
- selection of a configuration template;
- selection of the devices to execute the changes on;
- entry of values for variables used within the template;
- selection of advanced options and credentials for the change.
There is some intelligence in step 3; in addition to basic data types for the variables (e.g. IP address, number, boolean value, etc.), DCNM recognizes some helpful data types including numerical ranges (useful for a range of VLAN IDs for example), lists of values (e.g. a list of VLAN numbers), and IP ranges. It is possible to iterate over lists and ranges in order to create multiple items within a single execution of the configuration template, and the template syntax has basic if…then…else functionality.
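To make the range and list handling concrete, here is a rough Python sketch of the kind of expansion described above. It is emphatically not DCNM template syntax (that is a topic for the follow-up post); the function names `expand_range` and `render_vlans` and the naming convention are invented for illustration.

```python
from typing import List


def expand_range(spec: str) -> List[int]:
    """Expand a spec such as '10-12,20' into [10, 11, 12, 20]."""
    values: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            values.extend(range(start, end + 1))  # ranges are inclusive
        else:
            values.append(int(part))
    return values


def render_vlans(vlan_spec: str, name_prefix: str = "") -> str:
    """Iterate over the expanded IDs and emit one VLAN stanza per ID.

    The 'if name_prefix' branch stands in for the template's if/then/else.
    """
    lines: List[str] = []
    for vlan_id in expand_range(vlan_spec):
        lines.append(f"vlan {vlan_id}")
        if name_prefix:
            lines.append(f"  name {name_prefix}{vlan_id}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_vlans("100-102,200", name_prefix="APP_"))
```

One variable value ("100-102,200") produces four VLAN stanzas in a single run, which is exactly the kind of repetition the template's iteration is there to remove.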
It took a little tweaking before I got confident with the system, but in the end I have been able to deploy configurations with relative ease and save myself from, for example, having to log in to sixteen different devices in order to create a new VLAN in the fabric. This is something that can be done many other ways of course, but when the feature is available for free in the tool and somebody else already spent the time making it work (including pre-change snapshots, error detection, and rollback if needed), it would be rude not to give it a go. For anything beyond fairly straightforward changes, however, DCNM would not get my recommendation as a deployment tool. The flexibility is just not there, and if you’re looking towards, say, Ansible / Jinja2 as a comparison, DCNM doesn’t even get a look-in.
Initial Conclusions
Power On Auto Provisioning was a big win for me in terms of time saving and simplicity. The fact that I could take sixteen new switches with varying delivered versions of code on them and have them automatically install the correct code version and configure the management port is truly great. It is possible to replicate this functionality without DCNM, as the POAP Python script is readily available, and it’s simple enough to set up a DHCP server and a web server for downloads. On the other hand, isn’t it nicer when somebody else does the hard work for you?
The monitoring in DCNM stands out to me as pointless. Sure, tell me overall device health and color the icon on the topology diagram to match, but beyond that it makes most sense to have a proper performance management tool instead, because this really is not worthy.
The topology map is interesting but in my network at least, the layout is not as logical as one might hope, and to me it’s just a nice visual confirmation of the topology and nothing more.
Configuration deployment is a little clunky in DCNM, but it’s functional. In the short term at least, I’ll use it to push out VLANs to the fabric, but longer term I would see myself moving to a more flexible and generic configuration deployment tool. I worry that the documentation is weak, there isn’t much other documentation on the Internet, and the templating system is pretty limited. However, it’s better than nothing and it handles errors (and rolls back) fairly gracefully. This is probably the second most useful function in DCNM as I see it.
With all that said, would I use DCNM? I plan to, if for nothing else than to make RMA a breeze. I have a few Cisco-based Ethernet fabrics to manage, and I plan to add the other two existing fabrics into DCNM shortly too. Watch for a post coming soon about the configuration deployment templates, the syntax and the limitations.
Nice write up! I too deployed DCNM 10 to test the ability to deploy configurations (or even custom templates) via the API. I found this initially challenging because I was forced into deploying DCNM with the enhanced fabric function even though I was not going to use those pieces, i.e. POAP…but I simply wanted the ability to hit the API to configure Nexus 5k/7k with, let’s say, a new VLAN or network. I tried using LAN mode only (not enhanced fabric) but the API was not present. Curious as to your experience with the API to deploy configs to your spine/leaf switches? Have you tested making out-of-band changes (SSH to the CLI), and how is that “seen” by DCNM? Will DCNM pick up the out-of-band change, or does it only know the last known state for that device?
Appreciate your insight! Thanks!