This week a colleague of mine (also a script junkie) was writing a change script to create a VLAN. In this case he was adding the VLAN to a Cisco FabricPath domain, and he observed that this would really be something better accomplished with a script, as it would reduce silly mistakes like missing one of the switches, or accidental inconsistencies between them. This is on our minds because he has just been working on a way to deploy identical firewall changes to multiple clusters at the same time so that the configurations don’t get out of sync.
And so I found myself working on a script to deploy a VLAN on the list of switches that make up the FabricPath, uh, fabric. For speed I am using my old faithful Perl, and because I’m forward-looking or something, I decided that I would not use any form of expect, but instead would configure the switches using NETCONF. That should be easy enough, right?
NETCONF XML RPC Syntax
I’ve complained in a previous post about Cisco’s XML format versus Juniper’s approach. I’d like to retain that complaint because while I can see the logic, mostly, of what Cisco is doing – and in many senses it’s more true to the configuration hierarchy than Juniper’s approach – it’s a huge pain in the rear to actually figure out the necessary XML, which becomes a big turn-off.
Those XSDs, Tho’
The XSD (XML Schema Definition) is the document that defines the XML instruction and response format that the NETCONF device will handle. If you want to look at it as an analog to SNMP, it’s a little bit like the MIB – it defines the structure of the commands, the data types, what’s mandatory and what’s not, whether there can be multiple entries of a given type in the output, and so forth. It’s also supremely hierarchical in that you can define an element’s data type as being a custom data type that you define further down. In IOS terms it’s a little like trying to reverse engineer a QoS configuration where each step involves looking up a reference to something else, which in turn references another thing, and so on. Lemme try and ’splain.
I want to configure a VLAN, so I look at the XSD and find the configure element (dark red). What we see is that the configure element has a sub-element called __XML__MODE__exec_configure (dark green).
That element is not defined here though; instead you’re told that it is of type __XML__MODE__exec_configure_type (yellow).
So now we go look up that type. Follow the yellow arrow and we get to the type definition. Within this type there are elements that may be present (I’ve cut it down slightly for brevity), one of which is perfect for my task to add a VLAN, and that’s vlan. How do I define a VLAN using the vlan element? Well, we note that a vlan is defined as being of type vlan_type_Cmd_vlan_create_delete. Follow the light green arrow and we find another type definition. I’ve cut the diagram off there because really it’s enough, but trust me when I tell you that this isn’t the end of it. When I finally finished digging through the hell that is the XSD, I ended up with XML to create a VLAN that looks like this:
<?xml version="1.0"?>
<nc:rpc message-id="$messageid"
        xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0"
        xmlns="http://www.cisco.com/nxos:1.0:if_manager">
  <nc:edit-config>
    <nc:target>
      <nc:running/>
    </nc:target>
    <nc:config>
      <configure>
        <__XML__MODE__exec_configure>
          <vlan>
            <vlan-id-create-delete>
              <__XML__PARAM_value>$vlan</__XML__PARAM_value>
              <__XML__MODE_vlan>
                <name>
                  <vlan-name>$vlanDescription</vlan-name>
                </name>
              </__XML__MODE_vlan>
            </vlan-id-create-delete>
          </vlan>
        </__XML__MODE__exec_configure>
      </configure>
    </nc:config>
  </nc:edit-config>
</nc:rpc>
]]>]]>
I included the necessary code to add a description while we’re at it, but you can safely assume that it was just as much fun to track down as the rest of it was. I have variables in place for the VLAN number ($vlan) and the VLAN name ($vlanDescription).
My irritation is that this is just a whole mouthful of XML in order to do something that feels like it should be far more simple.
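To show how little of that payload actually varies, here is a minimal Python sketch (my real script is Perl) that fills in the same three placeholders. The function name build_create_vlan is made up for illustration, and escaping the VLAN name is my own precaution rather than anything the XSD asks for.

```python
from string import Template
from xml.sax.saxutils import escape

# NETCONF 1.0 end-of-message marker, as seen at the end of the RPC above.
EOM = "]]>]]>"

# Same structure as the RPC above. Only $messageid, $vlan and
# $vlanDescription change from one run to the next.
CREATE_VLAN = Template("""<?xml version="1.0"?>
<nc:rpc message-id="$messageid"
        xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0"
        xmlns="http://www.cisco.com/nxos:1.0:if_manager">
  <nc:edit-config>
    <nc:target>
      <nc:running/>
    </nc:target>
    <nc:config>
      <configure>
        <__XML__MODE__exec_configure>
          <vlan>
            <vlan-id-create-delete>
              <__XML__PARAM_value>$vlan</__XML__PARAM_value>
              <__XML__MODE_vlan>
                <name>
                  <vlan-name>$vlanDescription</vlan-name>
                </name>
              </__XML__MODE_vlan>
            </vlan-id-create-delete>
          </vlan>
        </__XML__MODE__exec_configure>
      </configure>
    </nc:config>
  </nc:edit-config>
</nc:rpc>
""")


def build_create_vlan(message_id: int, vlan: int, name: str) -> str:
    """Return the framed create-VLAN RPC (hypothetical helper name)."""
    body = CREATE_VLAN.substitute(
        messageid=int(message_id),
        vlan=int(vlan),
        # Escape &, < and > so an odd VLAN name cannot break the XML.
        vlanDescription=escape(name),
    )
    return body + EOM
```

Calling build_create_vlan(101, 55, "Test_VLAN") gives a string ready to write to the NETCONF session.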
NETCONF vs Expect
Is this better than using expect? Well yes, I think so. When a change succeeds, I get back a response telling me the command was ok, so I know it was accepted. It’s fast, too. I wrote a little test script that follows roughly this logic:
- Check that the VLAN is not already in use
- Create the VLAN / name
- Check that the VLAN is now in use (i.e. was created successfully)
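Checking for that "ok" response might look something like the following Python sketch. It assumes the switch answers with a standard NETCONF rpc-reply in the base 1.0 namespace, carrying either an ok element or one or more rpc-error elements; the function name reply_ok is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

NC = "urn:ietf:params:xml:ns:netconf:base:1.0"
EOM = "]]>]]>"  # NETCONF 1.0 end-of-message marker


def reply_ok(raw_reply: str) -> Tuple[bool, str]:
    """Return (success, detail) for a single rpc-reply."""
    text = raw_reply.strip()
    if text.endswith(EOM):
        text = text[:-len(EOM)]
    root = ET.fromstring(text.strip())

    # Any rpc-error means the request was rejected.
    errors = root.findall(f".//{{{NC}}}rpc-error")
    if errors:
        messages = [
            e.findtext(f"{{{NC}}}error-message", default="(no error-message)")
            for e in errors
        ]
        return False, "; ".join(m.strip() for m in messages)

    # An <ok/> directly under rpc-reply means the request was accepted.
    if root.find(f"{{{NC}}}ok") is not None:
        return True, "ok"

    return False, "reply contained neither ok nor rpc-error"
```

Treating "neither ok nor error" as a failure is a deliberately cautious choice for a configuration change; a reply to a show-style request would need different handling.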
Because I either want to deploy this to all devices or none of them, each step is performed on all the devices in sequence, before moving to the next step. So for example if any one device reports that the VLAN is in use, the entire script stops there; or if VLAN creation fails on any device, it will be rolled back on any devices that had been successful so far. The script output is simple:
[john@ubuntu ~]$ ./createvlan.pl -u admin -v 55 -n "Test_VLAN"
*** VLAN Tool v0.2 - Jul 2, 2014 ***
User: admin
VLAN: 55
Name: Test_VLAN
-> Checking that VLAN 55 is available on all hosts:
  :) eth-sw1.home.mynet: YES
  :) eth-sw2.home.mynet: YES
-> Deploying VLAN 55:
  :) eth-sw1.home.mynet: OK
  :) eth-sw2.home.mynet: OK
-> Checking that VLAN 55 has been created on all hosts:
  :) eth-sw1.home.mynet: YES
  :) eth-sw2.home.mynet: YES
It's a good day! VLAN 55 was successfully deployed to the switches.
I am not keeping NETCONF sessions open between steps (perhaps I should for speed), but even so, each NETCONF connection is taking roughly half a second, so this script completes the VLAN creation on two devices in just over 6 seconds. That isn’t bad.
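The all-or-nothing flow is roughly the following, sketched in Python rather than the Perl I actually used. The three callbacks (vlan_in_use, create_vlan, delete_vlan) are hypothetical stand-ins for the NETCONF calls, and simply reporting a failed final check without undoing anything is an assumption made for the sketch.

```python
from typing import Callable, Iterable, List


def deploy_all_or_nothing(
    hosts: Iterable[str],
    vlan_in_use: Callable[[str], bool],
    create_vlan: Callable[[str], bool],
    delete_vlan: Callable[[str], bool],
    log: Callable[[str], None] = print,
) -> bool:
    """Run each step on every host before moving on to the next step."""
    hosts = list(hosts)

    # Step 1: stop before touching anything if any host already has the VLAN.
    for host in hosts:
        if vlan_in_use(host):
            log(f"{host}: VLAN already in use, stopping")
            return False

    # Step 2: create the VLAN, remembering which hosts succeeded.
    done: List[str] = []
    for host in hosts:
        if create_vlan(host):
            done.append(host)
            continue
        log(f"{host}: creation failed, rolling back")
        for previous in reversed(done):
            delete_vlan(previous)
        return False

    # Step 3: confirm the VLAN now exists everywhere.
    missing = [host for host in hosts if not vlan_in_use(host)]
    for host in missing:
        log(f"{host}: VLAN missing after deployment")
    return not missing
```

Passing the device operations in as functions keeps the sequencing testable without a switch on the other end.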
Configuring Mode Fabricpath
Yeah, well this is where I’m faltering a little right now. In the XSD for NXOS 6.0.2, I cannot find a specific element to allow me to set fabricpath mode for a VLAN. Fabricpath is in the XSD – I can configure fabricpath itself, and I can monitor fabricpath, but can I set it on a VLAN? Well, not that I have yet found, no. The rather nasty workaround I have to use is this:
<?xml version="1.0" encoding="utf-8"?>
<nc:rpc xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0"
        xmlns:nxos="http://www.cisco.com/nxos:1.0"
        message-id="$messageid">
  <nxos:exec-command>
    <nxos:cmd> configure terminal ; vlan $vlan ; mode fabricpath </nxos:cmd>
  </nxos:exec-command>
</nc:rpc>
]]>]]>
Yes that’s right, I have to resort to using XML to send a set of commands that I would like (effectively) typed at the command line. Heck, I could do that with an expect script! Worse, when you think about it, issuing this command would also create the vlan if it weren’t already created, so everything I did earlier to create and name the VLAN could probably more simply be replaced with:
<?xml version="1.0" encoding="utf-8"?>
<nc:rpc xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0"
        xmlns:nxos="http://www.cisco.com/nxos:1.0"
        message-id="$messageid">
  <nxos:exec-command>
    <nxos:cmd> configure terminal ; vlan $vlan ; name $vlanDescription ; mode fabricpath </nxos:cmd>
  </nxos:exec-command>
</nc:rpc>
]]>]]>
This feels like, I don’t know, what’s the exact opposite of screen scraping? This is like running Embedded Event Manager (EEM) scripts to configure the router when an event occurs – effective but somehow ugly. I view NETCONF as being more than just a way to pipe CLI commands. Maybe the XML is there but I don’t know it, or perhaps it’s available in a later version of NXOS. Still, for the 6.0.2 switches that I have, this is a disappointment to say the least. I feel like I’m cheating.
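If you end up stuck with the CLI passthrough as I did, the wrapper is at least easy to generate. This Python sketch builds the same exec-command RPC shown above from a list of commands; the name build_exec_command is hypothetical, and refusing commands that contain a semicolon or newline is my own safety check, not something the switch requires.

```python
from typing import Iterable
from xml.sax.saxutils import escape

EOM = "]]>]]>"  # NETCONF 1.0 end-of-message marker


def build_exec_command(message_id: int, commands: Iterable[str]) -> str:
    """Wrap CLI commands in the exec-command RPC shown above."""
    cleaned = []
    for command in commands:
        command = command.strip()
        # The commands are joined with " ; ", so a stray separator inside
        # one of them would silently add an extra command.
        if not command or ";" in command or "\n" in command:
            raise ValueError(f"refusing suspicious command: {command!r}")
        cleaned.append(command)
    cli = " ; ".join(cleaned)
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<nc:rpc xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0"\n'
        '        xmlns:nxos="http://www.cisco.com/nxos:1.0"\n'
        f'        message-id="{int(message_id)}">\n'
        "  <nxos:exec-command>\n"
        f"    <nxos:cmd>{escape(cli)}</nxos:cmd>\n"
        "  </nxos:exec-command>\n"
        "</nc:rpc>\n"
        + EOM
    )
```

For example, build_exec_command(7, ["configure terminal", "vlan 55", "mode fabricpath"]) produces the fabricpath workaround for VLAN 55.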
As I worked my way through this script, I knew that sanity checking was going to be important before plowing forward with changes. That’s why the script checks before running that you aren’t duplicating an existing VLAN, and why it checks afterwards that the deployment was truly successful. When I added the ‘mode fabricpath’ command, it struck me that I should also check that fabricpath is enabled on the destination switch before I issued that command. Those of you who read Greg Ferro’s recent post “Scripting Does Not Scale For Network Automation” over on Etherealmind.com may be laughing right now, specifically at this quote:
“You keep adding validation and data sanity checks every time you find a problem.”
Yes, that. There’s always one more sanity check to add. I already sanity check the input to the script to ensure that, say, the VLAN ID is numeric, is between 2 and 4094, and doesn’t touch the reserved VLANs around the 1000 level. The VLAN name gets stripped of invalid characters, trimmed to length (<=32 characters) and so on. Then I’m checking the devices before I configure anything, and I’m double-checking afterwards. And now here I am adding another step to confirm that a feature is deployed before I issue a command reliant on this feature. When does it all stop?!
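For the curious, the input checks amount to something like this Python sketch. Several details are assumptions made for illustration: the reserved block is taken to be 1002-1005, the upper bound is 4094 (the highest usable 802.1Q VLAN ID), and the set of allowed name characters (letters, digits, underscore, hyphen) is a guess at what counts as "invalid".

```python
import re

# Assumption: "reserved VLANs around the 1000 level" means 1002-1005.
RESERVED_VLANS = range(1002, 1006)
MAX_NAME_LENGTH = 32


def validate_vlan_id(value) -> int:
    """Return the VLAN ID as an int, or raise ValueError."""
    text = str(value).strip()
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f"VLAN ID must be numeric: {value!r}")
    vlan = int(text)
    # 4094 is the highest usable 802.1Q VLAN ID; VLAN 1 is left alone.
    if not 2 <= vlan <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan}")
    if vlan in RESERVED_VLANS:
        raise ValueError(f"VLAN ID is reserved: {vlan}")
    return vlan


def clean_vlan_name(name: str) -> str:
    """Drop characters outside the assumed-safe set, then trim to length."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "", name)
    return cleaned[:MAX_NAME_LENGTH]
```

Raising an exception on a bad ID, rather than silently fixing it, fits the all-or-nothing spirit of the script: nothing gets sent to a switch unless the input is sane.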
If I want to see if the fabricpath feature-set is installed on a switch, I can issue the command “show feature-set” at the CLI, and it will tell me whether FP is installed or not. Sadly in the XSD, again I cannot find a direct way to query this information so I am reduced to issuing a CLI command again, just like with the mode fabricpath issue:
<?xml version="1.0" encoding="utf-8"?>
<nc:rpc xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0"
        xmlns:nxos="http://www.cisco.com/nxos:1.0"
        message-id="$messageid">
  <nxos:exec-command>
    <nxos:cmd> show feature-set </nxos:cmd>
  </nxos:exec-command>
</nc:rpc>
]]>]]>
Sadly, for some reason this command returns an empty data set. The <nc:data> element is the one that should contain the command output – unstructured, because it’s pretty much a screen scrape put back into that element – and it comes back totally empty. Maybe I’m using it wrong? So I tried with show feature instead of show feature-set, and was rewarded with the full output I would get if I issued the command at the CLI. Why is this? I don’t know yet, and I’ll let you know if I find out. What’s clear for the moment is that I can’t test fabricpath availability. And that sucks.
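One practical consequence is that the script has to tell "no output" apart from "feature not installed". A minimal Python sketch of pulling the text out of the data element, assuming it lives in the NETCONF base 1.0 namespace as the nc: prefix suggests (the function name command_output is hypothetical):

```python
import xml.etree.ElementTree as ET
from typing import Optional

NC = "urn:ietf:params:xml:ns:netconf:base:1.0"
EOM = "]]>]]>"  # NETCONF 1.0 end-of-message marker


def command_output(raw_reply: str) -> Optional[str]:
    """Return the text inside the data element, or None if there is none."""
    text = raw_reply.strip()
    if text.endswith(EOM):
        text = text[:-len(EOM)]
    root = ET.fromstring(text.strip())

    data = root.find(f".//{{{NC}}}data")
    if data is None:
        return None

    # itertext() also picks up text nested in child elements, if any.
    output = "".join(data.itertext()).strip()
    return output or None
```

A None result here means "the switch told me nothing", which the caller should treat as unknown rather than as proof that fabricpath is missing.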
NETCONF for the Win?
Maybe. I still rate NETCONF over screen-scraping and expect – it’s more predictable. The little oddities like those I mentioned above drive me crazy though. I’m trying to write this script using what should be a better process, and I’m being rewarded with what I guess I already knew a little, which is an incomplete implementation biting me in the rear. That should probably lead to the question “Is NETCONF ready for prime time?”. My answer is that yes, absolutely NETCONF is ready for prime time; but unless the network host OS can fully map its functions to an XML RPC request, that OS is not ready for prime time.
Vendors – it’s time. Get it finished and make sure every CLI and configuration function is matched with an equivalent XML RPC command. There’s no excuse at this point.