Daring to Dream: Multivendor Network Configuration Management


Last year I wrote a script that I have installed for my current client that parses a router configuration archive in order to create a database of IPs configured in the network (I’ll write a post with more detail about what that tool does some time, as I’m curious to see whether it’s something that other people would find useful). I wanted a tool that didn’t reinvent the wheel, and since I had a readily available folder containing all the device configurations, I used what was already there rather than having to build my own tools to extract the configurations as well, or hook into one of the configuration management products.

In maintaining and improving that tool, I’ve stumbled over a number of interesting issues that have led me to think there has to be a better way. And I guarantee I’m not the first person to be thinking this…!

Configuration Parsing

One of the frustrations of parsing text-based configurations is that I have to be able to handle not only multiple vendors (e.g. Cisco, Juniper, f5), and multiple operating system types within those vendors (e.g. IOS, NXOS, “Switching” Junos (Vlans), “Routing” (Bridge Domains) Junos and ScreenOS) but also have to cope when an update to the OS changes the behavior of that device, or the way configurations are stored (e.g. f5 LTM version 10 has a very different configuration format and file structure to LTM version 11). It has in fact been suggested that the word “Parse” is probably a contraction of “Pain in the arse.”

You also start realizing when you parse configurations that the order in which things appear in the configuration can be a huge pain in the rear. Related information is not always grouped together; it may sit far from where it would be most helpful. For example, if I put an IP interface in a VRF, IOS and Junos accomplish this in different ways:

Note that while all the interface-related configuration comes under the interface in IOS, in Junos the interface is mapped to a VRF under the routing-instances hierarchy and the IP is allocated on the interface itself. The IP and VRF information are separated, so when you parse the configuration you first have to make a note of which VRF each interface resides in; then, when you get to that interface’s configuration, you have to remember to cross-reference it back to the VRF information you grabbed earlier.
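
A minimal Python sketch of that two-pass cross-reference, assuming a Junos configuration in "set" format. The sample configuration, the interface and VRF names, and the `map_interfaces_to_vrfs` helper are all made up for illustration, and a real parser would have to cope with far more syntax than this:

```python
import re

# Hypothetical Junos configuration in "set" format; names and addresses are illustrative.
JUNOS_CONFIG = """\
set interfaces ge-0/0/1 unit 0 family inet address 192.0.2.1/24
set interfaces ge-0/0/2 unit 0 family inet address 198.51.100.1/24
set routing-instances CUSTOMER-A instance-type vrf
set routing-instances CUSTOMER-A interface ge-0/0/1.0
"""

VRF_RE = re.compile(r"^set routing-instances (\S+) interface (\S+)$")
ADDR_RE = re.compile(r"^set interfaces (\S+) unit (\d+) family inet address (\S+)$")


def map_interfaces_to_vrfs(config: str) -> list[tuple[str, str, str]]:
    """Return (interface, address, vrf) tuples; 'default' means no VRF was found."""
    lines = config.splitlines()

    # Pass 1: note which VRF each logical interface resides in.
    vrf_of = {}
    for line in lines:
        match = VRF_RE.match(line)
        if match:
            vrf, ifname = match.groups()
            vrf_of[ifname] = vrf

    # Pass 2: read the addresses and cross-reference the VRF noted in pass 1.
    results = []
    for line in lines:
        match = ADDR_RE.match(line)
        if match:
            ifd, unit, address = match.groups()
            ifname = f"{ifd}.{unit}"
            results.append((ifname, address, vrf_of.get(ifname, "default")))
    return results


for row in map_interfaces_to_vrfs(JUNOS_CONFIG):
    print(row)
```

Reading the whole file twice means the result does not depend on whether the routing-instances section appears before or after the interfaces section.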

And Your Point Is?

Talking about this to a colleague, I posited that in an ideal world we’d manage to find a single format in which configurations could be stored (or at least delivered to my scripts) so that regardless of the routing device (for example) I would be able to read the configuration in a consistent fashion and process it.

Perhaps more beneficially, if I were to change the hardware on which a configuration had been active, I might be able to redeploy the same configuration on the new hardware even if it wasn’t identical. In other words, the configuration describes what you want the end state to be, not how to actually implement it. The implementation specifics then become a secondary, though not insignificant, problem. This might sound a little bit like the way Puppet works, perhaps? For example (and I’m skipping a lot of related material to keep this simple), to ensure that a standard sshd configuration is installed we might have this kind of Puppet configuration:

It’s pretty obvious what this does, but does it give the commands necessary to implement this? No, it does not; it simply says to make sure the file is there, owned by root, with group set to root and mode 600. It’s down to the Puppet agent on the target OS to actually make this happen by issuing the chown, chgrp and chmod commands appropriate to that system. In other words, it’s a declarative language (thanks, Ivan!). The implementation details are abstracted into a description of what needs to be done, rather than an explanation of how it is to be done.
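
To make the declarative idea concrete, here is a rough Python sketch of the "describe the state, let the agent work out the steps" pattern. It is not how Puppet is implemented; the `plan_file_state` and `converge` helpers are hypothetical, and the sketch only handles the file mode (not owner or group) so that it can run without root, against a temporary file rather than the real sshd_config:

```python
import os
import stat
import tempfile


def plan_file_state(path: str, desired_mode: int) -> list[str]:
    """Compare the actual state to the desired state and list the actions needed."""
    if not os.path.exists(path):
        return [f"create {path}", f"chmod {desired_mode:o} {path}"]
    actual_mode = stat.S_IMODE(os.stat(path).st_mode)
    if actual_mode != desired_mode:
        return [f"chmod {desired_mode:o} {path}"]
    return []  # already in the desired state, nothing to do


def converge(path: str, desired_mode: int) -> list[str]:
    """Bring the file to the desired state and return the actions that were needed."""
    actions = plan_file_state(path, desired_mode)
    if not os.path.exists(path):
        open(path, "w").close()
    os.chmod(path, desired_mode)
    return actions


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "sshd_config")
    print(converge(target, 0o600))  # first run has work to do
    print(converge(target, 0o600))  # second run finds nothing to change
```

The caller only ever states the desired mode; which commands run, if any, depends on what the agent finds on the system.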

Applying This to Networks

Actually, Juniper has taken some steps in that direction, supporting Puppet configurations for interfaces. For example from their netdev_stdlib documentation:

That should at least get the physical interface configured, but services on top of that interface are managed separately. For example, Layer2 properties are defined in netdev_l2_interface:

That’s enough to get an interface configured with tagging (if necessary) and an access vlan or trunk list configured. So what about configuring an IP address on a layer 3 interface? Not so much; that module doesn’t seem to exist. So you can use the netdev modules but only for limited configuration tasks (which also include LAG and VLANs); enough to automate provisioning of a server port, but we’re not going to see entire routers configured by Puppet based on what’s there so far.

Abstraction Advantage

Juniper’s documentation of their netdev_vlan module gives a great demonstration of why this kind of abstraction is so useful. In this example, the Puppet config would like to have VLAN500 created on a device called “Green”.

The result after implementation on a device running “Switching” Junos (e.g. an EX series switch) would be this:

The same Puppet configuration implemented on a device running “Routing” Junos (e.g. an MX) would look like this:

Clearly there’s a benefit to abstracting configuration because the requirements are effectively device-agnostic, which has the potential to be a big time saver. Now imagine if you could do something similar with the rest of the router configuration too?
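
As a rough illustration of the same idea outside Puppet, the Python sketch below takes one abstract VLAN request and renders it as Junos "set" commands for each platform family. The `render_vlan` function, the family labels and the VLAN name are hypothetical, and the output is only meant to mirror the vlans versus bridge-domains split described above, not to reproduce Juniper’s module:

```python
# One template per platform family; the family labels are made up for this sketch.
VLAN_TEMPLATES = {
    "junos-switching": "set vlans {name} vlan-id {vlan_id}",        # e.g. EX series
    "junos-routing": "set bridge-domains {name} vlan-id {vlan_id}",  # e.g. MX series
}


def render_vlan(os_family: str, name: str, vlan_id: int) -> str:
    """Render one abstract VLAN request as a platform-specific command."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")
    try:
        template = VLAN_TEMPLATES[os_family]
    except KeyError:
        raise ValueError(f"no VLAN template for {os_family!r}") from None
    return template.format(name=name, vlan_id=vlan_id)


# The request is identical for both platforms; only the rendering differs.
for family in VLAN_TEMPLATES:
    print(family, "->", render_vlan(family, "vlan500", 500))
```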

Speeding Deployment

Imagine that you want to update your SNMP read-write community strings every month for security purposes (does anybody use SNMP RW any more?). To do so by hand is a huge pain, so undoubtedly you’d want to script that activity. But most networks are a real mixture of different operating systems, so you can’t just push the same change to every device. For example, in an environment with devices running IOS, Junos, and ScreenOS, you’d have to write a script that was able to apply at least three different configuration syntaxes and, equally importantly, you’d have to either predetermine which operating system is running on each target device or build an automated check into the script, so that the correct syntax is applied.

Do we want to be bothered with this though? No. What I’d prefer to be able to do is to write a description of the change I want to make, and let the device figure out how to implement it. For example, maybe in Puppet-style syntax using a fictitious “netdev_snmp” module, I might use a config snippet like this:

I really don’t care how each device implements it; I just want the new SNMP RW string installed and the old string removed. Isn’t that a lot simpler than having to write a script to iterate through?
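
For comparison, this is roughly what the hand-rolled alternative has to carry around, sketched in Python. The inventory, hostnames and community strings are invented, `plan_rotation` is a hypothetical helper, and ScreenOS is left out rather than guessing at its syntax, which itself shows the problem: every additional OS means another renderer to write and maintain:

```python
# Per-OS renderers: each returns the commands that remove the old RW string and add the new one.
RENDERERS = {
    "ios": lambda old, new: [
        f"no snmp-server community {old}",
        f"snmp-server community {new} RW",
    ],
    "junos": lambda old, new: [
        f"delete snmp community {old}",
        f"set snmp community {new} authorization read-write",
    ],
}


def plan_rotation(inventory: dict[str, str], old: str, new: str):
    """Return (per-host command lists, hosts whose OS has no renderer)."""
    plan, unsupported = {}, []
    for host, os_name in inventory.items():
        render = RENDERERS.get(os_name)
        if render is None:
            unsupported.append(host)
        else:
            plan[host] = render(old, new)
    return plan, unsupported


# Hypothetical inventory mapping each device to its operating system.
INVENTORY = {"core-rtr1": "ios", "edge-mx1": "junos", "fw1": "screenos"}

plan, unsupported = plan_rotation(INVENTORY, "oldsecret", "newsecret")
for host, commands in plan.items():
    print(host, commands)
print("no renderer for:", unsupported)
```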

Monitoring The Changes

I’m not a Puppet expert (as I suspect anybody reading the above will already have guessed), but what I do know is that whether I run Puppet with my made up netdev_snmp module or I write my own script, one essential thing is to know whether or not that change was successful. If it wasn’t successful for some reason, what should I do? Perhaps my script tried to make a change while somebody else had the configuration checked out exclusively to them and my change failed?

Whatever the reason, you need to know about the failure and have a plan for what to do about it. For example, should you just alert about the failure? Back out the change? Try again later? Should you back out all the other similar changes if one fails? And what would the rollback actually look like? Cisco users know (or should know by now) that putting “no” in front of each command is not a recipe for success. Who figures out the rollback?

Incidentally, most “hand-crafted” scripts that I have seen are quite bad at going back and validating that the configuration looks like it should do after the deployed change. The simplest scripting takes the form of “connect to the device and send the following strings”. Better scripting might use expect to connect, login, then send configs. However I rarely see a script that stays logged in and issues commands post-change to see if the change was accepted! Most scripts seem to be “hit and run” affairs.
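
A sketch of the alternative to “hit and run”, in Python: send the change, stay connected, read the configuration back and report anything that did not stick. `FakeDevice` is a stand-in invented for this example so that it runs without any network; a real tool would put an SSH or expect session behind the same two methods:

```python
class FakeDevice:
    """Stand-in for a CLI session; silently drops any line listed in `reject`."""

    def __init__(self, reject=()):
        self._running = []
        self._reject = set(reject)

    def send_config(self, line: str) -> None:
        if line not in self._reject:
            self._running.append(line)

    def show_running_config(self) -> list[str]:
        return list(self._running)


def apply_and_verify(device, lines: list[str]) -> list[str]:
    """Send each line, then read the config back and return the lines that are missing."""
    for line in lines:
        device.send_config(line)
    running = device.show_running_config()
    return [line for line in lines if line not in running]


change = ["snmp-server community newsecret RW", "snmp-server location DC1"]

print(apply_and_verify(FakeDevice(), change))                    # nothing missing
print(apply_and_verify(FakeDevice(reject=[change[1]]), change))  # second line was dropped
```

An empty list means every line was found after the change; anything else is the starting point for an alert, a retry or a rollback.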

NETCONF for Configuration Management

NETCONF (RFC 6241) takes a pretty good stab at trying to organize network device configurations into a standard scheme, and wraps around it an RPC-based communication mechanism to make the interactions straightforward between client and server. It should perhaps be obvious that there are going to be a number of vendor-specific knobs and functions that aren’t likely to be covered by the default definitions, but NETCONF handles that by allowing vendor-specific extensions to be written. That works fine in my opinion so long as the vast majority of the configuration can be presented and managed under the common schema.

This is cool so long as your chosen vendor supports NETCONF configuration. Does every device in your network support that? And how confident are you that you can write your configuration changes in the right format to use with NETCONF? Here’s an example from Cisco’s NETCONF pages for NXOS, to set the description for Ethernet 2/30 to “Marketing Network”:

Got that? The feeling I get is that you sure as heck do not want to be writing this nonsense by hand. On the other hand, if you change “2/30” to “ge-0/0/0” and fiddle a little bit, maybe you could apply the same change to a Junos device.

My concern would be that this is too specific to Cisco’s NXOS; there’s mention of the running configuration, which wouldn’t apply to Juniper devices. So maybe even beyond NETCONF we need a level of abstraction above this, and certainly one that’s more friendly to mere mortals like us. Like with Puppet, I want to be able to say “configure interface X with description Y” and let the abstraction layer figure out which exact NETCONF XML elements are required to issue that command successfully on a particular platform. I don’t want to have to busy myself with the implementation specifics when what I actually have are requirements.
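
A minimal Python sketch of such a layer, using only the standard library to build the XML. The `set_interface_description` function and the `PAYLOAD_BUILDERS` table are hypothetical; only a Junos-style payload is shown, and the NX-OS payload is deliberately not attempted because its schema isn’t reproduced here. Namespace handling and datastore choice vary between implementations, so treat the output as illustrative rather than something to paste into a session:

```python
import xml.etree.ElementTree as ET

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"  # base namespace from RFC 6241
ET.register_namespace("nc", NETCONF_NS)


def junos_description_payload(interface: str, description: str) -> ET.Element:
    """Build the Junos-style <configuration> subtree for an interface description."""
    configuration = ET.Element("configuration")
    interfaces = ET.SubElement(configuration, "interfaces")
    iface = ET.SubElement(interfaces, "interface")
    ET.SubElement(iface, "name").text = interface
    ET.SubElement(iface, "description").text = description
    return configuration


# platform -> (payload builder, target datastore); Junos edits the candidate
# configuration, which still needs a separate commit afterwards.
PAYLOAD_BUILDERS = {"junos": (junos_description_payload, "candidate")}


def set_interface_description(platform: str, interface: str, description: str,
                              message_id: str = "101") -> str:
    """Return an <edit-config> RPC as a string for the abstract request."""
    if platform not in PAYLOAD_BUILDERS:
        raise ValueError(f"no payload builder for platform {platform!r}")
    builder, datastore = PAYLOAD_BUILDERS[platform]
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": message_id})
    edit = ET.SubElement(rpc, f"{{{NETCONF_NS}}}edit-config")
    target = ET.SubElement(edit, f"{{{NETCONF_NS}}}target")
    ET.SubElement(target, f"{{{NETCONF_NS}}}{datastore}")
    config = ET.SubElement(edit, f"{{{NETCONF_NS}}}config")
    config.append(builder(interface, description))
    return ET.tostring(rpc, encoding="unicode")


print(set_interface_description("junos", "ge-0/0/0", "Marketing Network"))
```

The caller states the requirement once; supporting another platform means adding a builder to the table, not changing the caller.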

Mind you, does your platform support NETCONF fully? What do you do if it does not?

And Theeeeen?

Chinese Foooood

So to summarize, what I have now is a list of things I would ideally like in my network configuration world:

  • Ability to store configurations in a non-vendor-specific format
  • Ability to make OS-independent configuration changes based on abstracted requirements rather than specifying actual OS-dependent implementation steps
  • Ability to know if a change succeeds or fails, and determine how to act in the event of failure
  • A tool that can use whatever features are available on each platform, perhaps in a preference order, e.g. NETCONF > SSH CLI > Out of Band Serial CLI
  • Ability to roll back changes? Since I want to describe my change in a more abstract sense, I’m going to define my requirements as “if something goes wrong, revert to the previous configuration”, so the tool needs to be able to handle implementing that part too.
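
The preference-order item in the list above might look something like this Python sketch. The transport functions are fakes invented for the example (one always fails, one records what it was sent), `push_with_fallback` is a hypothetical helper, and rollback is not covered here:

```python
from typing import Callable


class TransportUnavailable(Exception):
    """Raised by a transport that cannot reach or talk to the device."""


def push_with_fallback(transports: list[tuple[str, Callable[[list[str]], None]]],
                       change: list[str]) -> str:
    """Try each (name, push) pair in preference order; return the name that worked."""
    errors = []
    for name, push in transports:
        try:
            push(change)
            return name
        except TransportUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no transport succeeded: " + "; ".join(errors))


# Fake transports for illustration only.
def fake_netconf(change: list[str]) -> None:
    raise TransportUnavailable("NETCONF not enabled on this device")


sent_over_ssh: list[str] = []


def fake_ssh_cli(change: list[str]) -> None:
    sent_over_ssh.extend(change)


used = push_with_fallback(
    [("netconf", fake_netconf), ("ssh-cli", fake_ssh_cli)],
    ["snmp-server location DC1"],
)
print("delivered via", used)
```

Returning the name of the transport that worked, and raising with every collected error when none did, also covers part of the “know if a change succeeds or fails” requirement.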

Sounds like a pretty tall order, so I’m going to leave it there until my next post where I’ll look at one possible solution. Feel free to share your thoughts below; do you agree with my list of requirements for a network configuration tool? Did I miss anything? Am I completely off base? I’d love to hear about it; thanks!

6 Comments on Daring to Dream: Multivendor Network Configuration Management

  1. Hey John. Your ‘journey’ is no doubt a very familiar one to anyone who has been trying to move out of the dark ages and the weeds and automate and orchestrate (although the true desire for that last one is questionable). Or even just do things a little bit smarter and waste a little less time on the mundane grind of network administration.

    I see quite a few tools emerging here and there and I definitely see a great deal of potential now most vendors have an API of some sort. I’ve no idea what solution you plan to write about but I’ve been involved in CPAL (https://github.com/jedelman8/cpal) recently and that, or something like it, has simply amazing potential.

    The best thing for me is that the tools are relatively simple to use and the only limitation is your enthusiasm and time.

    Specifically regarding your requirements, I’d not like to reinvent the wheel around the declarative language, but I don’t see why you couldn’t implement any of your required features in fairly short order via API. Of course, as is the intention with CPAL, if there is an abstraction layer that supports all your kit and its vendors, and you can just build your functionality on top of that without worrying about API specifics (along with the configuration specifics, right), then all the better to allow you to focus on your needs, not the API documentation!

    • Great response – thanks, Steve! Exactly the lines along which I’m thinking. A consistent API with platform specific implementations beneath it, the details of which I don’t want, nor need to know. I’ll definitely check out CPAL, especially since it’s Mr Edelman!

  2. FYI – the purpose of a modeling language like YANG is to be able to quickly interoperate between things like NETCONF schemas. Cisco has their schema (which you showed) but Juniper and Tail-F also have their own. NETCONF is actually referenced heavily in the YANG RFC, since that was its original intent. However it’s used in many other places – for instance as the backbone of the MD-SAL in OpenDaylight.

    • Hi Matt – do you know of any public examples of “how” the YANG modeling language works to “quickly interoperate between things like NETCONF schemas”? Perhaps I should dig into the ODL/MD-SAL, or is there a good URL or resource you’ve found? Really appreciate your further insights and experience!
