One Little Thing Can Break All Your Automation

I’ve been doing some work automating A10 Networks load balancers recently, and while testing I discovered a bug which broke my scripts. As hard as I try to code my scripts to cope with both success and failure, I realized that the one thing I can’t do much with is a failure of a key protocol.

Breaking Badly

So what matters when I’m automating device configurations? Three things come to mind immediately:

The Network / Transport

I need reliable network connectivity, or my automation requests will constantly be timing out or failing. While a script should be able to cope with an occasional network failure, unreliable networks are not fun to work with.
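As a rough sketch of what "coping with an occasional network failure" might look like, the helper below retries an operation a few times before giving up. The function name, the retry count and the back-off delay are all assumptions made for illustration; `operation` stands in for whatever call actually talks to the device.

```python
import time


def call_with_retries(operation, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Run operation(), retrying on connection or timeout errors.

    Hypothetical helper: 'operation' is any zero-argument callable that
    performs one request. The last error is re-raised if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError) as err:
            last_error = err
            if attempt < attempts:
                # Wait a little longer after each failed attempt.
                sleep(delay_seconds * attempt)
    raise last_error
```

This only smooths over the occasional blip; a network that fails on most attempts will still exhaust the retries and surface the error.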

Data Transfer Format

Pick a format (XML, JSON) and use it consistently. I’m happy so long as I can send a request and read a response in a given format. If I send a request using JSON, send a response using JSON. Funnily enough I was troubleshooting WordPress xmlrpc recently and noticed that when there was an error, the XML command being issued received a 404 error followed by, well, you’d hope an XML error response, right? No, because it was an HTTP 404 error, the site was returning the blog search page instead. I think I would have preferred an XML response explaining what the error actually was. Unsurprisingly, the client code using the XMLRPC connection was complaining about an unexpected XML response (correct, since it was HTML).
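One defensive habit that follows from this is to check what kind of response arrived before handing it to a parser. The sketch below is a minimal illustration for the JSON case, and the same idea applies to XML; the function name and its arguments are assumptions, with the status code, Content-Type header and body supplied by whichever HTTP client is in use.

```python
import json


def parse_json_response(status, content_type, body):
    """Decode a JSON body, or raise ValueError describing what arrived instead.

    Hypothetical helper: the caller passes in the HTTP status code, the
    Content-Type header value and the response body as text.
    """
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/json":
        raise ValueError(
            f"expected JSON but got {media_type or 'no content type'} (HTTP {status})"
        )
    return json.loads(body)
```

With a check like this, an HTML search page sent back with a 404 produces an error that names the real problem rather than a confusing parse failure.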

Consistent API

Create an API that makes sense (I can only dream). Create consistent responses so that I don’t have to “special case” every single response based on the request I make. If a particular response element can be an array, always send back an array, even if it only has a single entry; don’t send a string instead. Wrap responses consistently so that errors and responses can be easily distinguished and extracted. For example, I found this note to self in some code I wrote last year:

Ok, it’s not the end of the world, but it does add an additional step which I really don’t appreciate.
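To illustrate the kind of extra step involved (this is a generic sketch, not the note mentioned above), a small normalizing helper can hide the "sometimes a string, sometimes an array" problem from the rest of a script. The name `as_list` is made up for this example.

```python
def as_list(value):
    """Return value as a list, whatever shape the API chose to send.

    Hypothetical helper: a missing element becomes an empty list, a single
    item becomes a one-element list, and a real list is returned unchanged.
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]
```

Every caller can then loop over `as_list(response.get("member"))` without caring how many entries came back, where `member` is just a placeholder key.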

ACOS For Concern

So, my recent discovery with the A10 Networks load balancers, which run the ACOS operating system, was that the encoding of escaped characters within the configuration can mean that ACOS will return invalid JSON in response to my request. For example, imagine that a health check must be configured to request the URL /checkseq\8s1. It’s an unusual URL because it has a backslash in it, but that’s what the server in question asks for, so that’s the health check that’s needed. ACOS understands escaped characters (using a backslash), so to send a \ in the health check, it would have to be entered as \\. Similarly, to send \r\n (carriage return, new line) the health check would contain \\r\\n, and that allows the addition of a custom HTTP header as well, which in this example is called “X-Custom-Field” and has a value of 101.

When the health check is used, the GET string is analyzed and the escaped characters resolve to a more normal looking string, with \\ reduced to a single \ and \\r\\n reduced to \r\n (carriage return, new line).

However, when viewing the health monitor’s configuration via the REST API, the same exact process occurs, and the JSON for the method ends up carrying the already-interpreted url string, /checkseq\8s1 included.

When read by the receiver, the url string is again analyzed for escaped characters, and \8 is among those discovered.

Unfortunately, \8 is not a valid escape code, thus the JSON decoding process spews an error at this point. To me this is a failure in the JSON encoder in ACOS; it should take the interpreted string then make it ‘JSON-safe’. By having encoded a string including the invalid character “\8”, ACOS generated invalid JSON. Since my JSON decoder can’t handle invalid JSON, my automation fails on the spot. I don’t know if the query worked or not; I only know it couldn’t be decoded. Highly annoying.
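The failure is easy to reproduce outside ACOS. The sketch below uses Python's standard json module and only approximates the behavior described above: the "naive" string is an assumption about the kind of output that causes the problem, not a capture from a real device.

```python
import json

# The health check URL after "\\" has been resolved to a single backslash.
interpreted_url = "/checkseq\\8s1"  # Python literal for: /checkseq\8s1

# Naive encoding: drop the string between quotes without escaping it.
naive_json = '{"url": "' + interpreted_url + '"}'

try:
    json.loads(naive_json)
    naive_ok = True
except json.JSONDecodeError as err:
    # "\8" is not a valid JSON escape, so the decoder rejects the document.
    naive_ok = False
    naive_error = str(err)

# A JSON-safe encoder escapes the backslash again before sending.
safe_json = json.dumps({"url": interpreted_url})
round_tripped = json.loads(safe_json)["url"]
```

Here `naive_ok` ends up False, while `safe_json` carries the backslash doubled and decodes back to the original URL, which is what "JSON-safe" means in practice.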

The Workaround

This all started because of a health check URL containing a backslash. The workaround, rather than using “\\”, is to URL-encode that backslash as %5C (or similar) in the original health check. However, there’s no way to stop a user creating another “url bomb” in the future, because ACOS will accept \\ in the url string without generating an error.
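If the health checks are themselves generated by a script, the workaround can be applied before anything reaches the device. This is a minimal sketch using Python's urllib.parse.quote; both function names are invented for the example, and it assumes the input is a plain, not yet encoded path with no query string.

```python
from urllib.parse import quote


def encode_health_check_path(path):
    """Percent-encode a raw path, keeping the '/' separators intact.

    Assumes 'path' has not been encoded already; an existing '%' would be
    encoded a second time.
    """
    return quote(path, safe="/")


def has_raw_backslash(path):
    """Flag a path that still contains a literal backslash."""
    return "\\" in path
```

For the URL in this post, `encode_health_check_path("/checkseq\\8s1")` gives `/checkseq%5C8s1`. The second function is the sort of pre-flight check a script could run against existing configuration to spot a waiting “url bomb”.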

My 2 Bits

What this really brings home to me is how a breakdown in a key protocol – in this case JSON – can bring automation to its knees. We assume that protocols like TCP will just work, and at this point I think of JSON, largely, in the same way. Scripts rely upon formats like JSON to allow the accurate storage and transport of information, but if the JSON can’t be read by the recipient, the data is lost. In the case of my automation scripts, it brought a workflow to a screeching halt, and it was not possible to get past that point in the process without manually applying a workaround to the health check which was causing problems.

There’s certainly a lesson here about checking results, and raising alarms when an unexpected result shows up. Even a reliable automation script will need some tender loving care at times.
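As one possible shape for that kind of checking, the sketch below wraps JSON decoding so that an undecodable reply is logged with some context and raised as a distinct error, rather than surfacing as an anonymous parse failure. The exception class, function name and logger name are all assumptions for illustration.

```python
import json
import logging

log = logging.getLogger("automation")  # hypothetical logger name


class UndecodableResponse(Exception):
    """The device replied, but the body could not be decoded as JSON."""


def decode_or_alarm(raw_body, request_description):
    """Decode raw_body as JSON, logging and re-raising with context on failure."""
    try:
        return json.loads(raw_body)
    except json.JSONDecodeError as err:
        # Keep a few characters either side of the failure for the log.
        snippet = raw_body[max(err.pos - 20, 0):err.pos + 20]
        log.error(
            "Invalid JSON for %s at position %d: %r",
            request_description, err.pos, snippet,
        )
        raise UndecodableResponse(
            f"{request_description}: {err.msg} at position {err.pos}"
        ) from err
```

It doesn't make the bad JSON readable, but it does separate "the request failed" from "the reply couldn't be read", which is the distinction that was missing here.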
