Juniper In-Service Software Downgrade (ISSD)


In which we discover that the “U” in “ISSU” really does mean “Upgrade”.

I was experimenting with ISSU (In-Service Software Upgrade) on the Juniper SRX platform this week. When it works, it’s great: a seamless upgrade of both devices in a cluster, with the SRX firewalls handily sharing state halfway through the process even though the two cluster members are, at that moment, running different versions of Junos.

But what if something breaks after your upgrade and you need to return to the old version? ISSD? I’m afraid not.

Computer Says No

So it turns out that you cannot run ISSU if the version of code you are trying to install is an earlier version than the code currently running. There is, I’m sure, a good reason for this, but oddly the error messages I got did not indicate that this was the problem.

This testing was performed on an SRX5600 cluster, so the slot numbers are 0–5 (node 0) and 6–11 (node 1).
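To make that numbering concrete, here is a minimal Python sketch of how a cluster-wide slot number maps back to a node and a local slot. It assumes a two-node cluster of 6-slot chassis, as described above; the function name `locate_slot` and the constant are made up for illustration and are not part of any Juniper tooling.

```python
from typing import Tuple

# Assumption: each SRX5600 chassis contributes 6 slots (0-5), and node 1's
# slots are numbered directly after node 0's (6-11), as described above.
SLOTS_PER_CHASSIS = 6


def locate_slot(cluster_slot: int,
                slots_per_chassis: int = SLOTS_PER_CHASSIS) -> Tuple[int, int]:
    """Return (node, local_slot) for a cluster-wide slot number.

    Hypothetical helper for a two-node cluster only.
    """
    if not 0 <= cluster_slot < 2 * slots_per_chassis:
        raise ValueError(f"slot {cluster_slot} is outside a two-node cluster")
    # Integer division gives the node, the remainder gives the local slot.
    return divmod(cluster_slot, slots_per_chassis)


print(locate_slot(9))  # (1, 3): slot 3 on node 1
```

So a reference to slot 9 is perfectly sensible in a cluster, even though it would be out of range on a standalone 6-slot chassis.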

What I Hoped Would Happen

What Should Happen

What Actually Happened

My favorite error message from that lot?

I’ll get right on that. So anyway, if you didn’t notice, the config validation failed because apparently references to slot 9 aren’t valid in a chassis that only has 6 slots. True enough, unless you’re running in a cluster configuration, in which case, uh, yeah it is. And there was no issue with that same configuration during the previous upgrade process, was there?

Juniper documentation (reference below) confirms that:

“The ISSU process is aborted if the Junos OS version specified for installation is a version earlier than the one currently running on the device.”

Good to know.
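Since the error output gives no hint that the version is the real problem, a pre-flight check of your own can save some head-scratching. The following is a rough Python sketch, not Juniper’s actual logic: it assumes version strings shaped like 12.1X44-D40.2 or 12.1R3.5 (illustrative values, not taken from my cluster), and it refuses to compare across release types rather than guess at their ordering. The names `parse_junos_version` and `is_downgrade` are hypothetical.

```python
import re
from typing import Tuple

# Assumed shape: <major>.<minor><type letter><number>[-D<build>][.<spin>]
# This is a simplification and will not match every Junos version string.
_VERSION_RE = re.compile(
    r"^(\d+)\.(\d+)([A-Z])(\d+)(?:-D(\d+))?(?:\.(\d+))?$"
)


def parse_junos_version(version: str) -> Tuple[str, Tuple[int, ...]]:
    """Split a version string into (release type letter, sortable key)."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, kind, number, d_build, spin = match.groups()
    # Missing optional fields are treated as 0 for comparison purposes.
    key = (int(major), int(minor), int(number),
           int(d_build or 0), int(spin or 0))
    return kind, key


def is_downgrade(running: str, target: str) -> bool:
    """Return True if target sorts earlier than running (same release type)."""
    running_kind, running_key = parse_junos_version(running)
    target_kind, target_key = parse_junos_version(target)
    if running_kind != target_kind:
        # Ordering across release types is not attempted in this sketch.
        raise ValueError("cannot compare different release types")
    return target_key < running_key


if is_downgrade("12.1X44-D40.2", "12.1X44-D35.5"):
    print("Target is older than running code: expect ISSU to abort.")
```

If the check says “downgrade”, don’t bother reading the validation errors; ISSU is not the tool for the job.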

No-Validate

But John, surely we can just skip the validation that’s claiming to make the upgrade fail, by using the “no-validate” option? Well, you might, but not on the SRX5600 which does not offer that option with ISSU. It seems that this command option is in fact only available on branch SRX for some inexplicable reason (reference below).

How To Downgrade Your SRX Cluster?

A colleague very kindly threw a link in my direction showing Juniper’s recommended way to downgrade the cluster with minimal outage. Sadly it’s not as good as ISSU and your traffic will take a hit, but it’s at least smart enough to avoid a ‘dual active’ situation. That is a high risk when you downgrade a cluster: the two devices, running different code versions, will not talk to each other, and so each assumes it is the boss.

In short, the solution is to use the regular request system software add <file> no-validate reboot command on each side, but with a number of steps added in order to isolate each side during the process before finally bringing everything back up. The process is a huge pain (especially compared to ISSU), but worth following if you’re in that situation and need to take as small a hit as possible.

Why Bother Mentioning It?

I’m sharing this because when I searched the intarwebs for this ISSU validation error, there were precious few references explaining what it all meant. So now it’s here, hopefully it will be indexed, and the next lucky person to see something like this will understand that the configuration is not the problem; it’s that you can’t do “ISSD”. And perhaps that’s why “ISSD” is not an acronym that gets used much.

References
