The Joy of SRX – Fun With Redundancy

Today’s learning experience was with the Juniper SRX3600, and discovering that sometimes simple things can give your brain a chance to stumble.

Imagine, if you will, an SRX3600 cluster (Active/Passive) connected to the upstream device using Redundant Ethernet (reth) interfaces. Simple enough, so why was I confused for a while? To explain, let’s back up a moment and check out the two technologies involved in my confusion.

SRX Clusters

Calling a pair of SRX3600s a cluster is perhaps a bit of a misnomer, because they actually behave more like a virtual chassis of a sort. The lower-end EX series uses a virtual chassis identifier (VCID) to distinguish the ports on one chassis from another; in an SRX cluster, the second chassis’ ports simply begin their slot numbering where the first chassis’ numbers stop, as if you had one chassis with twice the number of slots. Of course, the number of slots varies between SRX models, so Juniper have produced a handy dandy table to help you make some sense of it. On the SRX3600, for example, the FPC number on the first node runs from 0 to 12, so the second node uses 13 to 25.

interface xe-1/0/0   <- a port on node 0 (the first chassis) 
interface xe-14/0/0  <- a port on node 1 (the second chassis)

Amusingly, the higher end EX switches (e.g. 8200 series) use a similar numbering system to the SRX when creating a virtual-chassis. Anyhow, the point is that from a configuration perspective it looks like one big device with lots of slots.
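To make the numbering concrete, here is a minimal Python sketch of the slot arithmetic described above. It assumes 13 FPC slots per chassis (0-12), as stated for the SRX3600; the function name `locate_port` and the constant `SLOTS_PER_NODE` are made up for illustration and are not part of any Juniper tool.

```python
import re

# Assumption: 13 FPC slots per chassis (0-12), the SRX3600 figure quoted above.
SLOTS_PER_NODE = 13


def locate_port(ifname, slots_per_node=SLOTS_PER_NODE):
    """Return (node, local_slot) for a cluster interface name like 'xe-14/0/0'."""
    match = re.fullmatch(r"[a-z]+-(\d+)/(\d+)/(\d+)", ifname)
    if match is None:
        raise ValueError(f"not a physical interface name: {ifname!r}")
    fpc = int(match.group(1))
    if fpc >= 2 * slots_per_node:
        raise ValueError(f"FPC {fpc} is out of range for a two-node cluster")
    # Node 1's slots start where node 0's stop, so the quotient is the node
    # and the remainder is the slot within that chassis.
    return divmod(fpc, slots_per_node)


print(locate_port("xe-1/0/0"))   # (0, 1): node 0, slot 1
print(locate_port("xe-14/0/0"))  # (1, 1): node 1, slot 1
```

Both example ports resolve to slot 1, just on different nodes. For another SRX model, the slot count would need to come from Juniper's table rather than the default used here.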

When you log in to the SRX Cluster, the OS makes it obvious whether you are on the Primary or the Standby in the prompt, e.g.:

{primary:node1}
root@srx3600>

So let’s talk about reth interfaces for a moment.

Redundant Ethernet (reth)

Redundant Ethernet ports look, at first glance, like a way to do MLAG from the SRX, although unlike a true MLAG only the member ports on one chassis forward traffic at any given time. Rather than configuring an aggregated Ethernet (ae) interface, you configure a reth port and add members to it from either chassis. Since the goal is presumably redundancy, you want to include ports from both chassis, so using the ports I mentioned above we might make a reth interface using xe-1/0/0 and xe-14/0/0. Physically, they’re the same port on each SRX3600, but logically they’re bound to a single reth interface in the configuration, so you proceed to configure any trunking or subinterfaces necessary on the reth interface.

Brain Stumble

So why did I mentally trip up earlier? Well, I was logged into the primary device in an SRX cluster, but noted that rather than being on the local xe-1/0/0 interface, the traffic to and from the firewall was all going over the reth member port on the other chassis, xe-14/0/0. Cognitive dissonance ahoy! My brain was screaming that this was wrong. Everything was working fine, but my logic was telling me that since I was on the primary SRX, the reth should be using the attached xe-1/0/0 port, and why was it sending traffic over the fabric to xe-14/0/0?

Redundancy Groups, or “Why I Am Stupid”

The answer was actually fairly simple, but it was clearly too Friday for me to see it for a while. When you configure a cluster, you also configure a redundancy-group. When you add reth interfaces, you say how many reth interfaces you want, then configure an additional redundancy-group and associate the reth interfaces with that group:

## How many reth ports will you want?
set chassis cluster reth-count 2 

## Create redundancy group for Routing Engines:
set chassis cluster redundancy-group 0 node 0 priority 100
set chassis cluster redundancy-group 0 node 1 priority 50 

## Create redundancy group for reth interfaces
set chassis cluster redundancy-group 1 node 0 priority 50 
set chassis cluster redundancy-group 1 node 1 priority 100

## Associate a reth with a redundancy-group
set interfaces reth0 redundant-ether-options redundancy-group 1

Cunningly, redundancy-group 0 controls the status of the control plane, and in this case redundancy-group 1 controls the data plane (the reth). This isn’t new to me, but I have always configured the data and control planes to be active by default on the same device! In the configuration above, the node with the higher priority wins in each group, so the control plane will be active on node 0 (where I was logged in and issuing commands), but the data plane (i.e. the reth) will be active on node 1. That also means, of course, that the traffic is being processed by node 1.
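The reasoning above can be sketched in a few lines of Python. This is a toy model, not how Junos elects a primary: it only compares the configured priorities and ignores preemption, interface monitoring and manual failovers, all of which can change the real outcome. The priorities and member ports are taken from this post; the names `PRIORITIES`, `RETH_GROUP`, `RETH_MEMBERS`, `expected_primary` and `active_member` are hypothetical.

```python
# Priorities copied from the configuration above:
# {redundancy_group: {node: priority}}.
PRIORITIES = {
    0: {0: 100, 1: 50},   # redundancy-group 0: control plane (Routing Engines)
    1: {0: 50, 1: 100},   # redundancy-group 1: data plane (reth interfaces)
}

# reth0 belongs to redundancy-group 1 and has one member port on each node.
RETH_GROUP = {"reth0": 1}
RETH_MEMBERS = {"reth0": {0: "xe-1/0/0", 1: "xe-14/0/0"}}


def expected_primary(group):
    """Return the node with the highest configured priority for a group."""
    by_node = PRIORITIES[group]
    best = max(by_node.values())
    winners = [node for node, prio in by_node.items() if prio == best]
    if len(winners) != 1:
        # Assumption: tie-breaking is out of scope for this sketch.
        raise ValueError(f"redundancy-group {group}: tie-breaking is not modelled")
    return winners[0]


def active_member(reth):
    """Return the member port on the node that is primary for the reth's group."""
    return RETH_MEMBERS[reth][expected_primary(RETH_GROUP[reth])]


print(expected_primary(0))     # 0: control plane on node 0
print(expected_primary(1))     # 1: data plane on node 1
print(active_member("reth0"))  # xe-14/0/0
```

Seen this way, being logged in to node 0 while the traffic uses xe-14/0/0 is exactly what the configured priorities ask for.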

If proof were ever needed that Control and Data planes are separated in the SRX, here it is! And thus after shaking my head at myself for taking so long to realize what was going on, I was able to confirm that nothing was actually wrong here. Ironically, I had issued a “show chassis cluster status” command and stared at the output for a while, convincing myself that I must have been misreading it.

Time For a Break

After missing the obvious for 10 minutes, I think it’s time for a break to clear my head! It’s a long weekend here in the USA (Labor Day on Monday), so perhaps that will do it. If you’re in the USA, enjoy the long weekend. If you’re elsewhere and do not have a day off on Monday, then I raise my glass to you and offer my sympathies. Cheers!

2 Comments on The Joy of SRX – Fun With Redundancy

  1. Reth is not MLAG.
    Reth is one or more links from each srx chassis cluster node.
    When you use more than 2 links LACP is used.
    On the switch, each SRX needs a LAG (assuming more than 1 interface per SRX).
