Those Pesky Active BGP Sessions


You’re troubleshooting a routing problem; you check BGP and the neighbor shows as active. Great, let’s move on and look somewhere else.

Right now a good proportion of you should be shouting at your screen – and with good reason.

Cisco IOS Basics

When we are first taught about Cisco ACLs, we’re taught the dangers of assuming that adding “no” in front of the command will just remove a single line. For example, pretty much every Cisco engineer out there will know that the end result of these commands:

access-list 1 permit host 1.1.1.1
access-list 1 permit host 2.2.2.2
access-list 1 permit host 3.3.3.3
! 
no access-list 1 permit host 3.3.3.3

…will be that access-list 1 is totally deleted. If you didn’t know that, then thank goodness you do now.

In a similar vein, I’ve recently seen a few people miss something really critical about BGP session status. I think we’re all taught it pretty early on (probably in CCNA-level classes), and I had just kind of assumed everybody knew about it, like the access-list problem above. However, it was missed in the heat of troubleshooting, which is unfortunately the time it matters most, so I thought I’d share a reminder here.

Active BGP Sessions

The command typically used to check BGP neighbor status is not, as you might expect, “show ip bgp neighbors”. That does work, but it’s a lot of output to slog through, especially if you have a lot of peers. It’s much quicker instead to use “show ip bgp summary” which abbreviates nicely to “sh ip bgp sum”.

So let’s say we’re troubleshooting a routing problem discovered a few minutes ago, we check BGP, and this is what we see:

R1#sh ip bgp sum
BGP router identifier 1.1.1.1, local AS number 1
BGP table version is 9, main routing table version 9

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
1.1.1.2         4     1       5       4        0    0    0 00:05:07 Active

No problem – the neighbor relationship has been active for the last 5 minutes, so everything should be cool.

NO NO NO NO NO!

Read the output again:

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
1.1.1.2         4     1       5       4        0    0    0 00:05:07 Active

On the right hand side, the state is showing as ‘Active’. That means the neighbor is configured and the router has been trying to connect to it for the last 5 minutes and 7 seconds, but has not yet managed to do so. When it does connect, the output changes: rather than telling you the state of the connection, that column tells you how many prefixes were received (and accepted):

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
1.1.1.2         4     1      11       9       13    0    0 00:01:03        4

You’ll also notice that the TblVer (table version) in the previous output was “0” – also a bad sign. This is simple stuff – if you don’t have a number on the right hand side, the connection is DOWN. Not ‘admin down’ mind you – that looks like this:

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
1.1.1.2         4     1      11       9        0    0    0 00:00:13 Idle (Admin)

It’s a really simple thing, and I suspect most of us know this well. However, it’s so easy to overlook when you’re stressed because a network is down that another mention can’t hurt, right? Don’t fall into the trap of thinking that “Active” = “Working fine”.
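
If you script your health checks, the same rule is easy to encode. What follows is a minimal Python sketch rather than any official parser: it assumes the ten-column row layout shown in the outputs above, and the names BgpPeer, parse_summary_line and is_up are made up for illustration.

```python
from typing import NamedTuple, Optional


class BgpPeer(NamedTuple):
    neighbor: str
    up_down: str
    state: str               # text from the last column, or "Established"
    prefixes: Optional[int]  # None unless a prefix count is shown


def parse_summary_line(line: str) -> BgpPeer:
    """Parse one neighbor row of 'show ip bgp summary' output.

    Assumed columns (as in the post):
    Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
    """
    fields = line.split()
    neighbor, up_down = fields[0], fields[8]
    # The last column can contain a space, e.g. "Idle (Admin)".
    last = " ".join(fields[9:])
    if last.isdigit():
        # A bare number is a prefix count: the session is up.
        return BgpPeer(neighbor, up_down, "Established", int(last))
    # Anything else is a state name, and the session is down.
    return BgpPeer(neighbor, up_down, last, None)


def is_up(peer: BgpPeer) -> bool:
    """A session is up only when the last column held a number."""
    return peer.prefixes is not None


row = "1.1.1.2  4  1       5       4      0   0    0 00:05:07 Active"
print(is_up(parse_summary_line(row)))  # False: "Active" is not a number
```

The check deliberately keys on "is it a number?" rather than on the word "Active", so Idle, Connect and friends are all treated as down too.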

RFC Mumbo Jumbo

You’re probably thinking “Well, why didn’t Cisco change it to say ‘not working’ then?” That’s a reasonable question, but the reality is that they are displaying the current state of the BGP Finite State Machine, as defined in RFC 4271. Those states are:

  • Idle
  • Connect
  • Active
  • OpenSent
  • OpenConfirm
  • Established

We’ve seen Idle and Active in the command output above, and when a session is Established, Cisco shows the number of prefixes instead. It is possible to see the other states in the command output, but unlikely, because sessions tend to establish so quickly that you’d have to catch them at exactly the right moment:

R1#sh ip bgp sum
BGP router identifier 1.1.1.1, local AS number 1
BGP table version is 9, main routing table version 9

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
1.1.1.2         4     1       5       4        0    0    0 00:00:35 OpenSent

So that’s why it says Active even though it isn’t active in the sense we would hope. Blame the BGP FSM, why don’t you.
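
For anyone who wants those states in a script, the six names listed above map naturally onto an enumeration. This is only a sketch: BgpState and state_from_column are hypothetical names, and the rule that a bare number means Established comes from the summary output shown earlier, not from the RFC itself.

```python
from enum import Enum


class BgpState(Enum):
    """The six BGP FSM states named in RFC 4271."""
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


def state_from_column(value: str) -> BgpState:
    """Map the State/PfxRcd column text to an FSM state.

    Assumption: a numeric value is a prefix count, which the summary
    output shows in place of the word "Established".
    """
    value = value.strip()
    if value.isdigit():
        return BgpState.ESTABLISHED
    # Drop qualifiers such as "(Admin)" in "Idle (Admin)".
    return BgpState(value.split()[0])


for column in ("Active", "OpenSent", "Idle (Admin)", "4"):
    print(column, "->", state_from_column(column).value)
```

Only one of the six is good news, which is a handy way to remember it: anything that isn’t Established is a session that is not passing routes.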

Meanwhile, stay vigilant, and pay attention to that sneaky State column 🙂

 
