Sometimes a user with performance issues will proudly present me with a traceroute and point to a particular hop in the network and accuse it of being the problem because of high latency on the link. About 1 time in 1000 they are correct and the link is totally saturated. The other 999 times, well, let me explain.
Traceroute Output
Here’s a typical traceroute I might be sent by a user (IPs and hostnames are altered to protect the innocent):
$ traceroute www-europe
traceroute to www-europe (18.9.4.17), 64 hops max, 52 byte packets
 1  gateway (57.239.196.133)  11.447 ms  18.371 ms  25.057 ms
 2  us-atl-edge (137.16.151.202)  13.338 ms  20.070 ms  19.119 ms
 3  us-ga-core (57.239.129.37)  103.789 ms  105.998 ms  103.696 ms
 4  us-nyc-core (57.239.128.189)  107.601 ms  103.116 ms  103.934 ms
 5  us-east-core (57.239.13.42)  103.099 ms  104.215 ms  109.042 ms
 6  us-east-bb1 (57.239.111.58)  107.824 ms  104.463 ms  103.482 ms
 7  uk-south-bb1 (57.240.117.81)  106.439 ms  111.156 ms  104.761 ms
 8  uk-south-core (57.240.117.61)  103.408 ms  104.430 ms  103.277 ms
 9  uk-london-core (57.240.132.178)  131.883 ms  104.071 ms  104.161 ms
10  uk-london-edge (99.88.4.133)  104.642 ms  105.685 ms  106.011 ms
11  www-europe (18.9.4.17)  103.465 ms  103.630 ms  104.228 ms
“Look!” the user cries. “The link from atl-edge to ga-core is clearly all messed up because the latency goes from 20ms to 106ms!”
Oh No It Doesn’t
Isn’t it amazing that the link in question apparently adds 90ms of latency, yet the link between hops 6 and 7 (the jump from east coast USA to the United Kingdom) appears to show no latency increase at all? In fact, isn’t it odd that the latency for every hop from 3 onwards is about the same?
I know that many people reading this will already know why this is, but for those who do not (and there’s no shame in that), this is indicative of there being an MPLS network in the path, and the MPLS Provider Edge (PE) is the router at hop 2.
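To make the pattern concrete, here is a rough sketch of how you might spot it automatically. It parses the traceroute output shown earlier and looks for a large jump in median latency followed by a flat plateau all the way to the destination. The function names and the thresholds (`jump_ms`, `flat_ms`) are my own assumptions for illustration, not a standard heuristic, and a plateau is only a hint that an MPLS core is in the path, not proof.

```python
import re
from statistics import median

# The traceroute output from the article.
SAMPLE = """\
traceroute to www-europe (18.9.4.17), 64 hops max, 52 byte packets
 1  gateway (57.239.196.133)  11.447 ms  18.371 ms  25.057 ms
 2  us-atl-edge (137.16.151.202)  13.338 ms  20.070 ms  19.119 ms
 3  us-ga-core (57.239.129.37)  103.789 ms  105.998 ms  103.696 ms
 4  us-nyc-core (57.239.128.189)  107.601 ms  103.116 ms  103.934 ms
 5  us-east-core (57.239.13.42)  103.099 ms  104.215 ms  109.042 ms
 6  us-east-bb1 (57.239.111.58)  107.824 ms  104.463 ms  103.482 ms
 7  uk-south-bb1 (57.240.117.81)  106.439 ms  111.156 ms  104.761 ms
 8  uk-south-core (57.240.117.61)  103.408 ms  104.430 ms  103.277 ms
 9  uk-london-core (57.240.132.178)  131.883 ms  104.071 ms  104.161 ms
10  uk-london-edge (99.88.4.133)  104.642 ms  105.685 ms  106.011 ms
11  www-europe (18.9.4.17)  103.465 ms  103.630 ms  104.228 ms
"""

HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s+(.*)$")
RTT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*ms")


def parse_hops(text):
    """Return a list of (hop_number, hostname, median_rtt_ms) tuples."""
    hops = []
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # header line or anything that is not a hop
        rtts = [float(value) for value in RTT_RE.findall(match.group(4))]
        if rtts:
            hops.append((int(match.group(1)), match.group(2), median(rtts)))
    return hops


def find_plateau(hops, jump_ms=50.0, flat_ms=10.0):
    """Find a big latency jump followed by a flat run to the destination.

    Returns (host_before_jump, first_host_on_plateau), or None if the
    pattern is absent. Both thresholds are arbitrary illustrative values.
    """
    for i in range(1, len(hops)):
        previous, current = hops[i - 1], hops[i]
        if current[2] - previous[2] < jump_ms:
            continue
        rest = hops[i:]
        if len(rest) >= 3 and all(abs(h[2] - current[2]) <= flat_ms for h in rest):
            return previous[1], current[1]
    return None


print(find_plateau(parse_hops(SAMPLE)))
```

Using the median of the three probes per hop keeps a single outlier, such as the 131.883 ms sample at hop 9, from breaking the plateau.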
Why?
Remember that one of the benefits of MPLS networks is that the network core (the Provider, or “P”, routers) doesn’t have to know anything about the routes at the edge. The two things the P routers need to know are 1) where all the other MPLS-capable routers are (usually via OSPF or IS-IS) and 2) where to forward incoming MPLS frames based on the incoming labels. They are relatively dumb switches, which allows them to move traffic around faster than a native IP router could. So what’s the problem?
Traceroute relies on sending packets with an incrementing TTL; when the TTL expires, the router on which it expires will usually send back an ICMP message to the sender warning that the TTL expired in transit, and that’s how traceroute finds out about each hop in the network. Looking at the MPLS diagram above, what happens when the TTL expires on a P router? The P routers have no knowledge of the edge networks, so how could one of them route an ICMP packet back to a source it doesn’t know about? MPLS labels are one-way to the destination and there’s no return path included, so the P router does the only thing it can: it snags the outgoing label it was going to use and creates a new MPLS frame containing the ICMP TTL Expired message, and this frame gets switched all the way to the destination PE router (PE-B in this case).
PE-B receives the frame, looks at the ICMP message within it and looks at the destination address, which is my PC. As a PE router, it knows how to get to my PC (which label to use to send it into the MPLS network again), packages the ICMP packet up inside MPLS and sends it back into the MPLS network.
In other words, any ICMP TTL Expired messages generated within the MPLS network actually flow to the far side of the MPLS network and then back again, which is why they all show a similar latency, and why in this example all these latencies are large (because in this case they have to cross from the US to the UK, then from the UK back to the US, in order to get back to my PC).
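The arithmetic behind that can be sketched in a few lines. This is a toy model under loud assumptions: the node names other than PE-B, the roles and every link delay are invented for illustration, paths are treated as symmetric, and router processing time is ignored. A probe that expires on a P router has its reply carried forward to the egress PE and only then back to the sender, so the reported round trip is twice the one-way delay to the egress PE regardless of which P router answered.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    link_ms: float  # one-way delay of the link leading to this node (invented)
    role: str       # "ip" = ordinary router/host, "pe" = MPLS edge, "p" = MPLS core


# Hypothetical path: the long-haul link sits between P1 and P2.
PATH = [
    Node("gateway", 5.0, "ip"),
    Node("PE-A", 5.0, "pe"),
    Node("P1", 5.0, "p"),
    Node("P2", 30.0, "p"),
    Node("P3", 5.0, "p"),
    Node("PE-B", 2.0, "pe"),
    Node("server", 1.0, "ip"),
]


def reported_rtts(path):
    """Return (hop_number, name, rtt_ms) as traceroute would report it."""
    # Cumulative one-way delay from the sender to each node.
    one_way = []
    total = 0.0
    for node in path:
        total += node.link_ms
        one_way.append(total)

    # Assumption: the last PE on the path is the egress PE.
    egress = max(i for i, node in enumerate(path) if node.role == "pe")

    results = []
    for i, node in enumerate(path):
        if node.role == "p":
            # Probe out to the P router, reply on to the egress PE,
            # then all the way back: 2 x (sender -> egress PE).
            rtt = 2 * one_way[egress]
        else:
            # The node can route the reply straight back.
            rtt = 2 * one_way[i]
        results.append((i + 1, node.name, rtt))
    return results


for hop, name, rtt in reported_rtts(PATH):
    print(f"{hop:2d}  {name:8s} {rtt:6.1f} ms")
```

With these made-up delays the first two hops report 10 ms and 20 ms, and every hop from P1 through PE-B reports 104 ms, even though the slow link is the one between P1 and P2. The jump shows up at the first P router, not at the link that actually adds the delay.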
If you’ve not seen this before it can be very confusing. As a result I’ve seen time wasted on troubleshooting links which actually have no problems, all thanks to traceroute.
Side note: Not all MPLS networks will push the incoming packet’s TTL into the MPLS frame, so the TTL will not always expire in the middle of the MPLS network. An MPLS network may therefore be seen as a single hop by the ICMP packet, so insight will not always be available into the internal nodes in an MPLS network.
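As a rough illustration of that side note, the sketch below walks probes with TTL 1, 2, 3 and so on along a hypothetical path and records which node answers each one. It is a deliberately simplified model: the path and the `propagate_ttl` flag are assumptions made for the example, and exactly which edge routers still show up in a real trace varies by platform and configuration. The only point is that when the IP TTL is not copied into the MPLS header, the P routers never see it expire and drop out of the trace.

```python
# Hypothetical path as (name, is_p_router) pairs.
PATH = [
    ("gateway", False),
    ("PE-A", False),
    ("P1", True),
    ("P2", True),
    ("P3", True),
    ("PE-B", False),
    ("server", False),
]


def visible_hops(path, propagate_ttl, max_ttl=30):
    """Return the names of the nodes that answer successive traceroute probes."""
    seen = []
    destination = path[-1][0]
    for ttl in range(1, max_ttl + 1):
        remaining = ttl
        responder = None
        for name, is_p_router in path:
            if is_p_router and not propagate_ttl:
                # Only the label TTL is decremented here; the IP TTL
                # carried inside the MPLS frame is left untouched.
                continue
            remaining -= 1
            if remaining == 0:
                responder = name
                break
        if responder is None:
            break  # probe outlived the modelled path
        seen.append(responder)
        if responder == destination:
            break
    return seen


print(visible_hops(PATH, propagate_ttl=True))
print(visible_hops(PATH, propagate_ttl=False))
```

In this model the first call lists all seven nodes, while the second skips P1, P2 and P3 entirely.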
Nice article. Never knew about such behaviour.
Wouldn’t TTL increment happen only when there are passing hops/routers (aka Broadcast Domains or L3 Boundaries)? And typically when TTL expires, that means you’ve reached the max. possible hops on the path for that packet.
Correlating this to your explanation:
1) This means in this case, us-ga-core did not have the hop/route to propagate forward. This is acceptable for a switch. So why would a switch attempt to decrement TTL when it is not crossing broadcast domain?
2) Can’t we simply do a traceroute with a higher TTL in the packet if we see this behaviour?
Hi Hemant. Typically when TTL expires, you’ve reached the maximum possible hops (this is really for loop prevention), yes; but traceroute intentionally sends packets with a low hop count (starting at TTL=1 and incrementing by 1 each time) in order to trigger ICMP TTL Expired messages from routers along the path, thus revealing the path(s) to you in the process.
1) us-ga-core doesn’t have a route back to the source of the expired packet because it’s an MPLS P-router. All P-routers know about is how to get to other P routers and the PE routers. Even then, while it’s a router, it’s running MPLS so it’s “switching” MPLS packets based on the MPLS header rather than the embedded packet’s destination IP.
2) Not sure how this would help. You *need* TTL to expire in order to get the ICMP response. Raising TTL would simply mean you missed finding out about hops along the path.
Dublin traceroute is a modern alternative: https://dublin-traceroute.net/README.md
Spoiler alert: I’m the author of Dublin Traceroute. Specifically for the MPLS tunnel, another way to see it is graphically, with an RTT chart. For example https://github.com/insomniacslk/python-dublin-traceroute#rtt-chart-per-path . The next couple of posts on the dublin-traceroute blog (coming in a few days) will be particularly about graphical analysis, and statistical analysis with Python and Pandas (a data analysis framework).
Thank you all for your content and discussions