A Primer on MTR - The Traceroute on Steroids

If you have ever wondered what paths packets take to traverse networks, chances are that you have used or heard of traceroute. traceroute or some variant of it is available on almost all operating systems.

My experience with traceroute began when I was just starting to learn networking and was eager to determine the distance between a server and my machine. I ran it against every server or website I regularly visited, watching in awe as green text filled up my terminal window.

Years passed, and as I progressed in my career, I found myself relying on this tool more often than not. Like many other tools, traceroute and other variants of it also kept evolving.

One of these relatively newer tools, and my favorite, is MTR, or “My TraceRoute”. It combines the functionality of traceroute and ping in a single utility and comes pre-installed on many Linux distributions. If your machine does not have it, on Debian or Ubuntu you can install it with:

sudo apt update && sudo apt install mtr

mtr is usually referred to as traceroute on steroids and in this post, we will uncover why.

MTR Advantages

MTR is beneficial to use for several reasons:

Helps troubleshoot connectivity issues by identifying bottlenecks or failures along the route to a particular destination. This can aid in identifying the source of network problems, such as misconfigured routers or oversubscribed network links.
Measures the quality of a network connection, providing information on packet loss, latency, and jitter. This can be useful for network performance monitoring and optimizations (very important).
Works in a variety of scenarios and with several protocols. It can trace IPv4 and IPv6 routes, and it can probe with ICMP, TCP, or UDP packets.

I would say the most important takeaway here is the support for tracing with TCP and UDP in addition to ICMP. Why, you ask? In some cases, network devices such as firewalls block ICMP traffic. More importantly, ICMP packets are almost always handled by the slow path or control plane rather than the fast path or data plane.

This gets a bit tricky, so let’s expand on it by explaining the terms I just threw around.

Control Plane vs Data Plane

The control plane is responsible for managing and maintaining the configuration of the network device and its interactions with other network devices. This includes tasks such as routing protocols, address resolution, and network management. The control plane is often implemented in software and runs on a processor or CPU of the network device. ICMP packets are handled by the control plane.

The data plane, on the other hand, is responsible for forwarding and processing data packets as they pass through the network device. This includes tasks such as packet filtering, forwarding, and queuing. The data plane is often implemented in hardware, such as ASICs or network processors, and runs on specialized hardware in the network device.

Slow Path vs Fast Path

Fast path and slow path are terms that are used to refer to the different processing paths that data packets can take through the data plane.

The fast path is the optimized processing path that is designed to handle the majority of packets with minimal processing overhead. This path is optimized for high performance and is typically implemented in hardware.

The slow path, on the other hand, is the processing path that is used for packets that cannot be handled by the fast path. This path is typically implemented in software and is used for tasks such as access control, deep packet inspection, and other tasks that require more complex processing. The slow path is less optimized for performance and may introduce additional latency.

So, can we conclude that slow path == control plane and fast path == data plane? It is tempting to say yes, but it is not quite correct! So why did I make it appear that way earlier? Because these terms are often (and mistakenly) used interchangeably, and I needed an excuse to distinguish between them. A more careful wording would have been “a slower path, such as the control plane”.

The point I’m trying to make is that, because of how network devices handle traffic, TCP and UDP packets usually take a much faster path than ICMP packets.

Depending on how these network devices are set up or how busy they are at any given moment, they may simply drop ICMP packets, rate limit them, or, using traffic engineering, divert them onto a different path than the one “regular” data traffic takes. You might be wondering why someone would set up a device to behave that way. Keep in mind that a device’s CPU is in charge of handling ICMP traffic, and if a large volume of ICMP packets is directed at that device, such as during a DDoS attack, it may affect the data plane and the important/normal traffic that the device handles.
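To make the rate limiting case concrete, here is a minimal sketch of a token bucket, one common way to cap how many replies a device generates per second. The function name, the probe timing, and the limit of 5 replies per second are all assumptions I made up for illustration; real devices use their own vendor-specific mechanisms and limits.

```python
def replies_under_rate_limit(probe_times_ms, rate_per_sec, burst):
    """Return one boolean per probe: True if the device answered it.

    Simple token bucket: the bucket holds at most `burst` tokens, refills at
    `rate_per_sec` tokens per second, and each ICMP reply costs one token.
    Tokens are tracked in thousandths so the arithmetic stays in integers.
    """
    capacity = burst * 1000
    tokens = capacity  # start with a full bucket
    last_ms = None
    answered = []
    for now_ms in probe_times_ms:
        if last_ms is not None:
            # elapsed milliseconds * rate per second = thousandths of a token
            earned = (now_ms - last_ms) * rate_per_sec
            tokens = min(capacity, tokens + earned)
        last_ms = now_ms
        if tokens >= 1000:
            tokens -= 1000
            answered.append(True)
        else:
            answered.append(False)
    return answered


# Hypothetical scenario: probes reach the device every 100 ms, but it only
# generates 5 ICMP replies per second with a burst of 1.
probe_times_ms = [i * 100 for i in range(10)]
answered = replies_under_rate_limit(probe_times_ms, rate_per_sec=5, burst=1)
loss_percent = 100 * answered.count(False) / len(answered)
print(f"apparent loss at this hop: {loss_percent:.1f}%")
```

In this made-up scenario every second probe goes unanswered, so the hop would appear to lose half of its packets even though nothing is wrong with the traffic passing through it.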

I think by now you can imagine where I am heading. If we see a 40% drop or some added latency on a hop in our path, it does not necessarily mean that there is a problem with that network device. It could simply be because of the way it is configured to handle ICMP packets. This is where tracing with TCP or UDP packets becomes valuable.

Ok, let’s get to some examples.

MTR Examples

Let’s have a look at a few examples.

Basic Usage

The most basic form; uses ICMP:

mtr example.com

Continues running until you press CTRL-C

The command above produces output similar to:

                                            My traceroute  [v0.95]
dev (10.11.12.13) -> example.com (93.184.216.34)                                2023-01-16T22:32:03+0100
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                                                     Packets               Pings
 Host                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. _gateway                           0.0%    27    0.2   0.2   0.1   0.3   0.0
 2. 192.168.143.254                    0.0%    35    0.4   0.4   0.2   0.5   0.1
 3. 10.224.217.254                     0.0%    27    0.4   0.4   0.3   0.4   0.0
 4. 10.224.216.164                     0.0%    27    0.4   0.4   0.4   0.5   0.0
 5. 10.224.225.206                     0.0%    27    0.4   0.4   0.3   0.5   0.0
 6. 10.17.151.112                      0.0%    27    0.7   0.8   0.6   1.1   0.1
 7. 10.73.0.152                        0.0%    27    0.4   0.4   0.3   0.5   0.0
 8. 10.95.33.8                         0.0%    27   58.4  14.1   1.4  88.5  27.3
 9. lon-thw-sbb1-nc5.uk.eu             0.0%    27    3.6   3.7   3.4   4.0   0.1
 10. nyc-ny1-sbb1-8k.nj.us             42.3%    27   77.4  77.5  76.3  78.6   0.7
 11. 10.200.3.133                      0.0%    26   81.5  81.6  80.1  83.1   0.7
 12. core1.lga.edgecastcdn.net         0.0%    26   98.9  80.1  77.5  98.9   6.0
 13. ae-66.core1.nyb.edgecastcdn.net   0.0%    26   80.5  83.6  80.4 119.3   8.6
 14. 93.184.216.34                     0.0%    26   75.7  75.8  75.7  75.9   0.0

We can interpret the output as follows:

The first column shows the hops we traversed to reach example.com
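The second column Loss% shows the percentage of probes sent to that hop that received no reply. In this case, every hop except the 10th shows 0% loss. Because the hops after the 10th also show 0% loss, the probes are clearly getting through it, so the 42.3% most likely reflects how that router handles its replies rather than real loss on the path.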
The third column Snt shows the number of packets sent to that hop.
The fourth column Last shows the round-trip time (RTT) of the last packet sent to that hop in milliseconds.
The fifth column Avg shows the average RTT of all packets sent to that hop.
The sixth column Best shows the lowest RTT of all packets sent to that hop.
The seventh column Wrst shows the highest or worst RTT of all packets sent to that hop.
The eighth column StDev shows the standard deviation of RTT for all packets sent to that hop.
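A rule of thumb for reading the Loss% column is that genuine loss tends to carry through to every later hop, while loss that disappears further down the path is more likely the device deprioritizing its own replies. Here is a minimal sketch of that heuristic, fed with the Loss% values from the trace above. The function name and the labels are my own, and the rule is only a rough guide, not a definitive diagnosis.

```python
def classify_loss(loss_by_hop):
    """Label each hop's loss using a simple "does it persist downstream" rule."""
    labels = []
    for index, loss in enumerate(loss_by_hop):
        later_hops = loss_by_hop[index + 1:]
        if loss == 0:
            labels.append("none")
        elif later_hops and min(later_hops) == 0:
            # At least one later hop answered every probe, so the probes are
            # getting through this hop.
            labels.append("likely ICMP rate limiting")
        else:
            # Loss continues to the end of the path (or this is the last hop).
            labels.append("possible real loss")
    return labels


# Loss% column of the ICMP trace above, hops 1 to 14.
icmp_loss = [0.0] * 9 + [42.3] + [0.0] * 4
labels = classify_loss(icmp_loss)

for hop_number, (loss, label) in enumerate(zip(icmp_loss, labels), start=1):
    if label != "none":
        print(f"hop {hop_number}: {loss}% loss -> {label}")
```

Loss reported on the final hop is always treated as worth investigating, because there is no later hop to contradict it.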

Trace Using TCP Packets

Now, let’s see what changes if we use TCP by passing the -T flag and specifying the port with the -P flag:

mtr -T -P 443 example.com

Outputs:

                                            My traceroute  [v0.95]
dev (10.11.12.13) -> example.com (93.184.216.34)                                2023-01-16T23:07:46+0100
Keys:  Help   Display mode   Restart statistics   Order of fields   quit

 Host                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. _gateway                          0.0%    35    0.3   0.3   0.2   0.5   0.1
 2. 192.168.143.254                   0.0%    35    0.4   0.4   0.2   0.5   0.1
 4. 10.224.217.254                    0.0%    35    0.5   0.5   0.4   0.6   0.1
 5. 10.224.216.166                    0.0%    35    0.5   0.5   0.4   0.7   0.1
    10.224.216.164
 6. 10.224.138.62                     0.0%    35    0.5   0.5   0.4   0.8   0.1
    10.224.225.212
 7. 10.17.151.114                     0.0%    35    1.0 116.4   0.7 3043. 536.6
    10.17.146.2
 8. 10.73.0.230                       0.0%    35    0.5   0.5   0.4   0.6   0.1
    10.73.0.98
 9. 10.95.33.8                        0.0%    34    1.7  37.7   1.1 1011. 173.6
    10.95.33.10
10. lon-thw-sbb1-nc5.uk.eu            0.0%    34    3.7   3.8   3.6   4.6   0.2
    be102.lon-drch-sbb1-nc5.uk.eu
11. nyc-ny1-sbb2-8k.nj.us             0.0%    34  1084. 1437.  71.5 7352. 2184.
    nyc-ny1-sbb1-8k.nj.us
12. 10.200.3.131                      0.0%    34   74.5  76.5  70.0  82.5   3.1
    10.200.3.137
13. de-cix.nyc1.edgecast.com          0.0%    34   87.6  75.2  70.7  88.8   5.2
    core1.lga.edgecastcdn.net
14. ae-65.core1.nyb.edgecastcdn.net   0.0%    34   73.8  78.7  72.6 111.1   7.5
    ae-71.core1.nyb.edgecastcdn.net
15. 93.184.216.34                     2.9%    34   70.5  72.0  70.3  76.1   1.8

A few interesting things happened here! First, the nyc-ny1-sbb1-8k.nj.us hop that showed a lot of packet loss no longer shows any. Second, we see more than one host at some hop numbers. This is because some sort of load balancing (e.g. ECMP) is taking place.
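To give a feel for why load balancing shows up this way, here is a minimal sketch of hash-based ECMP: the router hashes the flow's 5-tuple and uses the result to pick one of several equal-cost next hops. Everything here is an assumption for illustration. Real routers use vendor-specific hash functions (CRC32 is only a stand-in), and I am assuming that successive probes differ in their source port, which would make them look like separate flows.

```python
import zlib


def pick_next_hop(src_ip, src_port, dst_ip, dst_port, protocol, next_hops):
    """Pick an equal-cost next hop by hashing the flow's 5-tuple."""
    flow = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{protocol}".encode()
    return next_hops[zlib.crc32(flow) % len(next_hops)]


# The two hosts listed at the same hop number in the TCP trace above.
next_hops = ["10.224.216.166", "10.224.216.164"]

# Hypothetical probes: same endpoints, different source port each time.
for src_port in range(33000, 33008):
    chosen = pick_next_hop(
        "10.11.12.13", src_port, "93.184.216.34", 443, "tcp", next_hops
    )
    print(src_port, "->", chosen)
```

The useful property is that a given flow always hashes to the same next hop, so a real connection stays on one path, while a series of probes that look like different flows can be spread across several.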

Good. Let’s explore some more options.

Trace Using UDP Packets

The required steps to leverage UDP are quite similar to the TCP example. Just swap the -T flag with lowercase -u:

mtr -u -P 53 1.0.0.1

This can be helpful for tracing DNS or HTTP/3 servers.

More Advanced Usage

Here are the invocations I use most often:

mtr -T -P 443 -w -n -c 10 example.com
- -T: Tells MTR to use TCP packets instead of the default ICMP packets for the trace route.
- -P 443: Specifies that MTR should use TCP port 443 for the trace route.
- -w: Tells MTR to produce a wide report, which does not truncate hostnames in the output.
- -n: Tells MTR to display IP addresses instead of hostnames in the output.
- -c 10: Tells MTR to run 10 cycles, sending one probe to each hop per cycle, and then exit. More cycles give a more accurate picture of the network conditions.
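If you find yourself calling mtr from scripts, the flags above can be assembled programmatically. The following is a minimal sketch of that idea. The helper name and its parameters are hypothetical, and it only builds the argument list without running anything.

```python
def build_mtr_command(target, protocol="icmp", port=None, cycles=10):
    """Build an mtr argument list using only the flags described above."""
    command = ["mtr", "-w", "-n", "-c", str(cycles)]
    if protocol == "tcp":
        command.append("-T")
    elif protocol == "udp":
        command.append("-u")
    elif protocol != "icmp":
        raise ValueError(f"unsupported protocol: {protocol}")
    if port is not None:
        if protocol == "icmp":
            raise ValueError("a port only makes sense with tcp or udp")
        command += ["-P", str(port)]
    command.append(target)
    return command


command = build_mtr_command("example.com", protocol="tcp", port=443)
print(" ".join(command))
```

The resulting list can be handed to something like subprocess.run(command, capture_output=True, text=True) on a machine where mtr is installed.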

mtr -T -P 443 -w -n -c 10 -z example.com

Very similar to the first command, but the -z flag also displays the Autonomous System (AS) number alongside each hop. For example:

 <OUTPUT OMITTED FOR BREVITY>

 13.  AS15133  152.195.68.141    0.0%    10   72.7  76.2  72.7  83.5   3.1
        152.195.68.131
     AS15133  152.195.68.131
        152.195.69.131
     AS15133  152.195.69.131
        152.195.69.139
     AS15133  152.195.69.139
 14.  AS15133  93.184.216.34     0.0%    10   73.3  71.5  70.4  74.1   1.3

mtr -T -P 443 -j -n example.com | jq '.report.hubs[].host'

The -j flag instructs MTR to output its data in JSON format and with a tool like jq, we can extract the data we need. In this case I have extracted all the hops.
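The same JSON can be processed in a script when jq is not enough. Below is a minimal sketch that works on a small hand-written sample instead of live output. The report.hubs[].host path comes from the jq filter above. The other keys (count, Loss%, Avg) are what I would expect mtr to emit, so treat them as assumptions and check them against your version. The values are illustrative, not real measurements.

```python
import json

# Hand-written sample shaped like `mtr -j` output (illustrative values only).
sample_output = """
{
  "report": {
    "hubs": [
      {"count": 1, "host": "192.168.143.254", "Loss%": 0.0, "Avg": 0.4},
      {"count": 2, "host": "10.224.217.254", "Loss%": 0.0, "Avg": 0.5},
      {"count": 3, "host": "93.184.216.34", "Loss%": 2.9, "Avg": 72.0}
    ]
  }
}
"""

hubs = json.loads(sample_output)["report"]["hubs"]

# Equivalent of the jq filter: the host of every hop.
hosts = [hub["host"] for hub in hubs]

# Hops that reported any loss, as (hop number, host, loss percent).
lossy_hops = [
    (hub["count"], hub["host"], hub["Loss%"]) for hub in hubs if hub["Loss%"] > 0
]

print(hosts)
print(lossy_hops)
```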

Conclusion

In this post, we learned about the mtr command and seized the opportunity to also talk about the control plane, data plane, slow path, and fast path, and how they influence the way we analyze and interpret the data we gather.