You Aren’t Doing A Good Job - A Story on TCP Middleboxes
A Bit of a Backstory
Over the many years that I have been privileged to work at many different companies, and through them with many others, I would say the most critical was a multi-data-center company that was handling millions of requests per day, where I was in charge of the perimeter network and the overall data center network infrastructure security.
In that setup, even a second of downtime was unacceptable, and justifiably so, given the nature of the applications hosted there and the half a million requests per second we were handling at the time. As you might have guessed, a business this lucrative is constantly subject to devastating attacks (DDoS, application level, etc…) from adversaries and random bad actors.
Houston, we have a problem
We had just successfully deployed a new data center to spread the load across regions when, a few days later, my manager tapped on my shoulder and asked me to step outside the office building for a discussion. From the look on his face, I could tell something was not quite right. As I followed right behind him, many different scenarios kept running through my mind: had I done something wrong? Was I being fired? Was I getting a promotion :) ? But nothing stuck.
Finally, we made it to a quiet spot and he began to tell me that in a meeting with the CTO and other team leads, he had been confronted by another lead (let’s call him Hans), who claimed that our team was not doing a good job: the new data center was not properly hardened, and people (read: malicious actors) from outside were able to interact with the firewalls through the FTP protocol! Given the importance of the meeting and the attendees, I can only imagine how awkward this must have been for him. Oh, and let’s not even bring up the typical power struggles that team leaders encounter.
After hearing this, my jaw dropped, since I was personally in charge of that deployment and there was no way this could have slipped through. Could it be that one of my colleagues had misconfigured the load balancer or a service? Either way, this was some scary news.
Initial Analysis
I rushed back to check the site, and as I went through all the configurations and devices we had there, more and more questions popped into my head. There was nothing wrong! Everything looked OK.
Lastly, I fired up my terminal and typed in:
nc -v -w 3 destination_ip 21
and the connection… failed. As expected.
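For anyone who wants to script this check rather than type it by hand, here is a minimal Python sketch of roughly the same test: attempt a TCP connection and give up after a timeout. The function name `tcp_port_open` is my own, and `destination_ip` is still just the placeholder used in this post.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and unreachable hosts all end up here.
        return False


# Example call (placeholder host from the post, so it is left commented out):
# tcp_port_open("destination_ip", 21, timeout=3.0)
```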
So could it be Hans wanting to stir up some drama and mar the feat we had achieved? That couldn’t be. Why would he lie in a meeting like that and risk his own reputation? Maybe someone on my team had been running some tests and Hans happened to port scan that site at the same time? I started checking with my team and, unsurprisingly, no one was aware of any such thing.
Could it be a bug in this specific OS release? I started plowing through the vendor documents, known issues, release notes… Nope. Nothing there. I shared my findings with my manager and, as the day came to an end, packed up to head home.
Next, I saw Hans next to the coffee machine and seized the opportunity to strike up a conversation, first to make sure nothing had been compromised on their end and second, to ensure we were on the same page and not looking in the wrong direction. He told me that they had not been affected by anything yet and that we were indeed looking in the right direction!
During my commute, I could not help but think about the cause of this issue. Once I reached home, I decided to test it again and, lo and behold, the connection WORKED! What was going on here?!
I immediately logged into the system and checked the config again; everything was as normal as it had been 2 hours earlier! So, let me get this straight: from outside the organization, an FTP connection to the firewalls works, but not when we are on-site. I contacted a few of my colleagues and asked them to replicate the same test. Their connections were also successful. Clearly, this needed more digging.
Is there something fishy going on with the routing? Is someone else replying to our FTP requests? Is an upstream IXP or ISP doing some nasty NAT? But how? We own the route and advertise it. If there was a BGP hijack, then that entire data center would be unavailable.
Diving into Packets
Time to trace and capture some packets!
I started sniffing packets on the firewall and, at the same time, ran tcpdump on my own machine. I ran the same nc command, but with a lower timeout value this time:
nc -v -w 1 destination_ip 21
I observed the packets leaving my machine and successfully establishing a connection:
22:40:14.379637 eth0 Out IP my_ip.42102 > destination_ip.21: Flags [S], seq 2096914158, win 64240, options [mss 1460,sackOK,TS val 3320147191 ecr 0,nop,wscale 7], length 0
22:40:14.496621 eth0 In IP destination_ip.21 > my_ip.42102: Flags [S.], seq 3175437149, ack 2096914159, win 65535, options [mss 1460,sackOK,TS val 1635005169 ecr 3320147191,nop,wscale 9], length 0
22:40:14.496660 eth0 Out IP my_ip.42102 > destination_ip.21: Flags [.], ack 1, win 502, options [nop,nop,TS val 3320147308 ecr 1635005169], length 0
However, on the firewall console, I only saw three SYN packets! Huh? Hang on, hang on… Let’s recollect our thoughts.
- tcpdump on my machine shows a full TCP handshake
- My packets are clearly making their way to the firewall
- I only see three TCP SYN packets on the firewall, as depicted below, and the handshake is not successful:
22:40:14.379637 ethX In IP my_ip.42102 > destination_ip.21: Flags [S], seq 14157, win 1024, options [mss 900,TS val 33201,nop], length 0
22:40:14.379638 ethX In IP my_ip.42102 > destination_ip.21: Flags [S], seq 14158, win 1024, options [mss 900,TS val 33201,nop], length 0
22:40:14.379639 ethX In IP my_ip.42102 > destination_ip.21: Flags [S], seq 14159, win 1024, options [mss 900,TS val 33201,nop], length 0
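Counting SYNs by eye is fine for three lines, but on a busy firewall it helps to script it. The following is a rough sketch that counts pure SYN segments per source/destination pair. It assumes tcpdump text output in the same shape as the lines above; the `count_syns` helper is my own name, not a tcpdump feature.

```python
import re
from collections import Counter

# Matches lines such as "... IP src.sport > dst.dport: Flags [S], seq ..."
LINE_RE = re.compile(r"IP (\S+) > (\S+): Flags \[([^\]]+)\]")


def count_syns(lines):
    """Count pure SYN segments (flag string exactly 'S') per (src, dst) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group(3) == "S":
            counts[(match.group(1), match.group(2))] += 1
    return counts


firewall_capture = [
    "22:40:14.379637 ethX In IP my_ip.42102 > destination_ip.21: Flags [S], seq 14157, win 1024, options [mss 900,TS val 33201,nop], length 0",
    "22:40:14.379638 ethX In IP my_ip.42102 > destination_ip.21: Flags [S], seq 14158, win 1024, options [mss 900,TS val 33201,nop], length 0",
    "22:40:14.379639 ethX In IP my_ip.42102 > destination_ip.21: Flags [S], seq 14159, win 1024, options [mss 900,TS val 33201,nop], length 0",
]

for flow, syns in count_syns(firewall_capture).items():
    print(flow, syns)
```

SYN-ACK lines carry the flag string `S.` and are deliberately not counted.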
Why three SYN packets and not one? Why would nc send more SYN packets when it is reporting a connection success? Let’s try nmap just in case:
nmap -sT -n -Pn -p 21 destination_ip
The exact same thing happened with nmap. Was the firewall buggy?
Moment of Realization
Just as I was getting excited about finally observing real magic right in front of me for free, I noticed that my packets on the firewall console looked like they had been robbed while traveling.
#
# SYN packet leaving my machine
# ------------------------------
22:40:14.379637 eth0 Out IP my_ip.42102 > destination_ip.21: Flags [S], seq 2096914158, win 64240, options [mss 1460,sackOK,TS val 3320147191 ecr 0,nop,wscale 7], length 0
#
# SYN packet received on the firewall
# ------------------------------
22:40:14.379637 Gig48 In IP my_ip.42102 > destination_ip.21: Flags [S], seq 14157, win 1024, options [mss 900,TS val 33201,nop], length 0
What on earth happened to the sequence number, window size, and mss values? They even completely stole the selective ACK (SACK) option, along with window scaling.
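To make the damage explicit, the two SYN lines above can be compared field by field. This is only a sketch that assumes tcpdump's text format as shown in this post; `parse_syn` and `diff_syn` are helper names I made up for the illustration.

```python
import re


def parse_syn(line):
    """Extract seq, win and the TCP options from one tcpdump text line."""
    seq = int(re.search(r"seq (\d+)", line).group(1))
    win = int(re.search(r"win (\d+)", line).group(1))
    raw_options = re.search(r"options \[([^\]]*)\]", line).group(1)
    options = {}
    for item in raw_options.split(","):
        name, _, value = item.strip().partition(" ")
        if name != "nop":  # nop is only padding between options
            options[name] = value
    return {"seq": seq, "win": win, "options": options}


def diff_syn(sent, received):
    """Map each changed field to (value_sent, value_received); None means absent."""
    changes = {}
    for key in ("seq", "win"):
        if sent[key] != received[key]:
            changes[key] = (sent[key], received[key])
    for name in sorted(set(sent["options"]) | set(received["options"])):
        before = sent["options"].get(name)
        after = received["options"].get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


sent_line = "22:40:14.379637 eth0 Out IP my_ip.42102 > destination_ip.21: Flags [S], seq 2096914158, win 64240, options [mss 1460,sackOK,TS val 3320147191 ecr 0,nop,wscale 7], length 0"
received_line = "22:40:14.379637 Gig48 In IP my_ip.42102 > destination_ip.21: Flags [S], seq 14157, win 1024, options [mss 900,TS val 33201,nop], length 0"

for field, (before, after) in diff_syn(parse_syn(sent_line), parse_syn(received_line)).items():
    print(f"{field}: {before!r} -> {after!r}")
```

An option that shows up with `None` on the right-hand side was present when the packet left my machine and gone by the time it reached the firewall.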
Curious, I repeated the same procedure using HTTPS (port 443) and noticed that the exact same packet that left my system reached the firewall unaltered.
So, only FTP packets were being mangled somewhere along the path, and definitely not in our own infrastructure. I walked two of my colleagues through the steps and we came to the same conclusion.
Then it clicked! There must be some nasty middleboxes along our paths, proxying the FTP protocol on residential lines. This explained why the connection from the office was not working: there, we had multiple direct lines to different IXPs.
What Is A Middlebox And Why Is It Used?
A middlebox is a network device that sits between the endpoints of a communication flow, that is, between the sender and the receiver of packets. It is used to perform various functions such as network address translation (NAT), firewalling, and traffic shaping. These devices can manipulate packets in a variety of ways, such as modifying packet headers, blocking certain types of traffic, or directing traffic to different destinations based on certain rules or policies.
Middleboxes, especially in the context of the internet, are notorious for creating technical challenges; just like the one I faced. In this case, it turned out that the Telco company was using CGNAT.
In short, Carrier-Grade NAT (CGNAT) is a method used by Internet Service Providers (ISPs) to address the shortage of IPv4 addresses by sharing a small pool of public IP addresses among a large number of customers. CGNAT is typically implemented using middleboxes (surprise, surprise!).
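To picture the address sharing, here is a toy model of the translation table such a box might keep. It is a deliberately simplified sketch, not how any particular vendor implements it: the `ToyCgnat` class is hypothetical, the public address comes from a documentation range, and the subscriber addresses come from 100.64.0.0/10, the range reserved for carrier-grade NAT.

```python
from itertools import count


class ToyCgnat:
    """Map many private (ip, port) flows onto a single shared public address."""

    def __init__(self, public_ip, first_port=1024):
        self.public_ip = public_ip
        self._next_port = count(first_port)  # naive sequential port allocation
        self._table = {}

    def translate(self, private_ip, private_port):
        """Return the (public_ip, public_port) pair used for this private flow."""
        key = (private_ip, private_port)
        if key not in self._table:
            self._table[key] = (self.public_ip, next(self._next_port))
        return self._table[key]


nat = ToyCgnat("203.0.113.10")  # assumed public address, for illustration only
print(nat.translate("100.64.0.5", 42102))
print(nat.translate("100.64.0.6", 42102))
```

Two different subscribers using the same source port end up behind the same public IP, told apart only by the port the box assigned to each of them.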
Security Challenges of Middleboxes
Sadly, though not unexpectedly, middleboxes are also weaponized and abused to attack companies, for instance in DDoS amplification attacks: as we saw, sending one SYN packet resulted in the destination receiving three SYN packets!
If you are interested, you can read more about the security implications these boxes introduce in the link below.
Middleboxes now being used for DDoS attacks in the wild, Akamai finds
Why Was Only The FTP Traffic Affected?
Well, it is because FTP is a special protocol! CGNAT can affect it in several ways. One of the main issues is that FTP uses two separate connections:
- A control connection for sending commands and receiving responses
- And a separate data connection for transferring the actual files
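What makes this split painful for any kind of NAT is that the endpoint of the data connection is announced inside the control connection's payload. In active mode, the client sends a PORT command whose argument spells out its own IP address and port as six comma-separated numbers (the PASV reply uses the same encoding). A small sketch of that encoding follows; the helper names are mine, and the addresses are made-up examples.

```python
def encode_port_arg(ip: str, port: int) -> str:
    """Build the h1,h2,h3,h4,p1,p2 argument of an FTP PORT command."""
    return ",".join(ip.split(".") + [str(port >> 8), str(port & 0xFF)])


def decode_port_arg(arg: str):
    """Turn h1,h2,h3,h4,p1,p2 back into an (ip, port) pair."""
    parts = [int(part) for part in arg.split(",")]
    ip = ".".join(str(part) for part in parts[:4])
    return ip, parts[4] * 256 + parts[5]


# A client behind NAT would announce its private address in the payload:
arg = encode_port_arg("192.168.1.10", 50000)
print("PORT " + arg)
print(decode_port_arg(arg))
```

A box that only rewrites IP and TCP headers leaves that private address in the payload untouched, and the server has no way to connect back to it.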
CGNAT devices can try to tackle the issues with FTP by using techniques such as an application-level gateway (ALG) or an FTP proxy, which allow the device to maintain the integrity of the data connection and ensure that the data transfer is done correctly.
In our scenario, we were subject to the FTP proxy technique, which works by intercepting the FTP control connection and handling the data transfer on behalf of the client. The FTP proxy establishes a new data connection to the server and then relays the data between the client and the server.
Conclusion
Now, do you see why we were seeing three SYN packets on the firewall console? The CGNAT’s FTP proxy was intercepting our traffic, establishing a successful FTP connection between us and itself, and then trying to establish an actual FTP connection on our behalf, only to fail miserably.
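The same ordering of events can be reproduced on a single machine. The sketch below is not the telco's proxy, just a toy stand-in under my own assumptions: a listener on localhost plays the middlebox, and the "real server" is a local port with nothing listening on it. The client's connect succeeds because the proxy side completes the handshake, and only afterwards does the upstream attempt fail.

```python
import socket
import threading


def run_proxy_once(listener, upstream):
    """Accept one client, then try to reach the real server on its behalf."""
    client, _ = listener.accept()  # the client's handshake has already completed
    with client:
        try:
            server = socket.create_connection(upstream, timeout=1.0)
        except OSError:
            return  # upstream unreachable: give up and close the client side
        server.close()  # relaying is left out of this sketch


def demo():
    # Reserve a local port and release it again so that nothing listens there.
    blocked = socket.socket()
    blocked.bind(("127.0.0.1", 0))
    upstream = blocked.getsockname()
    blocked.close()

    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    worker = threading.Thread(target=run_proxy_once, args=(listener, upstream))
    worker.start()

    # From the client's point of view the connection simply works...
    client = socket.create_connection(listener.getsockname(), timeout=2.0)
    connected = True  # reached only if create_connection raised no error
    # ...until the proxy gives up on the upstream side and hangs up.
    leftover = client.recv(1024)
    client.close()
    worker.join()
    listener.close()
    return connected, leftover


if __name__ == "__main__":
    print(demo())
```

From the client's side this looks exactly like what nc reported: a successful connection to a port that the real destination never opened.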
The following day, I demonstrated to the team that we could even FTP to Google successfully and after seeing enough raised eyebrows, decided to spill the beans and pin the culprit.
I normally find myself getting drawn into these “odd” issues to the point that I have developed a taste for them, but this was a particularly alluring problem to come across.
Be sure to read my previous posts and keep an eye out for the upcoming ones if you enjoyed what you read.
TCP Analysis Packet CGNAT FTP FTP Proxy TCP middlebox Network Security Troubleshooting
2023-01-30 12:38