
Analyzing a network performance issue

Andrea Dainese
February 05, 2016

In this case, the application team is reporting randomly high RTT between a client and a server. Both client and server are Linux virtual machines running under VMware vSphere, and the path between them crosses a lot of firewalls, routers, switches, load balancers… and so on.

The issue

The very first thing is to replicate the issue from the client:

PORT=6000; while true; do echo -ne "${PORT} "; nmap -n -g ${PORT} -p 143 10.0.1.1 | grep scanned; PORT=$((${PORT} + 1)); done

After a while I could see some requests taking more than one second:

6000 Nmap finished: 1 IP address (1 host up) scanned in 0.119 seconds
6001 Nmap finished: 1 IP address (1 host up) scanned in 0.119 seconds
6002 Nmap finished: 1 IP address (1 host up) scanned in 0.119 seconds
6003 Nmap finished: 1 IP address (1 host up) scanned in 0.116 seconds
6004 Nmap finished: 1 IP address (1 host up) scanned in 0.118 seconds
6005 Nmap finished: 1 IP address (1 host up) scanned in 3.861 seconds
6006 Nmap finished: 1 IP address (1 host up) scanned in 0.116 seconds
6007 Nmap finished: 1 IP address (1 host up) scanned in 0.119 seconds
6008 Nmap finished: 1 IP address (1 host up) scanned in 0.128 seconds
6009 Nmap finished: 1 IP address (1 host up) scanned in 0.113 seconds
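As an aside, nmap is used here only as a convenient way to time a TCP handshake. A rough equivalent can be sketched in plain bash (assuming GNU date with nanosecond support; note that /dev/tcp picks an ephemeral source port, so the fixed-source-port trick used below for filtering would not apply):

HOST=10.0.1.1; PORT=143
for i in $(seq 1 20); do
  start=$(date +%s.%N)                          # nanosecond-resolution timestamp
  timeout 5 bash -c ">/dev/tcp/${HOST}/${PORT}" # open and close a TCP connection
  end=$(date +%s.%N)
  echo "attempt ${i}: $(echo "${end} - ${start}" | bc) seconds"
done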

The issue is now confirmed. Before continuing, make sure the clocks on both client and server are synchronized:

client# clock
Fri 05 Feb 2016 12:28:57 PM CET  -0.016003 seconds
server# clock
Fri 05 Feb 2016 12:26:57 PM CET  -0.347022 seconds
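If the hosts run ntpd, the synchronization status can also be checked with ntpq; the offset column of its peers output is expressed in milliseconds:

client# ntpq -pn
server# ntpq -pn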

Supposing the network is stable and deterministic, the next step is to capture the traffic at both ends of the path, on the client and on the server:

client# tcpdump -i eth0 -s 0 -nn -w /tmp/dump.pcap host 10.0.2.2 and portrange 6000-6999 and host 10.0.1.1 and port 143
server# tcpdump -i eth0 -s 0 -nn -w /tmp/dump.pcap host 10.0.2.2 and portrange 6000-6999 and host 10.0.1.1 and port 143

The source port is set explicitly with nmap's -g option, so the tcpdump filter can be configured to match the test requests only.
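Since waiting for a slow request can take a while, the capture can optionally be rotated into a ring of files so it never grows unbounded (a variant using tcpdump's -C/-W options: -C is the file size in millions of bytes, -W the number of files to keep):

tcpdump -i eth0 -s 0 -nn -C 100 -W 5 -w /tmp/dump.pcap host 10.0.2.2 and portrange 6000-6999 and host 10.0.1.1 and port 143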

After a couple of high-latency requests have been captured, both tcpdump processes can be stopped and the dumps analyzed.

Let’s review some high-latency requests:

6007 Nmap finished: 1 IP address (1 host up) scanned in 3.285 seconds
6089 Nmap finished: 1 IP address (1 host up) scanned in 2.911 seconds
6200 Nmap finished: 1 IP address (1 host up) scanned in 4.025 seconds

Let’s dump the SYN packets of the request with source port 6200, on both client and server:

client# tcpdump -r /tmp/dump.pcap -nn "port 6200 and tcp[tcpflags] & (tcp-syn) != 0"
reading from file /tmp/dump.pcap, link-type EN10MB (Ethernet)
12:35:02.639543 IP 10.0.2.2.6200 > 10.0.1.1.143: S 2917316624:2917316624(0) win 2048 <mss 1460>
12:35:02.640273 IP 10.0.1.1.143 > 10.0.2.2.6200: S 2145376395:2145376395(0) ack 2917316625 win 5840 <mss 1460>
server# tcpdump -r /tmp/dump.pcap -nn "port 6200 and tcp[tcpflags] & (tcp-syn) != 0"
reading from file /tmp/dump.pcap, link-type EN10MB (Ethernet)
12:35:02.635480 IP 10.0.2.2.6200 > 10.0.1.1.143: S 2917316624:2917316624(0) win 2048 <mss 1460>
12:35:02.635488 IP 10.0.1.1.143 > 10.0.2.2.6200: S 2145376395:2145376395(0) ack 2917316625 win 5840 <mss 1460>
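As an aside, if tshark is available (assuming a build recent enough to support the -Y display filter), the same packets can be pulled out as bare fields, which is handy when comparing many captures:

tshark -r /tmp/dump.pcap -Y "tcp.port == 6200 && tcp.flags.syn == 1" -T fields -e frame.time -e ip.src -e ip.dst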

So, we have the following timeline:

  • 12:35:02.639543 client sends the SYN packet
  • 12:35:02.635480 server receives the SYN packet
  • 12:35:02.635488 server sends back the SYN+ACK packet
  • 12:35:02.640273 client receives the SYN+ACK packet
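Working out the deltas from these timestamps:

  • client-side RTT: 12:35:02.640273 - 12:35:02.639543 = 0.000730 s (about 0.73 ms)
  • server turnaround: 12:35:02.635488 - 12:35:02.635480 = 0.000008 s (8 µs)
  • clock offset: the SYN appears on the server about 4 ms before the client sent it, so the server clock runs roughly 4 ms behind the client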

Besides a slight clock misalignment between the two hosts, all packets were sent and answered in well under a millisecond. So the network is not the cause of the latency: the time is lost on the source host itself, which is busy managing processes rather than handling the connection promptly.
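To corroborate this conclusion, a simple follow-up (not part of the original capture) is to watch CPU pressure on the client while the probe loop runs; on a virtual machine, the st (steal) column of vmstat reports CPU time the hypervisor did not schedule for the guest, and a persistently long run queue in the r column points at local process contention:

client# vmstat 1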