Tag Archive: Wireshark


I recently got confronted with a difficult problem, in which I was unable to find what was the issue. Nevertheless it’s something most network engineers will be confronted with eventually. I can take no credit for the solution though, a colleague solved it (congrats, G).

The problem
The problem is as following:

Throughput

  • Client PC A, behind a WAN link, experiences a slow file transfer rate when uploading to server C. Speed: 1,88 MB per second (MBps).
  • From A to server D, uploading is fast, at 21,43 MBps.
  • Local PC B experiences great uploading speeds towards both servers: 26,08 MBps to C, 108,72 MBps to D.
  • Downloading on PC A from both server C and server D goes reasonable fast as well: 7,06 MBps for both.

At first sight, this is just not logical: no single component can be isolated to be the cause of the problem: the WAN link works fine towards server D and server C works fine when approached locally. On top of that, it’s only uploading to server C that is slow, downloading is faster.

Let’s start simple and look for bottlenecks. Testing shows the WAN line is capable of 200 Mbps throughput. A continuous ping from PC A to both servers averages to 8,30 ms, with little jitter. A similar continuous ping from PC B to the servers gives 0,60 ms. Vice versa, from servers to clients, it’s the same: 0,60 locally, 8,30 over the WAN link. All internal links at both the remote site and the main site use gigabit.

So the biggest bottleneck is the 200 Mbps WAN line. 200 divided by 8 gives 25 MBps. It roughly explains the speed from PC A to server D, but that’s about it. So what is causing it? Taking captures of the transfer (PC A to server C) reveals nothing at first sight:

SMB-1And that for gigabytes of transferred data. Never a lost packet, never an event, just the seemingly endless data-data-ack, data-data-ack,… But, let’s take a closer look at an interesting parameter:

SMB-2TCP Window, negotiated at 17kB. Would this differ for the other transfers? Let’s look at the transfer from PC A to server D:

SMB-33100kB! And when looking at the downloads from the servers on PC A, these both show 64kB. And they both have the same throughput. Now, a pattern is starting to emerge.

The explanation
Indeed, TCP Windowing becomes an important factor over WAN lines. Windowing determines how many packets of data can be sent to the receiver before an acknowledgement is needed. The bigger the window, the more packets can be sent before a return packet is needed. And the sender will not start sending more data before that acknowledgement is received. In a very low latency environment, this is barely noticeable: with 0,60 ms round-trip-time (RTT) and a TCP window of 17 kB, the sender needs to wait 0,60 ms every few packets (two in this case). Over the WAN link, it will need to wait 8,30 ms every two packets for the return packet. At a TCP window of around 3100kB, dozens of packets can be sent before a return packet is needed. Take for example 100 packets for each ACK, that would be 8,30 ms extra time for the single return packet, while an ACK every two packets over the WAN link for 100 packets would mean 415 ms of extra time, waiting for the ACK packets.

This can be put into a formula: TCP Window in bits divided by the round-trip-time in seconds. Source: http://en.wikipedia.org/wiki/TCP_tuning
This gives the following results:

TCP ThroughputThis comes very close to the test results! TCP Window and latency are used to calculate throughput. The Max throughput is the slowest of both Throughput and link speed (you can’t transfer files faster than 200 Mbps over the WAN link). Data throughput is the Max throughput minus 10 percent: this is to take into account the overhead from the headers (frame header, IP header, TCP header). The actual payload throughput is then converted to MBps (divided by eight) to give the final result.

The solution
The solution to this particular situation is checking for the TCP Window size parameters on the server C. With any luck these can be changed. Server C also didn’t rewindow (sending an ACK with a new TCP Window value) during the transfer, so it never became faster. Server D, while starting with a rather low window size, did rewindow, scaling up to 3100kB eventually. Of course, packet loss wasn’t taken into account int this scenario. Should there have been packetloss, server D probably wouldn’t have rewindowed up to 3100kB and might even have had to choose lower values than the one it started with.

TCP Windowing issues can be a lot of things, among which the following:

  • Settings on the receiving host TCP/IP stack, depending on the operating system: both starting window and the ability to rewindow.
  • Host parameters such as CPU and memory usage, disk I/O,… If the receiving host cannot keep up with the rate at which data is received, it will not rewindow to a higher value. After all, the received data has to be stored in RAM, processed, and written to disk. If any of these resources is not available, throughput will be limited.
  • Packet loss on the connection leads to missing data for which no ACK is sent. Throughput will not increase, or even go down as packet loss increases.

This concludes troubleshooting for now. The upcoming series of blog posts will be more theoretical and cover the OSI layer in greater detail. Stay tuned!

Advertisements

WCCP over layer 2.

WCCP?
W
eb Cache Communication Protocol is something that, in the most simple sense, can be referred to as layer 4 routing, just like Policy Based Routing (PBR). I refer to it like that so it’s clear on which layer you’re going to have to think for this article.

PBR has the advantage that you can check incoming traffic on an interface, and depending on the layer 3 and layer 4 source and destination information, you can influence the next hop. WCCP is a specialization and automation of this process: specialization, because it works for proxies (oh, and WAN accelerators) and certain ports only, and automation, because somewhat similar to a routing protocol, the routers and proxies communicate using WCCP.

I assume it’s clear what a proxy is: a server that requests a webpage on behalf of a client computer. The proxy can filter inappropriate content, cache it to speed up other requests to the same website, and some even have an anti-virus scan build in.

While WCCP is developed by Cisco, it’s been adapted by many proxy vendors. I’m going to use the open-source Squid, running on OpenBSD. Since I’m mainly interested in WCCP, I did a basic Squid install, and tweaked the WCCP parameters in the config file (/etc/squid/squid.conf).

Now how does it exactly work? Well, the proxies advertise their proxy capabilities using a WCCP ‘Here I am’ frame. If configured correctly, the routers respond with an ‘I see you’ frame. I’m not making up these names: I’ve uploaded a capture of this on CloudShark. Since it’s possible that the proxy and the router(s) do not share the same subnet, UDP port 2048 is used.

Once a router or multilayer switch and a proxy see each other, the router checks the parameters advertised by the proxy: what can it do? Proxy for http, https, and/or ftp? If it’s considered interesting (matching the desired features), the router starts forwarding traffic for those specific services (ports) towards the proxy. Because the router starts forwarding based on layer 4 information, the clients are unaware of this and don’t need any proxy configured in the browser. It can do forwarding in one of two ways: a GRE tunnel or directly on layer 2. Layer 2 requires the proxy and the router to share a subnet or VLAN, and this method is widely supported by layer 3 switches. The GRE tunnel method is usually supported by routers.

Topology
The topology uses three VLANs: one for the clients, one for the proxy, and one towards the gateway.
WCCPBecause I’m using a multilayer switch as WCCP Router, which only supports layer 2 forwarding of WCCP, the proxy has to be in a different subnet, as the switch somehow refuses to do a MAC address rewrite of the frame on the same interface. The proxy has to have internet access too of course, as it will do the connections to the web servers on behalf of the clients. The connection to the gateway is a third VLAN, or a layer 3 interface on the switch towards the gateway (remember ‘no switchport’?).

Configuration
As said, I’m going to focus mostly on the WCCP Router here. I’m going to use the following parameters: the standard service ‘web-cache’ (which are basic proxy capabilities for http, more advanced configuration require a custom service group with parameters which will be included in the WCCP frames), layer 2 forwarding, and unicast WCCP frames. In the Squid.conf file these are all configurable options, with extra information present in the file itself.
Assuming 192.168.168.0/24 for the clients, and 192.168.163.0/24 for the proxy, with the Squid at .5, the configuration is as following:

WS-C3560-8PC(config)#interface Vlan163
WS-C3560-8PC(config-if)#ip address 192.168.163.1 255.255.255.0
WS-C3560-8PC(config-if)#exit
WS-C3560-8PC(config)#ip access-list standard ACL-WCCP
WS-C3560-8PC(config-std-nacl)#10 permit 192.168.163.5
WS-C3560-8PC(config-std-nacl)#exit
WS-C3560-8PC(config)#ip access-list standard ACL-PROXY
WS-C3560-8PC(config-std-nacl)#10 permit 192.168.168.0 0.0.0.255
WS-C3560-8PC(config-std-nacl)#exit
WS-C3560-8PC(config)#ip wccp web-cache
WS-C3560-8PC(config)#ip wccp web-cache redirect-list ACL-PROXY group-list ACL-WCCP
WS-C3560-8PC(config)#interface Vlan168
WS-C3560-8PC(config-if)#ip address 192.168.168.1 255.255.255.0
WS-C3560-8PC(config-if)#ip wccp web-cache redirect in
WS-C3560-8PC(config-if)#exit

The ACL-WCCP defines the WCCP clients which may be used, and the ACL-PROXY defines the clients that can use the redirect service (you can exclude certain clients this way). Note that both are standard ACLs, using an extended ACL didn’t work.
The discovery of an interesting proxy comes with a nice syslog:

%WCCP-5-SERVICEFOUND: Service web-cache acquired on WCCP Client 192.168.163.5

After that the switch starts sending the http frames towards the proxy, who does the rest.

I have to admit, I had a great deal of help from the people of Networking-forum.com, and in particular Steven King who has explained WCCP in great detail.

Traffic captures: tips and tricks.

I have been planning a completely different blog post for over a week now, but I’m currently not advancing in my research. Instead, due to the many experiments, I’ve become better at capturing traffic.

Tools of the trade: Wireshark (Windows) and tcpdump (Unix). While tcpdump works on the command line and is very lightweight, Wireshark comes with a lot more options and a GUI. The two are compatible: capture files saved with tcpdump can be opened in Wireshark.

TCPdump parameters
This very simply tool is usually included by default on a Unix platform (Linux, OpenBSD, vendor systems running on a Linux kernel, …). It can have many parameters, of which I’ll list the most useful here:

  • -n – Makes sure no name resolving of addresses, and conversing of port names (e.g. 80 to www) is done. Since you’re probably troubleshooting on layer 2 and 3, it’s easier to see the actual numbers.
  • -i int – Specifies an interface to capture on. If you don’t specify it in a recent Linux kernel, it will take the first non-loopback it finds. However, when networking, devices you encounter often have multiple interfaces.
  • -s length – Sets the length that packets will be captured. By default only the first bytes are captured. Setting this value to 1514 will allow you to capture all packets on an Ethernet interface completely, which is handy for the -w option below.
  • -w file – Writes the capture to a file instead of showing it on-screen. If you specify it as a *.pcap file, it can be opened in Wireshark later on!

As an example of the above, ‘tcpdump -n -w /var/log/ethertest.pcap -i eth0’ will do a packet capture on interface eth0 and write the information to the ethertest.pcap file, without doing any name resolution. To stop a capture, press ‘Ctrl+C’. A more complete list of parameters can be checked here.

TCPdump filters
Apart from the parameters, filters are possible. Below a few handy ones:

  • host ip – Only capture packets that originate from or are destined to a certain IP address. The most common mistakes you can make with this filter are forgetting there’s NAT somewhere involved, so you don’t see anything because you’re filtering on an IP address that’s no longer present in the packets, or something is using IPv6 though dual stack and while you’re capturing on the right interface, nothing shows because you’re filtering on IPv4.
  • net prefix – Only capture traffic from or to a certain subnet, e.g. 192.168.4.0/23.
  • port number – Capture traffic with a certain port number, both UDP and TCP. Usually this is clear enough as there’s rarely both a UDP and a TCP stream on an interface with the same port number. Also counts both for destination and source port number.
  • icmp – ICMP traffic only. Great to see if your pings go through somewhere.
  • vlan number – Only frame that have an 802.1q header with the matching VLAN number will be captured. This option is very important on trunk links in combination with ‘-w’, as I’ve noticed tcpdump doesn’t always write tagged frames correctly to a file unless this filter is applied.
  • ‘not’, ‘and’, and other booleans – These allow you to negate things and make combinations.

Some examples explain these filters:
‘tcpdump -n -i eth0 not port 22’: capture all traffic except port 22 (useful when you’re connected through SSH on the same interface).
‘tcpdump -n -i eth0 host 10.0.5.3 and host 10.2.3.14’: capture all traffic between those two IP hosts.

Wireshark
Wireshark has some extra functionality compared to tcpdump, but tcpdump filters are present as well. Under ‘Capture’, ‘Options…’ you can define a capture filter, which uses the exact same parameters as tcpdump. Using ‘not port 3389’ here for example is useful if you’re trying a capture on a remote computer. You can also use a filter on all captured frames to show only those interesting to you. Difference with the pre-capture filter is that this filters out what you see, but not what is captured, which is useful when taking a raw capture to examine later. Some important ones:

  • icmp: Just ICMP traffic.
  • udp.port and tcp.port: TCP or UDP port number. Unlike tcpdump’s ‘port’ you can differentiate between UDP and TCP here.
  • ip.dst and ip.src: Source and destination IP addresses.
  • eth.dst and eth.src: Source and destination MAC addresses.
  • tcp.stream: Filter out one single TCP stream. Useful to follow a connection in a sea of frames.

Booleans can be used just like with tcpdump, and to define a variable, use ‘==’, e.g. ‘ip.dst == 10.0.0.1 and not icmp’  will show all traffic towards 10.0.0.1, except ICMP packets. Next to ‘==’ (equal to), variables can also be ‘>’ (greater than), ‘>=’ (greater or equal than), ‘!=’ (not), ‘<‘ (smaller than) and ‘<=’ (smaller or equal than).

Wireshark’s extra functionality is due to the graphical element: under ‘Statistics’ you can use ‘IO Graphs’ to show bandwidth usage during the capture. This helps visualize the traffic patterns: are there sudden bursts of traffic, or just a steady flow? Here too you can filter.
Under the same ‘Statistics’ there’s also ‘Conversations’, which makes a list of all captured traffic flows. You can sort this list to show the connections that use the most bandwidth. Very useful to find what’s causing unexpected bandwidth usage.

SPAN ports
Optimizing the filters to capture data is one thing, but optimizing the replicated data to capture on a SPAN interface can be beneficial too. A basic SPAN session on the same switch is set up as following:

Router(config)#monitor session 1 source interface G1/0/1
Router(config)#monitor session 1 destination interface G1/0/5

There are however a few tweaks possible. First, capturing on a trunk link is possible, and it’s possible to filter out only the required VLANs using the ‘monitor session 1 filter vlan vlan-list‘. A good use of this is when trying to capture traffic on a few VLANs on a gigabit trunk link, while the SPAN port is a 100 Mbps port. By filtering only the needed VLANs less traffic will be replicated and the 100 Mbps link will not saturate as quickly.

Second, while a SPAN port replicates most traffic, it does not replicate switch control frames like BPDUs and CDP frames. You can force the replication of these frames using the ‘monitor session 1 destination interface G1/0/5 encapsulation replicate’ command. A good use for this is checking why an IP Phone will not come online, for example.

The above are all just small tips and tricks, but together they make troubleshooting something a lot clearer.

VRRP between Cisco and Vyatta.

I already mentioned in an earlier post that I was doing some experiments with Virtual Router Redundancy Protocol on routers from different vendors. For those of you not familiar with VRRP: it’s a protocol that allows multiple routers to share the same IP address, which then can be used as the default gateway for end devices. This gives you some redundancy in case a router goes down. VRRP is the IETF standard for an earlier protocol: HRSP, which is available on Cisco devices only. For more info, Wikipedia is your friend.

I already managed to get VRRP running on both GNS3 and real Cisco routers, but since it’s supposed to be a standard, why not try it in a multivendor environment? My favorite non-Cisco router is Vyatta: it’s a stripped-down Linux with nothing left but the kernel and network-related packages already installed. The command line handles somewhat like a Cisco IOS. Since it can run on almost any x86 hardware, you can virtualize it too, so it’s an easy solution in my lab. The basic version is free.

I followed the guide I found on openmaniak.com and got it running in no time. I used the following configuration: 192.168.0.2 for the Vyatta, 192.168.0.3 for the Cisco router (a 2611 running 12.3 IOS), and 192.168.0.5 as the virtual IP address.

I started and configured my Vyatta first. Here you see it sending VRRP multicasts to 224.0.0.18, to announce to the other router(s) that he’s currently the master and will handle all packets send to 192.168.0.5, the virtual address.

VRRP Vyatta

Next, I booted and configured the Cisco router. Note that both configurations used the ‘preempt’ command, which means that if a ‘better’ router is present in the subnet, it will immediately assume the master role, instead of waiting until the current leader (the Vyatta) goes down. The ‘better’ router here means the one with the higher priority, or in case of a tie, the one with the highest IP address.

Since the Cisco router has a higher IP address, it takes the master role after a few seconds:

After the Cisco router becomes the master, it will handle any packet destined for 192.168.0.5. Should the router fail, for example by me unplugging the ethernet cable, the Vyatta router will take the master role again and the address 192.168.0.5 will stay reachable.

So far so good. But I did come across a small problem in my tests. if you watch both images closely, you’ll notice that the Cisco router is using a source MAC address of 00:00:5e:00:01:01. This is correct, because this is the MAC address that must be used according to the RFC. (The last ’01’ is the VRRP group 1. Had I configured it with VRRP group 5, it would be ’05’.)

The Vyatta router does not use this MAC address but instead uses his own (00:0c:29:fd:d5:23, VMWare virtual NIC). I’ve done some research around the web and could not find anything conclusive, but I’ve heard of Linux versions having trouble using multiple source MAC addresses, so this may be the cause. It does create a problem though, because if a router fails in this configuration, the end devices are left with the wrong ARP information, making the 192.168.0.5 address unreachable after all. It’s possible to solve this issue by sending a Gratuitous ARP packet in case of a failure, but I didn’t notice such a packet in my tests, and it would still make things more complex than they are supposed to be. At this moment I am uncertain if VRRP works well in Vyatta. But that aside, I did learn a lot today.