Latest Entries »

After the basics in part I, on to IPv6 and NAT. The title is misleading here: iptables exists for IPv6 and iptables can do NAT, but iptables cannot do NAT for IPv6 connections.

As for IPv6, this part is very simple: just add a ‘6’ between ‘ip’ and ‘tables’…

iptables04

… and it will work for IPv6. As you can see above, since IPv6 addresses are longer, rules tend to split over two rows in a smaller console window.

NAT is an entirely different matter as it involves the translation of an IP address in the IP header of a packet. It doesn’t have much use for a standalone system, but if the Linux is used for routing it’s often needed.

I mentioned before iptables uses chains, but it also uses tables. The table that has been discussed so far is the ‘filter’ table. The ‘nat’ table takes care of NAT. Since it’s an entirely different table, it has its own set of chains:

  • PREROUTING, which applies NAT before the packet is routed or checked by the ‘filter’ table. It is most useful for destination NAT.
  • INPUT is not present in all recent versions anymore and does not serve any real purpose anymore.
  • OUTPUT is for packets originating from the local machine. In general they don’t need NAT s this is rarely used.
  • POSTROUTING applies NAT after the routing of the packet, if it hasn’t been filtered by the ‘filter’ table. It is most useful for source NAT.

A look at the current rule set can be done with iptables -L -v -t nat and rules can be added the same way as in the filter table, except that the parameter -t nat is added every time. The only difference is in the action to take for a rule that matches, the -j parameter.

For the ‘filter’ tables, possible target are ACCEPT and DROP, but for the ‘nat’ table this is different:

  • DNAT specifies destination NAT. It must be followed by –to-destination and the destination. The destination can be an IP address, but if a port was specified in the rule it can also be a socket, e.g. 192.168.0.5:80. This makes port translations possible. A typical use case is if you want to make a server inside your network with a private IP address reachable from the internet.
  • SNAT is source NAT, and typically used for static NAT translations for an inside host with a private IP address towards its public IP address. It must be followed by the –to parameter that defines an IP address. It can also define a pool of IP addresses (e.g. 203.0.113.5-203.0.113.8) and a range of source ports.
  • MASQUERADE is a special case: it is a source NAT behind the outgoing interface’s IP address (hide-NAT). This is ideal for interfaces which use DHCP to receive a public IP address from a provider. No other parameters need to be specified, so it’s not required to change this rule every time the public IP address changes.

These targets by themselves do not block or allow a connection. It’s still required to define the connection in the main ‘filter’ table and allow it.

Examples for NAT:

  • Forward incoming SIP connections (control traffic and voice payload) towards an inside IP phone at 192.168.1.3. Allow the control traffic only from one outside SIP server at 203.0.113.10. The outside interface is eth1.
    iptables -t nat -A PREROUTING -i eth1 -p udp –dport 16384 -j DNAT –to-destination 192.168.1.3
    iptables -t nat -A PREROUTING -i eth1 -p udp –dport 5060 -j DNAT –to-destination 192.168.1.3
    iptables -A FORWARD -d 192.168.1.3 -p udp –dport 16384 -j ACCEPT
    iptables -A FORWARD -s 203.0.113.10 -d 192.168.1.3 -p udp –dport 5060 -j ACCEPT
  • Use IP addres 203.0.113.40 as outgoing IP address for SMTP traffic from server 192.168.2.20
    iptables -t nat -A POSTROUTING -s 192.168.2.20 -p tcp –dport 25 -j SNAT –to 203.0.11.40
    iptables -A FORWARD -s 192.168.2.20 -p tcp –dport 25 -j ACCEPT
  • Use the interface IP address for all other outgoing connections on interface eth1.
    iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE

iptables05

Notice that for rules in the ‘filter’ table that correspond to a NAT rule in the PREROUTING table, the IP addresses are used that are seen after the NAT has taken place, and for the POSTROUTING it’s the original IP addresses that are used. This is because the following order, as mentioned earlier, is very important here.

This is the IPv6 and NAT part of iptables. Up next: optimization and hardening of the rule set.

Advertisement

Most modern Linux distributions come with a firewall package already active. Since it’s often set in an ‘allow-all’ mode, people are often unaware of it.

iptables01

Meet iptables, a basic yet powerful stateful firewall. You can see a default ‘allow-all’ policy above. Note that there are three different chains: INPUT, FORWARD and OUTPUT. Traffic can only match one of these three chains.

  • INPUT is for all traffic that is destined for the local Linux. It’s typically used to filter local services, e.g. you can allow only certain subnets to connect to the Linux via SSH, or shield off a port used by a process that you don’t want to be visible from the internet. This is also for response traffic from connections initiated locally.
  • FORWARD is for all traffic traversing the device. This requires routing functionality to be activated. On Debian-based Linux versions you can do this by adding or modifying the line net.ipv4.ip_forward=1 in /etc/sysctl.conf and perhaps adding some static routes.
  • OUTPUT is all traffic that originates from the local Linux. This is both for outbound connections as for response traffic for local services.

Looking at the current rule set can be done with iptables -L -v, where the optional -v parameter displays extra detail. Adding a rule can be done with iptables -A followed by the chain and the parameters of the rule. The most common ones are:

  • -p defines the protocol: udp, tcp, icmp or an IP Protocol number. You can modify the /etc/protocols file to make Linux recognize an IP Protocol number by name.
  • -i is the incoming interface. This is not supported in the OUTPUT chain for obvious reasons.
  • -o is the outgoing interface. This is not supported in the INPUT chain.
  • -s is the source subnet or host.
  • -d is the destination subnet or host.
  • – -dport (without the space) is the destination port, only valid if the protocol is defined as TCP or UDP. This can be a single port or a range, separated by a double colon.
  • – -sport (without the space) is the source port, or range of ports.
  • -j is the action to take. For the purposes of this article, let’s assume only ACCEPT and DROP are possible. More options will be discussed in upcoming blog posts.

The parameters not defined in a rule are assumed to have the value ‘any’. Examples:

  • Add a rule to allow SSH to the local Linux from one single host 192.168.5.5:
    iptables -A INPUT -p tcp -s 192.168.5.5 –dport 22 -j ACCEPT
  • Allowing subnet 10.0.1.0/24 to do Remote Desktop to server 10.100.0.10:
    iptables -A FORWARD -p tcp -s 10.0.1.0/24 -dport 3389 -j ACCEPT
  • Block any traffic through the device towards UDP ports 10000 to 11000, regardless of source and destination:
    iptables -A FORWARD -p udp –dport 10000:11000 -j DROP
  • Don’t allow any traffic from interface eth4 to interface eth6:
    iptables -A FORWARD -i eth4 -o eth6 -j DROP

There is one additional rule which you will likely need in the configuration, but which differs from the rest of the rules: the stateful traffic rule. Although iptables by default keeps a state tables, it does not use it for traffic matching unless you tell it to. The rules to do this:

iptables -A INPUT -m conntrack –ctstate ESTABLISHED -j ACCEPT
iptables -A FORWARD -m conntrack –ctstate ESTABLISHED -j ACCEPT
iptables -A OUTPUT -m conntrack –ctstate ESTABLISHED -j ACCEPT

This calls in the module for connection tracking (-m conntrack) and matches any connections which have been seen before (ESTABLISHED). It is best to add this rule first this to avoid any drop rules from dropping traffic from known connections.

To modify the default policy, use the -P switch. For example, to block all local incoming connections by default:

iptables -P INPUT DROP

WARNING: When configuring iptables for the first time, especially via SSH, you have to be careful not to lock yourself out of the system. On top of that, some processes use IP communication using the 127.0.0.1 loopback IP address internally. If you tell iptables to block this, it may break some applications!

Review what you want to achieve. For example, say you want to change the default iptables policy to drop any incoming connections, except http traffic and ssh from your computer:

  1. First add the rules allowing internal and stateful traffic:
    iptables -A INPUT -m conntrack –ctstate ESTABLISHED -j ACCEPT
    iptables -A INPUT -i lo -j ACCEPT
  2. Then add the rules allowing the connections:
    iptables -A INPUT -p tcp –dport 80 -j ACCEPT
    iptables -A INPUT -p tcp -s 192.168.1.10 –dport 22 -j ACCEPT
  3. Finally set the policy to deny by default:
    iptables -P INPUT DROP

iptables02

The OUTPUT chain can stay with a default allow action. If you modify it as well, be sure to add the rules for internal and stateful traffic again.

You can check the state table live via cat /proc/net/ip_conntrack

iptables03

Here you see an example of one SSH connection and a NTP connection in the state table.

These are the iptables basics. In upcoming blog posts, I’ll talk about NAT, IPv6, optimization, hardening iptables security and increasing the scalability for large rule sets.

This article further continues on earlier experiments. While using internal tunnels gave a logically ‘simple’ point-to-point network seen from a layer 3 point of view, it came with the drawback of complex header calculations, resulting in CPU hogging on devices capable of hardware switching. Using some route-maps to choose VRFs for flows proved interesting, but only in particular use cases. It didn’t make the troubleshooting any easier.

There is a better method for dealing with inter-VRF routing on a local device, but it requires a very different logic. Up until now, my articles described how to be able to select a next-hop inside another VRF. Route leaking using BGP, however, leverages a features of BGP that has been described in my blog before: the import and export of routing information in different VRFs.

How does it work? Well, since the BGP process can’t differentiate between VRFs based on names, but instead uses route-targets to uniquely identify routing tables, it’s the route target that matters. Usually, for MPLS-VPN, these numbers differ per VRF. This time, use the same number for two VRFs:

VRF-RouteLeaking1This will tell BGP later on to use the same routing table for both VRFs. Now if you configure BGP…

VRF-RouteLeaking2

… There’s still a separate configuration per VRF. How does the exchange happen? Well: for all routes that are known inside the BGP process. This means any routes from any BGP peer in one of the VRFs are automatically learned in both VRFs. But for static and connected routes, and routes from other routing protocols, you can do a controlled redistribution using route-maps or prefix lists. This way only the routes that need to be known in the other VRF are added.

VRF-RouteLeaking3

The routing table with automatically show routes pointing to different VRFs. On top of that, the forwarding does not require any headers to be added or removed to the packets, so it can be put directly into CEF, allowing for hardware forwarding. And for completeness: the BGP neighborships here are just for illustration of what happens, but no neighborship is required for this to function. BGP just needs to run on the device as a process.

Of course, this is just a simple setup. More complexity can always be added, as you can even use route-maps to set the route targets for some routes.

This article is not really written with knowledge usable for a production network in mind. It’s more of an “I have not failed. I’ve just found 10,000 ways that won’t work.” kind of article.

I’m currently in a mailing group with fellow network engineers who are setting up GRE tunnels to each others home networks over the public internet. Over those networks we speak (external) BGP towards each other and each engineer announces his own private address range. With around 10 engineers so far and a partial mesh of tunnels, it gives a useful topology to troubleshoot and experiment with. Just like the real internet, you don’t know what happens day-to-day, neighborships may go down or suddenly new ones are added, and other next-hops may become more interesting for some routes suddenly.

SwitchRouting1

But of course it requires a device at home capable of both GRE and BGP. A Cisco router will do, as will Linux with Quagga and many other industrial routers. But the only device I currently have running 24/7 is my WS-C3560-8PC switch. Although it has an IP Services IOS, is already routing and can do GRE and BGP, it doesn’t do NAT. Easy enough: allow GRE through on the router that does the NAT in the home network. Turns out the old DD-WRT version I have on my current router doesn’t support it. Sure I can replace it but it would cost me a new router and it would not be a challenge.

SwitchRouting2

Solution: give the switch a direct public IP address and do the tunnels from there. After all, the internal IP addresses are encapsulated in GRE for transport so NAT is required for them. Since the switch already has a default route towards the router, set up host routes (a /32) per remote GRE endpoint. However, this still introduces asymmetric routing: the provider subnet is a connected subnet for the switch, so incoming traffic will go through the router and outgoing directly from the switch to the internet without NAT. Of course that will not work.

SwitchRouting3

So yet another problem to work around. This can be solved for a large part using Policy-Based Routing (PBR): on the client VLAN interface, redirect all traffic not meant for a private range towards the router. But again, this has complications: the routing table does not reflect the actual routing being done, more administrative overhead, and all packets originated from the local switch will still follow the default (the 3560 switch does not support PBR for locally generated packets).

Next idea: it would be nice to have an extra device that can do GRE and BGP directly towards the internet and my switch can route private range packets towards it. But the constraint is no new device. So that brings me to VRFs: split the current 3560 switch in two: one routing table for the internal routing (vrf MAIN), one for the GRE tunnels (vrf BGP). However, to connect the two VRFs on the same physical device I would need to loop a cable from one switchport to another, and I only have 8 ports. The rest would work out fine: point private ranges from a VLAN interface in one VRF to a next-hop VLAN interface over that cable in another VRF. That second VRF can have a default route towards the internet and set up GRE tunnels. The two VRFs would share one subnet.

SwitchRouting4

Since I don’t want to deal with that extra cable, would it be possible to route between VRFs internally? I’ve tried similar actions before, but those required a route-map and a physical incoming interface. I might as well use PBR if I go that way. Internal interfaces for routing between VRFs exist on ASR series, but not my simple 8-port 3560. But what if I replace the cable with tunnel interfaces? Is it possible to put both endpoints in different VRFs? Yes, the 15.0(2) IOS supports it!

SwitchRouting5

The tunnel interfaces have two commands that are useful for this:

  • vrf definition : just like on any other layer 3 interface, it specifies the routing table of the packets in the interface (in the tunnel).
  • tunnel vrf :  specifies the underlying VRF from which the packets will be sent, after GRE encapsulation.

With these two commands, it’s possible to have tunnels in one VRF transporting packets for another VRF. The concept is vaguely similar to MPLS-VPN,  where your intermediate (provider) routers only have one routing table which is used to transport packets towards routers that have the VRF-awareness (provider-edge).

interface Vlan2
ip address 192.168.2.1 255.255.255.0
interface Vlan3
ip address 192.168.3.1 255.255.255.0
interface Tunnel75
vrf forwarding MAIN
ip address 192.168.7.5 255.255.255.252
tunnel source Vlan2
tunnel destination 192.168.3.1
interface Tunnel76
vrf forwarding BGP
ip address 192.168.7.6 255.255.255.252
tunnel source Vlan3
tunnel destination 192.168.2.1

So I configure two tunnel interfaces, both in the main routing table. Source and destination are two IP addresses locally configured on the router.  I chose VLAN interface, loopbacks will likely work as well. Inside the tunnels, one is set to the first VRF, the other to the second. One of the VRFs may be shared with the main (outside tunnels) routing table, but it’s not a requirement. Configure both tunnel interfaces as two sides of a point-to-point connection and they come up. Ping works, and even MTU 1500 works over the tunnels, despite the show interface command showing an MTU of only 1476!

Next, I set up BGP to be VRF-aware. Logically, there are two ‘routers’, one of which is the endpoint for the GRE tunnels, and another one which connects to it behind it for internal routing. Normally if it were two physical routers, I would set up internal BGP between them since I’m already using that protocol. But there’s no difference here: you can make the VRFs speak BGP to each other using one single configuration.

router bgp 65000
address-family ipv4 vrf MAIN
neighbor 192.168.7.6 remote-as 65000
network 192.168.0.0 mask 255.255.248.0
neighbor 192.168.7.6 activate
exit-address-family
address-family ipv4 vrf BGP
bgp router-id 192.168.7.6
neighbor 192.168.7.5 remote-as 65000
neighbor 192.168.7.5 activate
exit-address-family

A few points did surface: you need to specify the neighbors (the IP addresses of the local device in the different VRFs) under the correct address families. You also need to specify a route distinguisher under the VRF as it is required for VRF-aware BGP. And maybe the most ironic: you need a bgp router-id set inside the VRF address-family so it differs from the other VRF (the highest interface IP address by default), otherwise the two ‘BGP peers’ will notice the duplicate router-id and it will not work. But after all of that, BGP comes up and routes are exchanged between the two VRFs! For the GRE tunnels towards the internet, the tunnel vrf command is required in the GRE tunnels so they use the correct routing table for routing over the internet.

So what makes this not production-worthy? The software-switching.

The ASIC can only do a set number of actions in a certain sequence without punting towards the switch CPU. Doing a layer 2 CAM table lookup or a layer 3 RIB lookup is one thing. But receiving a packet, have the RIB pointing it to a GRE tunnel, encapsulate, decapsulate and RIB lookup of another VRF is too much. It follows the expected steps in the code accordingly, the IOS software does not ‘see’ what the point is and does not take shortcuts. GRE headers are actually calculated for each packet traversing the ‘internal tunnel’ link. I’ve done a stress test and the CPU would max out at 100% at… 700 kBps, about 5,6 Mbps. So while this is a very interesting configuration and it gives an ideal situation to learn more, it’s just lab stuff.

So that’s the lesson, as stated in the beginning: how not to do it. Can you route between VRFs internally on a Cisco switch or router (not including ASR series)? Yes. Would you want to do it? No!

I know, it’s been quiet on this blog for the past months. But here we are again, starting off with a simple post. Maybe not much real world practical use, but fun to know.

Dealing with ACLs requires more protocol knowledge compared to dealing with a stateful firewall. A stateful firewall takes care of return traffic for you, and often even has some higher layer functionality so it can automatically allow the incoming port of a FTP connection, for example.

ACLs don’t do this. They’re static and don’t care for return traffic. On a switch in particular, they’re done on the ASIC, in hardware. This means there is no logging possible. On the plus side, the filtering doesn’t consume CPU. Many engineers assume stateful firewalls are superior to ACLs, and while this is certainly true concerning scalability and manageability, it’s not 100% true for security. ACLs don’t get fooled by some attacks: they’re not dynamic like the stateful filtering principle, so they will also not work for attacks that try to use the stateful functionality of a firewall against it. Attacks involving many packets attempting to use op CPU also don’t work. True, these attacks are not common, but it’s still a forgotten advantage.

Concerning logging, an ACL does have the option, even on a switch. By adding the ‘log’ parameter to the end of a line you can count the hits on the ACL. However, all this does on the ASIC is punt the packet towards the CPU, who then processes the packet in software and increases a counter and logs a syslog message. If you do this for a lot of packets, or all of them, you’re essentially using the CPU for switching. This defeats the purpose of the ASIC. Most Cisco switch don’t have the CPU for that, limiting throughput and causing high CPU, latency, jitter, even packetloss.

But there’s a more efficient way to do this, for TCP at least: only log the connection initiations.

permit tcp any any established
permit tcp any any eq 80
permit tcp any any eq 443
permit tcp any any eq 22 log
permit udp any any
deny ip any any

In the above ACL the first line will allow all TCP traffic for which there already is a connection established. Just this rule doesn’t allow any traffic: you still need to be able to initiate traffic too. The two rules below it allow HTTP and HTTPS. Finally, the fourth rule allows SSH but also logs it. When a user creates an SSH session through the interface, the first SYN packet will match the log rule and the connection will be logged. All packets of the same flow after this will match the first rule and be switched in hardware. Minimal CPU strain, but connections are logged. Apart from the first packet the flow is forwarded in hardware.

This way of ACL building also has advantages on CPU-based platforms like (low-end) routers: most packets will hit the first line of the ACL and only a few packets will require multiple ACL entries to be checked by the CPU, decreasing general load.

I passed the ARCH exam!

It’s been a while since I’ve posted something here. Multiple reasons of course, but lately I just had to focus on learning so much I didn’t take the time for it anymore. Why? Well since I got my CCNP almost three years ago, it had to be recertified. Together with my CCDA that presented the opportunity to gain a CCDP certification and renewing my CCNP at once by just taking one more exam: ARCH.

So it’s been done. Was it hard? I honestly don’t know. So much has changed this last year for me: how I look at my profession, how I look at learning, at certifications,… I can’t compare it anymore to past experiences. So many things I learned outside of the certification path that are so important to have insights as an engineer… Examples like TCP Windowing, ASIC behavior, VRF deployment, application behavior in LAN and WAN (Citrix, SCP, FTP, NTP, vMotion, SCCM), load balancing and SSL offloading, …

All I know that this was a lot of (useful) theory and I had to devise a plan to learn it all, which eventually succeeded. So besides the certification I improved my ability to learn with it. And that in turn gives me strength for the next one: CCIEv5.

Yes, there I said it. For over a year I kept doubting it a bit, saying I wanted it but not putting a date on it. That’s over now. In a month I’ll start preparing the written with hopefully the exam in the first quarter of 2015.

I am ready.

Disclaimer: the logs are taken from a production network but the values (VLAN ID, names) are randomized.

Recently, I encountered an issue on a Campus LAN while performing routine checks: spanning tree seemed to undergo regular changes.

The LAN in question uses five VLANs and RPVST+, a Cisco-only LAN. At first sight there was no issue:

BPDU-Trace1

On an access switch, one Root port towards the Root bridge, a few Designated ports (note the P2P Edge) where end devices connect, and an Alternative port in blocking with a peer-to-peer neighborship, which means BPDUs are received on this link.

There is a command that allows you to see more detail: ‘show spanning-tree detail’. However, the output from this command is overwhelming so it’s best to apply filters on it. After some experimenting, filtering on the keywords ‘from’,’executing’ and ‘changes’ seems to give the desired output:

BPDU-Trace2

This gives a clear indication of something happening in the LAN: VLAN 302 has had a spanning-tree event less than 2 hours ago. Compared to most other VLANs who did not change for almost a year, this means something changed recently. After checking the interface on which the event happened, I found a port towards a desk which did not have the BPDU Guard enabled, just Root Guard. It was revealed that someone regularly plugged in a switch to have more ports, which talked spanning-tree but with a default priority, not claiming root. As such, Root Guard was not triggered, but the third-party switch did participate in spanning-tree from time to time.

Also, as you notice in the image, VLAN 304 seemed to have had a recent event to, on the Alternative Port. After logging in on the next switch I got the following output:

BPDU-Trace3

Good part: we have a next interface to track. Bad news: it’s a stack port? Since it’s a stack of 3750 series switches, it has stack ports in use, but the switches should act as one logical unit in regards to management and spanning-tree, right? Well, this is true, but each switch still makes the spanning-tree calculation by itself, which means that it can receive a spanning-tree update of another switch in the stack through the stack port.

Okay, but can you still trace this? You would have to be able to look in another switch in the stack… And as it turns out, you can:

BPDU-Trace4

After checking which stack port connects to which switch in the CLI, you can hop to the next one with the ‘session’ command and return to the master switch simply by typing ‘exit’. On the stack member, again doing the ‘show spanning tree detail’ command shows the local port on which the most recent event happened. And again the same thing: no BPDU Guard enabled here.

 

Just a simple article about something I recently did in my home network. I wanted to prepare the network for a Squid proxy, and design it in such a way that the client devices did not require proxy settings. Having trouble placing it inline, I decided I could use WCCP. However, that requires separate VLANs.

This did pose a problem: my home router did not support any kind of routing and multiple networks beyond a simple hide NAT (PAT) behind the public IP address. Even static routes weren’t possible.

And again my fanless 3560-8PC helped me out. The 3560 can do layer 3 so you can configure it with the proper VLANs and use it as the default gateway on all VLANs. Then you add another VLAN towards the router and point a default route towards that router.

That solves half of the problem: packets get to the router and out to the internet. However, the router does not have a return route for the VLANs. But it does not need that: you can use Proxy ARP. As the router will use a /24 subnet, you can subnet all VLANs inside that /24, e.g. a few /26 and a /30 for the VLAN towards the router, as my home network will not grow beyond a dozen devices in total. Now the router will send an ARP request for each inside IP address, after which the layer 3 switch answers on behalf of the client device. The router will forward all data to the layer 3 switch, who knows all devices in the connected subnets.

ProxyARP

And problem solved. From the point of view of the router, there’s one device (MAC address, the layer 3 switch) in the entire subnet that uses a bunch of IP addresses.

And no FabricPath either. This one works without any active protocol involved, and no blocked links. Too good to be true? Of course!

LAN-NoSTP

Take the above example design: three switches connected by port channels. Let’s assume users connect to these switches with desktops.

Using a normal design, spanning tree would be configured (MST, RPVST+, you pick) and one of the three port-channel links would go into blocking. The root switch would be the one connecting to the rest of the network or a WAN uplink, assuming you set bridge priorities right.

Easy enough. And it would work. Any user in a VLAN would be able to reach another user on another switch in the same VLAN. They would always have to pass through the root switch though, either by being connected to it, or because spanning tree blocks the direct link between the non-root switches.

Disabling spanning-tree would make all links active. And a loop would definitely follow. However, wouldn’t it be nice if a switch would not forward a frame received from another switch to other switches? This would require some sort of split horizon, which VMware vSwitches already do: if a frame enters from a pNIC (physical NIC) it will not be sent out another pNIC again, preventing the vSwitch from becoming a transit switch. Turns out this split horizon functionality exists on a Cisco switch: ‘switchport protect’ on the interface, which will prevent any frames from being sent out that came in through another port with the same command.

Configuring it on the port channels on all three switches without disabling spanning tree proves the point: the two non-root switches can’t reach each other anymore because the root switch does not forward frames between the port channels. But disabling spanning tree after it creates a working situation: all switches can reach each other directly! No loops are formed because no switch forwards between the port channels.

Result: a working network with active-active links and optimal bandwidth usage. So what are the caveats? Well…

  • It doesn’t scale: you need a full mesh between switches. Three inter-switch links for three switches, six for four switches, ten for five switches,… After a few extra switches, you’ll run out of ports just for the uplinks.
  • Any link failure breaks the network: if the link between two switches is broken, those two switches will not be able to reach each other anymore. This is why my example uses port-channels: as long as one link is active it will work. But there will not be any failover to a path with better bandwidth.

Again a disclaimer, I don’t recommend it in any production environment. And I’m sure someone will (or already has) ignore(d) this.

DoS attack types.

Everyone has heard of a DoS attack: a Denial of Service attack that consumes a server’s resources, taking it (temporarily) offline. However, more that one type of DoS attack exists. I’m going to discuss a few here to clarify the complexity in defending against them.

The SYN attack

DDoS-Type1

One of the most simple and well-known DoS attacks: just sent TCP packets with only the SYN flag set towards a web server. The server will reserve resources (sockets, memory, CPU) for the incoming connections and reply, but the connection is never completed. This can go up to millions of packets per second.

While this will take down a server eventually, it can also affect the firewall in front of it: the state table will fill up. Worst case this affects reachability of all servers behind the firewall.

One way to deal with this is rate-limiting the number of incoming connections towards each server on the firewall if the firewall supports it, making it one of the few attacks that can be countered without a dedicated anti-DoS appliance.

Another way is using SYN cookies: the firewall will send the reply (SYN,ACK) on behalf of the server and only if the client completes the connection (ACK), the firewall will connect to the server and tie both connections together.

The HTTP GET attack

DDoS-Type2

This attack bypasses the firewall, making it much more difficult to counter. A single TCP connection towards a web server is established. In that connection, instead of requesting the web page once using an HTTP GET, the web page is continuously requested over and over again, using up all server resources.

Countering this requires packet inspection on layer 6-7 which must be done by an IPS or anti-DoS appliance. Firewalls will not detect this.

The UDP flooding attack

DDoS-Type3

A UDP flooding attack is most often done using open DNS resolvers. A DNS request with a spoofed source IP address is sent towards a DNS server. This DNS server replies towards the spoofed IP address (the target server) with a large output. An example is a NS query for the root servers: the response is about four times larger compared to the request. This means about 250 Mbps of request traffic is required to flood a gigabit uplink towards a server in this case.

DDoS-rootNSqueries

Larger multipliers exist for other types of queries, generating 10, 20, 30,… times as much output compared to the query. While this example uses DNS, UDP-based attacks now also exist for NTP and SNMP. Advantages of NTP and SNMP are large possible multiplier values and less awareness of the attack’s existence.

However, the bad part is that this kind of attack floods uplinks towards data centers and LANs, which are shared with other servers or companies. Placing an anti-DoS appliance in the data center right before the firewall, but after the uplink towards the ISP, will be ineffective. Countering this attack requires an appliance at the ISP before the uplink, or DoS cloud services that reroute BGP IP ranges (assuming you have them) for filtering in case of a large-scale DoS.

Without an appliance options are limited. Placing a firewall in front of it will demand a lot of CPU from the firewall as it will still have to receive, check and drop packets. Usually two options remain: switch ACLs and black hole routing. Black hole routing means setting up a Null route of the destination server before it reaches your infrastructure (an ISP or BGP router), essentially giving up your server to save the rest of the network. ACLs on switches are hard to set up in the middle of an attack and not always possible, but the advantage is that packets will be dropped in hardware, using the switch ASIC and not consume any firewall, router or server CPU. Most likely your server will still end up unreachable.

Others
The above are just a few common ones. Fact is, any layer 3 point in the network can be attacked. Even if it doesn’t have any ports listening, it will have to use CPU to look at packets arriving for the IP address it has. And if it’s a web server behind a firewall, it will be reachable on a port which can be exploited. Encryption (SSL) can ironically make this worse because anti-DoS appliances can’t check in the HTTPS session for GET requests, or the encryption itself can be continuously renegotiated, taking up CPU.

Remember, most DoS attacks are legitimate packets. Just a lot of them.