Category: Other


This article is not really written with knowledge usable for a production network in mind. It’s more of an “I have not failed. I’ve just found 10,000 ways that won’t work.” kind of article.

I’m currently in a mailing group with fellow network engineers who are setting up GRE tunnels to each other’s home networks over the public internet. Over those tunnels we speak (external) BGP towards each other and each engineer announces his own private address range. With around 10 engineers so far and a partial mesh of tunnels, it gives a useful topology to troubleshoot and experiment with. Just like the real internet, you don’t know what happens day-to-day: neighborships may go down, new ones may suddenly be added, and other next-hops may suddenly become more interesting for some routes.

SwitchRouting1

But of course it requires a device at home capable of both GRE and BGP. A Cisco router will do, as will Linux with Quagga and many other industrial routers. But the only device I currently have running 24/7 is my WS-C3560-8PC switch. Although it has an IP Services IOS, is already routing and can do GRE and BGP, it doesn’t do NAT. Easy enough: allow GRE through on the router that does the NAT in the home network. Turns out the old DD-WRT version I have on my current router doesn’t support it. Sure I can replace it but it would cost me a new router and it would not be a challenge.

SwitchRouting2

Solution: give the switch a direct public IP address and do the tunnels from there. After all, the internal IP addresses are encapsulated in GRE for transport, so no NAT is required for them. Since the switch already has a default route towards the router, set up host routes (a /32) per remote GRE endpoint. However, this still introduces asymmetric routing: the provider subnet is a connected subnet for the switch, so incoming traffic will go through the router and outgoing traffic will go directly from the switch to the internet without NAT. Of course that will not work.

SwitchRouting3

So yet another problem to work around. This can be solved for a large part using Policy-Based Routing (PBR): on the client VLAN interface, redirect all traffic not meant for a private range towards the router. But again, this has complications: the routing table does not reflect the actual routing being done, more administrative overhead, and all packets originated from the local switch will still follow the default (the 3560 switch does not support PBR for locally generated packets).
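A minimal sketch of what such a policy could look like in generic IOS syntax. The VLAN number, the ACL and route-map names and the NAT router address 192.168.1.1 are assumptions for illustration only, and the snippet has not been checked against the platform-specific PBR restrictions of the 3560.

```
! Hypothetical names and addresses, generic IOS syntax
! Private destinations are denied, so they follow the normal routing table
ip access-list extended NOT-PRIVATE
 deny   ip any 10.0.0.0 0.255.255.255
 deny   ip any 172.16.0.0 0.15.255.255
 deny   ip any 192.168.0.0 0.0.255.255
 permit ip any any
!
! Everything else is handed to the router that does the NAT
route-map TO-NAT-ROUTER permit 10
 match ip address NOT-PRIVATE
 set ip next-hop 192.168.1.1
!
! Applied inbound on the client VLAN interface only
interface Vlan10
 ip policy route-map TO-NAT-ROUTER
```

Because the policy is applied to traffic entering Vlan10, packets generated by the switch itself never hit it, which is exactly the limitation described above.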

Next idea: it would be nice to have an extra device that can do GRE and BGP directly towards the internet and my switch can route private range packets towards it. But the constraint is no new device. So that brings me to VRFs: split the current 3560 switch in two: one routing table for the internal routing (vrf MAIN), one for the GRE tunnels (vrf BGP). However, to connect the two VRFs on the same physical device I would need to loop a cable from one switchport to another, and I only have 8 ports. The rest would work out fine: point private ranges from a VLAN interface in one VRF to a next-hop VLAN interface over that cable in another VRF. That second VRF can have a default route towards the internet and set up GRE tunnels. The two VRFs would share one subnet.

SwitchRouting4

Since I don’t want to deal with that extra cable, would it be possible to route between VRFs internally? I’ve tried similar actions before, but those required a route-map and a physical incoming interface. I might as well use PBR if I go that way. Internal interfaces for routing between VRFs exist on ASR series, but not my simple 8-port 3560. But what if I replace the cable with tunnel interfaces? Is it possible to put both endpoints in different VRFs? Yes, the 15.0(2) IOS supports it!

SwitchRouting5

The tunnel interfaces have two commands that are useful for this:

  • vrf forwarding : just like on any other layer 3 interface, it specifies the routing table for the packets inside the interface (in the tunnel).
  • tunnel vrf : specifies the underlying VRF from which the packets will be sent, after GRE encapsulation.

With these two commands, it’s possible to have tunnels in one VRF transporting packets for another VRF. The concept is vaguely similar to MPLS-VPN,  where your intermediate (provider) routers only have one routing table which is used to transport packets towards routers that have the VRF-awareness (provider-edge).

! Local endpoints for the internal tunnels (global routing table)
interface Vlan2
 ip address 192.168.2.1 255.255.255.0
interface Vlan3
 ip address 192.168.3.1 255.255.255.0
! The inside of this tunnel belongs to vrf MAIN
interface Tunnel75
 vrf forwarding MAIN
 ip address 192.168.7.5 255.255.255.252
 tunnel source Vlan2
 tunnel destination 192.168.3.1
! The inside of this tunnel belongs to vrf BGP
interface Tunnel76
 vrf forwarding BGP
 ip address 192.168.7.6 255.255.255.252
 tunnel source Vlan3
 tunnel destination 192.168.2.1

So I configure two tunnel interfaces, both using the main routing table for transport. Source and destination are two IP addresses locally configured on the switch. I chose VLAN interfaces; loopbacks will likely work as well. Inside the tunnels, one is set to the first VRF, the other to the second. One of the VRFs may be shared with the main (outside tunnels) routing table, but it’s not a requirement. Configure both tunnel interfaces as two sides of a point-to-point connection and they come up. Ping works, and even MTU 1500 works over the tunnels, despite the show interface command showing an MTU of only 1476!

Next, I set up BGP to be VRF-aware. Logically, there are two ‘routers’, one of which is the endpoint for the GRE tunnels, and another one which connects to it behind it for internal routing. Normally if it were two physical routers, I would set up internal BGP between them since I’m already using that protocol. But there’s no difference here: you can make the VRFs speak BGP to each other using one single configuration.

! iBGP between the two VRFs over the internal tunnel (192.168.7.4/30)
router bgp 65000
 address-family ipv4 vrf MAIN
  neighbor 192.168.7.6 remote-as 65000
  network 192.168.0.0 mask 255.255.248.0
  neighbor 192.168.7.6 activate
 exit-address-family
 address-family ipv4 vrf BGP
  bgp router-id 192.168.7.6
  neighbor 192.168.7.5 remote-as 65000
  neighbor 192.168.7.5 activate
 exit-address-family

A few points did surface: you need to specify the neighbors (the IP addresses of the local device in the different VRFs) under the correct address families. You also need to specify a route distinguisher under the VRF as it is required for VRF-aware BGP. And maybe the most ironic: you need a bgp router-id set inside the VRF address-family so it differs from the other VRF (the highest interface IP address by default), otherwise the two ‘BGP peers’ will notice the duplicate router-id and it will not work. But after all of that, BGP comes up and routes are exchanged between the two VRFs! For the GRE tunnels towards the internet, the tunnel vrf command is required in the GRE tunnels so they use the correct routing table for routing over the internet.
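The two requirements mentioned above that are not visible in the snippets, the route distinguisher and the tunnel vrf command, could look roughly like this. It is only a sketch: the RD values, the tunnel number, the tunnel addressing, the internet-facing interface Vlan99 (assumed to sit in vrf BGP) and the remote endpoint 198.51.100.7 are all made up for illustration.

```
! Hypothetical RD values; they only have to be unique per VRF
vrf definition MAIN
 rd 65000:1
 address-family ipv4
 exit-address-family
!
vrf definition BGP
 rd 65000:2
 address-family ipv4
 exit-address-family
!
! Hypothetical GRE tunnel towards one remote engineer
interface Tunnel101
 ! routing table for the packets inside the tunnel
 vrf forwarding BGP
 ip address 10.255.0.1 255.255.255.252
 tunnel source Vlan99
 tunnel destination 198.51.100.7
 ! routing table used to reach the tunnel destination
 tunnel vrf BGP
```

Note that vrf forwarding is entered before the IP address, since IOS removes the address from an interface when it is moved into a VRF.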

So what makes this not production-worthy? The software-switching.

The ASIC can only do a set number of actions in a certain sequence without punting towards the switch CPU. Doing a layer 2 CAM table lookup or a layer 3 RIB lookup is one thing. But receiving a packet, having the RIB point it to a GRE tunnel, encapsulating, decapsulating and doing a RIB lookup in another VRF is too much. IOS simply follows the expected steps in the code: the software does not ‘see’ what the point is and does not take shortcuts. GRE headers are actually calculated for each packet traversing the ‘internal tunnel’ link. I’ve done a stress test and the CPU would max out at 100% at… 700 kBps, about 5.6 Mbps (700 kB/s times 8 bits per byte). So while this is a very interesting configuration and it gives an ideal situation to learn more, it’s just lab stuff.

So that’s the lesson, as stated in the beginning: how not to do it. Can you route between VRFs internally on a Cisco switch or router (not including ASR series)? Yes. Would you want to do it? No!

And no FabricPath either. This one works without any active protocol involved, and no blocked links. Too good to be true? Of course!

LAN-NoSTP

Take the above example design: three switches connected by port channels. Let’s assume users connect to these switches with desktops.

Using a normal design, spanning tree would be configured (MST, RPVST+, you pick) and one of the three port-channel links would go into blocking. The root switch would be the one connecting to the rest of the network or a WAN uplink, assuming you set bridge priorities right.

Easy enough. And it would work. Any user in a VLAN would be able to reach another user on another switch in the same VLAN. They would always have to pass through the root switch though, either by being connected to it, or because spanning tree blocks the direct link between the non-root switches.

Disabling spanning tree would make all links active. And a loop would definitely follow. However, wouldn’t it be nice if a switch would not forward a frame received from another switch to other switches? This would require some sort of split horizon, which VMware vSwitches already do: if a frame enters from a pNIC (physical NIC) it will not be sent out another pNIC again, preventing the vSwitch from becoming a transit switch. Turns out this split horizon functionality exists on a Cisco switch: ‘switchport protected’ on the interface, which will prevent any frames from being sent out that came in through another port with the same command.

Configuring it on the port channels on all three switches without disabling spanning tree proves the point: the two non-root switches can’t reach each other anymore because the root switch does not forward frames between the port channels. But disabling spanning tree afterwards creates a working situation: all switches can reach each other directly! No loops are formed because no switch forwards between the port channels.
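As a rough sketch, the configuration on each of the three switches would come down to something like the following. The port-channel numbers are assumptions, and the VLAN range simply covers every VLAN.

```
! On every switch: both inter-switch port channels become protected ports
interface Port-channel1
 switchport protected
!
interface Port-channel2
 switchport protected
!
! Only once the protected ports are in place on all three switches
no spanning-tree vlan 1-4094
```

The order matters: with spanning tree disabled before the protected ports are configured everywhere, the triangle is a plain layer 2 loop.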

Result: a working network with active-active links and optimal bandwidth usage. So what are the caveats? Well…

  • It doesn’t scale: you need a full mesh between switches, which is n(n-1)/2 inter-switch links for n switches. Three links for three switches, six for four switches, ten for five switches,… After a few extra switches, you’ll run out of ports just for the uplinks.
  • Any link failure breaks the network: if the link between two switches is broken, those two switches will not be able to reach each other anymore. This is why my example uses port-channels: as long as one link is active it will work. But there will not be any failover to a path with better bandwidth.

Again a disclaimer, I don’t recommend it in any production environment. And I’m sure someone will (or already has) ignore(d) this.

Static host entries for troubleshooting.

Something a bit simpler, yet sometimes very effective: using static host entries. Once again, I have unstable internet at home and it disconnects from time to time. While the other users in my house have some basic knowledge about networking, it’s not their thing, so it’s limited to ping and ipconfig.

The topology in my house does not make it easier for them: a computer connected to a managed switch, which then connects to an ISP modem with built-in router functionality (NAT, DHCP). It’s not always clear what the cause is: sometimes the internet goes down, but the modem/router has also crashed at times, and at one time the cable towards the switch failed and had to be replaced. While I can quickly see where the problem is depending on what I can ping, other users don’t remember all those IP addresses.

Until I got the idea to create static host entries. These are like DNS entries, but configured on a local computer so it’s not dependent on the network to use them, and simple entries can be made for devices that do not have DNS entries.

In Windows, these entries can be made in the hosts file, which you can find in C:\Windows\System32\drivers\etc as a file without extension. It can be opened in Notepad and one entry can be added per line, e.g. ‘192.168.1.1 modem’ and ‘192.168.1.2 switch’. In Linux, this is the /etc/hosts file, and in Mac OS X, it’s /private/etc/hosts . After modifying it, troubleshooting becomes much clearer:

PingSwitch

The switch can still be reached, and a ping to ‘modem’ shows if the modem is still alive. Simple and effective.

Of course this is mainly useful at home, although this can be interesting for a remote office as well. It makes troubleshooting with end users easier, as no IP addresses have to be dictated through the phone or looked up in documentation.

If you’re working in a large enterprise with its own AS and public range(s), you’ll probably recognize the following image:

MultiClientDataCenter

On top, the internet with BGP routers, peering with multiple upstream providers and advertising the public range(s), owned by the company. Below that, multiple firewalls or routers (but I do hope firewalls). Those devices either provide internet access to different parts of the company network, or provide internet access to different customers. Whatever the case, they will have a default route pointing towards the BGP routers (a nice place for HSRP/VRRP).

Vice versa, the BGP routers have connectivity to the firewalls in the connected subnet, but they must also have a route for each NAT range towards the firewalls. This is often done using static routes: for each NAT range, a static route is defined on the BGP routers, with a firewall as a next hop. On that firewall, those NAT addresses are used, e.g. if the BGP routers have a route for 192.0.2.0/30, those four (yes, including broadcast and network) addresses can be used to NAT a server or users behind, even if those aren’t present on any interface in that firewall.

The problem in this setup is that it quickly becomes a great administrative burden, and since all the firewalls have a default route pointing towards the BGP routers, traffic between firewalls travels an extra hop. Setting up static routing for all the NAT ranges on each firewall is even more of a burden, and forgotten routes introduce asymmetric routing, where the return path is through the BGP routers. And while allocating one large public NAT range to each firewall sounds great in theory, reality is that networks tend to grow beyond their designed capacity and new NAT ranges must be allocated from time to time, as more servers are deployed. Perhaps customers even pay for each public IP they have. Can this be solved with dynamic routing? After all, the NAT ranges aren’t present on any connected subnet.

Yes, it’s possible! First, set up a dynamic routing protocol with all routers in the connected public subnet. Personally, I still prefer OSPF. In this design, the benefit of OSPF is the DR and BDR concept: set these to the BGP routers so they efficiently multicast routing updates to all firewalls. Next, on all firewalls, allow the redistribution of static routes, preferably with an IP prefix filter that allows only subnets in your own public IP range, used for NAT. Finally, if you need a NAT range, you just create a Null route on the firewall that needs it (e.g. 192.0.2.0/30 to Null0), and the route will automatically be redistributed into OSPF towards the BGP routers, who will send updates to all other firewalls. Problem solved!
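A minimal sketch of the firewall side, assuming the firewall role is played by a device that speaks IOS. The prefix-list and route-map names, the OSPF process number, the interface name, the shared public subnet 203.0.113.0/24 and the assumption that 192.0.2.0/24 is the company’s own public range are all invented for illustration.

```
! The NAT range this firewall needs, pointed at Null0
ip route 192.0.2.0 255.255.255.252 Null0
!
! Only prefixes from the company's own public range may be redistributed
ip prefix-list NAT-RANGES seq 5 permit 192.0.2.0/24 le 32
!
route-map STATIC-TO-OSPF permit 10
 match ip address prefix-list NAT-RANGES
!
router ospf 1
 network 203.0.113.0 0.0.0.255 area 0
 redistribute static subnets route-map STATIC-TO-OSPF
!
! Priority 0 keeps the firewall out of the DR/BDR election,
! leaving those roles to the BGP routers
interface GigabitEthernet0/0
 ip ospf priority 0
```

With this in place, allocating an extra NAT range comes down to adding one more Null0 route on the firewall that needs it, as long as the range falls inside the prefix filter.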

But what about that Null route? Won’t packets be discarded? No, because this is where a particular logic of NAT comes into play: NAT translations are always done before the next hop is calculated. A packet entering with a destination address pointing towards a Null route (e.g. 192.0.2.1) will first be translated to the private IP address, after which the route lookup gives a connected subnet or another route further into the private network. The Null route is never actually used!

GuessWhat

There were some guesses for my reader challenge, but Pape was correct: it’s a Cisco 18400 IRIS router. IRIS stands for Internet Routing In Space, and this router is a module for a satellite. A quick overview:

  • It runs IOS, and can be remotely upgraded. Considering the cost to get someone physically near the router, that’s a huge advantage.
  • It has extended VOIP Services like Cisco Unified Communications Manager Express, improving the latency for voice calls.
  • It can exchange routing updates with other satellites with this module, and even routers on the planet surface. This way, direct connectivity between satellites becomes much easier.
  • A full feature set: QoS, security, IPv6, OSPF, BGP, SNMP,…
  • High radiation tolerance, which is mandatory in space.
  • Gigabit-capable interfaces, although these interfaces go directly into satellite circuitry or antennae. Effective throughput is around 250 Mbps according to the data sheet.

This is most likely nothing I will ever have the chance to work with in this lifetime, nor something I will find on eBay. An impressive piece of technology!

Something completely different for a change.

GuessWhat

What is this? I’ll give some hints: it’s made by Cisco, runs IOS code, and it’s not End of Life (EOL).

If you know it, post it in the comments!

Time to describe the solution to my last blog post.

Let’s see the topology again:

The router does not have any routing protocol configured. It does have all interfaces configured: 10.0.0.1 towards the WAN cloud, 10.0.1.1 towards the PC 2 subnet, and an IP address towards the PC 1 subnet. PC 2 is a Windows computer, static IP 10.0.1.2. PC 1 is a Windows computer set to DHCP client, but there is no DHCP pool configured on the router or any other device. PC 1 hasn’t had any special changes to the software settings or TCP/IP stack.

Under what circumstances will PC 1 and PC 2 be able to ping each other?

Well, note that I speak of Windows computers. This OS does a very specific thing when there’s no DHCP server: it will use an APIPA address. This is an address in the range 169.254.0.0/16. The remaining 16 host bits are picked pseudo-randomly by the client (RFC 3927 suggests seeding that choice from the MAC address). So although I did not explicitly give the PC 1 IP address, you know in which subnet the IP will be. So the router interface towards PC 1 needs an IP in the same subnet. Congrats to Pape Diack for working that out!

That’s half of the challenge. But to ping PC1 from another subnet, it’ll need a default route, which it does not have. Unless it somehow assumes the echo request originated from the local subnet.

First solution: a (static) NAT rule on the router, which translates an IP on another subnet to an IP on the local subnet. For PC1, it seems as if the router sends the echo request. Once the router receives the echo reply, it is translated and sent to PC2.
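A sketch of how this first solution could be configured. The interface names, the router address 169.254.0.1 and the translated address 169.254.0.2 are assumptions; any free addresses in the APIPA range would do.

```
! Towards PC 1: an address in the APIPA range
interface FastEthernet0/0
 ip address 169.254.0.1 255.255.0.0
 ip nat outside
!
! Towards PC 2
interface FastEthernet0/1
 ip address 10.0.1.1 255.255.255.0
 ip nat inside
!
! PC 2 (10.0.1.2) appears on the PC 1 subnet as 169.254.0.2
ip nat inside source static 10.0.1.2 169.254.0.2
```

With this rule, PC 1 sees the echo request coming from 169.254.0.2, an address on its own subnet, so it can reply without a default gateway.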

Second solution: proxy-arp. Enabled by default on a Cisco router. If PC1 doesn’t have a gateway, it desperately tries to ARP for PC2’s MAC address. If the router knows how to reach PC2 (possibly through the connected subnet), it will reply with its own MAC, and forward the packet once it arrives.

A third, more out-of-the-box thinking solution was using IPv6, as suggested by some. Granted, it would work, but it’s not exactly what I was looking for. I’d better add more detail to my questions next time.
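For this second solution, the router only needs an address in the APIPA range on the interface towards PC 1. The interface name and the address 169.254.0.1 in this sketch are assumptions.

```
! Towards PC 1: any free address in 169.254.0.0/16
interface FastEthernet0/0
 ip address 169.254.0.1 255.255.0.0
 ! on by default, shown here only for clarity
 ip proxy-arp
```

This is an alternative to the NAT approach, not an addition to it.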

Thanks for all the answers!

Something else for a change: a challenge for the reader.

Given the following topology:

The router does not have any routing protocol configured. It does have all interfaces configured: 10.0.0.1 towards the WAN cloud, 10.0.1.1 towards the PC 2 subnet, and an IP address towards the PC 1 subnet. PC 2 is a Windows computer, static IP 10.0.1.2.
PC 1 is a Windows computer set to DHCP client, but there is no DHCP pool configured on the router or any other device. PC 1 hasn’t had any special changes to the software settings or TCP/IP stack.

Now the question: under what circumstances will PC 1 and PC 2 be able to ping each other?

To my knowledge, this question has two possible solutions. You have an idea? Please share it in the comments!

Update: the solution.

My new equipment arrived before the weekend, but I didn’t notice until Christmas because my girlfriend wrapped it up again and put it as a present under the tree. She’s devious at that.

So what is it? Something I wanted for a long time: an access server. It consists of a 2611XM router, a NM-16A module, two octal cables and a console cable. The whole setup allows me to connect out-of-band to 16 other Cisco devices, so I will not have to plug console cables around ever again. How? Well, I could make a blog post about this, but it’s already explained quite perfectly here by David Davis.

I’ve been negotiating a long time for this (since mid-November), and was able to go under a third of the eBay listed prices for the separate components, so I’m very happy. I can now continue my studies.

Speaking of studies: my next study will be CCDA. I’m facing some uncertainties in my life right now and CCDA seems a more realistic short-term goal compared to CCIE. Not that I have abandoned the quest for Internetwork Expert, but the months to come may not give me enough time to properly study such a big project. Either way, CCDA continues the path, and I’ll be using my home lab to further refine my current CCNP skills. Still moving forward.

On popular demand: my home lab.

Yes, finally some pictures. Everybody kept asking me what my home lab looks like. I’m quite happy about it but it’s not perfect, so if you want to set up your own lab, use my guide instead.

The pictures:

The first picture shows four 2611 routers and one Cisco Catalyst 2900XL in my rack. The 2611s have two 10 Mbps Ethernet interfaces, one (unused) ISDN interface and one DB60 serial interface. The 2611s run IOS version 12.3 IP Basic, which is the biggest IOS they can hold on their 8 MB flash. That means support for RIP, OSPF, EIGRP, Frame Relay, NAT, Tunneling, HSRP, VRRP, GLBP, ACLs… But no IPv6.
The 2900XL is ancient: it has twenty-four 100 Mbps ports but runs an unofficial IOS version which just has basic spanning-tree, VLANs and a static port-channel. I tend to use it as a patch panel between the routers mostly.

The second picture shows my server on the bottom of the rack. It’s quite heavy and I don’t have a rack-mount kit. It’s an IBM xSeries 335 server: two 32-bit Xeon single cores clocked at 3.06 GHz, 1.5 GB RAM, two SAS 36 GB 10k rpm hard drives (with hardware RAID support) and two 1 Gbps Ethernet interfaces. It used to run ESX with some virtualized Linux and Windows servers, but now it has become an Ubuntu machine with GNS3.

The third picture shows my laptop, an IP Phone, a Catalyst 2970, a Catalyst 3560 PoE and a 2503 router.
The laptop is a Pentium III 400 MHz with a broken battery, no working USB port and no working wifi. But it has a COM port and provides console access without problem, as well as Wireshark and PuTTY for tests through the 100 Mbps NIC.
The IP Phone is a Cisco 7912, powered by PoE. I have three of these now.
The WS-C2970-24T-E switch is my workhorse for heavy loads: twenty-four 1 Gbps ports with a 24 Gbps backbone, so it can support a full load at all ports at once. It runs IOS version 12.2, with support for MST, PVST, Rapid-PVST, LACP, PAgP, static port-aggregation, VLANs, VTP and security features like port security, DHCP Snooping, DAI and the like.
The WS-C3560-24P-S is a full layer 3 switch with twenty-four 100 Mbps ports and two 1 Gbps Small Form-Factor Pluggable Transceiver (SFP) ports. The 100 Mbps ports have Power over Ethernet (PoE) auto-detect. It has a more recent IOS 12.4 IP Services with crypto installed. At the time of writing, it’s still in production and supports all features of the 2970 plus layer 3 functionality like routing protocols, DHCP, IPv6, ACLs, and also QoS and Private VLANs. It even has a temperature sensor and auto-MDIX.
The 2503 router is the only survivor of a batch of five 2503s, which was my original CCNA home lab together with the 2900XL. It’s ancient, has two serial interfaces and one 10 Mbps Ethernet interface (with an AUI). It supports routing protocols in their basic configuration, NAT, ACLs. In reality, it reaches about 4 Mbps throughput in my lab, making it the only device I have with less throughput in Mbps than power consumption in Watts.

Not photographed: a WS-C3560-8PC-S. It’s an 8-port 100 Mbps switch with a 1 Gbps uplink. Fanless, completely silent, but with all functions of the 24-port 3560, including PoE.