Category: Network


This article further continues on earlier experiments. While using internal tunnels gave a logically ‘simple’ point-to-point network seen from a layer 3 point of view, it came with the drawback of complex header calculations, resulting in CPU hogging on devices capable of hardware switching. Using some route-maps to choose VRFs for flows proved interesting, but only in particular use cases. It didn’t make the troubleshooting any easier.

There is a better method for dealing with inter-VRF routing on a local device, but it requires a very different logic. Up until now, my articles described how to be able to select a next-hop inside another VRF. Route leaking using BGP, however, leverages a features of BGP that has been described in my blog before: the import and export of routing information in different VRFs.

How does it work? Well, since the BGP process can’t differentiate between VRFs based on names, but instead uses route-targets to uniquely identify routing tables, it’s the route target that matters. Usually, for MPLS-VPN, these numbers differ per VRF. This time, use the same number for two VRFs:

VRF-RouteLeaking1This will tell BGP later on to use the same routing table for both VRFs. Now if you configure BGP…

VRF-RouteLeaking2

… There’s still a separate configuration per VRF. How does the exchange happen? Well: for all routes that are known inside the BGP process. This means any routes from any BGP peer in one of the VRFs are automatically learned in both VRFs. But for static and connected routes, and routes from other routing protocols, you can do a controlled redistribution using route-maps or prefix lists. This way only the routes that need to be known in the other VRF are added.

VRF-RouteLeaking3

The routing table with automatically show routes pointing to different VRFs. On top of that, the forwarding does not require any headers to be added or removed to the packets, so it can be put directly into CEF, allowing for hardware forwarding. And for completeness: the BGP neighborships here are just for illustration of what happens, but no neighborship is required for this to function. BGP just needs to run on the device as a process.

Of course, this is just a simple setup. More complexity can always be added, as you can even use route-maps to set the route targets for some routes.

This article is not really written with knowledge usable for a production network in mind. It’s more of an “I have not failed. I’ve just found 10,000 ways that won’t work.” kind of article.

I’m currently in a mailing group with fellow network engineers who are setting up GRE tunnels to each others home networks over the public internet. Over those networks we speak (external) BGP towards each other and each engineer announces his own private address range. With around 10 engineers so far and a partial mesh of tunnels, it gives a useful topology to troubleshoot and experiment with. Just like the real internet, you don’t know what happens day-to-day, neighborships may go down or suddenly new ones are added, and other next-hops may become more interesting for some routes suddenly.

SwitchRouting1

But of course it requires a device at home capable of both GRE and BGP. A Cisco router will do, as will Linux with Quagga and many other industrial routers. But the only device I currently have running 24/7 is my WS-C3560-8PC switch. Although it has an IP Services IOS, is already routing and can do GRE and BGP, it doesn’t do NAT. Easy enough: allow GRE through on the router that does the NAT in the home network. Turns out the old DD-WRT version I have on my current router doesn’t support it. Sure I can replace it but it would cost me a new router and it would not be a challenge.

SwitchRouting2

Solution: give the switch a direct public IP address and do the tunnels from there. After all, the internal IP addresses are encapsulated in GRE for transport so NAT is required for them. Since the switch already has a default route towards the router, set up host routes (a /32) per remote GRE endpoint. However, this still introduces asymmetric routing: the provider subnet is a connected subnet for the switch, so incoming traffic will go through the router and outgoing directly from the switch to the internet without NAT. Of course that will not work.

SwitchRouting3

So yet another problem to work around. This can be solved for a large part using Policy-Based Routing (PBR): on the client VLAN interface, redirect all traffic not meant for a private range towards the router. But again, this has complications: the routing table does not reflect the actual routing being done, more administrative overhead, and all packets originated from the local switch will still follow the default (the 3560 switch does not support PBR for locally generated packets).

Next idea: it would be nice to have an extra device that can do GRE and BGP directly towards the internet and my switch can route private range packets towards it. But the constraint is no new device. So that brings me to VRFs: split the current 3560 switch in two: one routing table for the internal routing (vrf MAIN), one for the GRE tunnels (vrf BGP). However, to connect the two VRFs on the same physical device I would need to loop a cable from one switchport to another, and I only have 8 ports. The rest would work out fine: point private ranges from a VLAN interface in one VRF to a next-hop VLAN interface over that cable in another VRF. That second VRF can have a default route towards the internet and set up GRE tunnels. The two VRFs would share one subnet.

SwitchRouting4

Since I don’t want to deal with that extra cable, would it be possible to route between VRFs internally? I’ve tried similar actions before, but those required a route-map and a physical incoming interface. I might as well use PBR if I go that way. Internal interfaces for routing between VRFs exist on ASR series, but not my simple 8-port 3560. But what if I replace the cable with tunnel interfaces? Is it possible to put both endpoints in different VRFs? Yes, the 15.0(2) IOS supports it!

SwitchRouting5

The tunnel interfaces have two commands that are useful for this:

  • vrf definition : just like on any other layer 3 interface, it specifies the routing table of the packets in the interface (in the tunnel).
  • tunnel vrf :  specifies the underlying VRF from which the packets will be sent, after GRE encapsulation.

With these two commands, it’s possible to have tunnels in one VRF transporting packets for another VRF. The concept is vaguely similar to MPLS-VPN,  where your intermediate (provider) routers only have one routing table which is used to transport packets towards routers that have the VRF-awareness (provider-edge).

interface Vlan2
ip address 192.168.2.1 255.255.255.0
interface Vlan3
ip address 192.168.3.1 255.255.255.0
interface Tunnel75
vrf forwarding MAIN
ip address 192.168.7.5 255.255.255.252
tunnel source Vlan2
tunnel destination 192.168.3.1
interface Tunnel76
vrf forwarding BGP
ip address 192.168.7.6 255.255.255.252
tunnel source Vlan3
tunnel destination 192.168.2.1

So I configure two tunnel interfaces, both in the main routing table. Source and destination are two IP addresses locally configured on the router.  I chose VLAN interface, loopbacks will likely work as well. Inside the tunnels, one is set to the first VRF, the other to the second. One of the VRFs may be shared with the main (outside tunnels) routing table, but it’s not a requirement. Configure both tunnel interfaces as two sides of a point-to-point connection and they come up. Ping works, and even MTU 1500 works over the tunnels, despite the show interface command showing an MTU of only 1476!

Next, I set up BGP to be VRF-aware. Logically, there are two ‘routers’, one of which is the endpoint for the GRE tunnels, and another one which connects to it behind it for internal routing. Normally if it were two physical routers, I would set up internal BGP between them since I’m already using that protocol. But there’s no difference here: you can make the VRFs speak BGP to each other using one single configuration.

router bgp 65000
address-family ipv4 vrf MAIN
neighbor 192.168.7.6 remote-as 65000
network 192.168.0.0 mask 255.255.248.0
neighbor 192.168.7.6 activate
exit-address-family
address-family ipv4 vrf BGP
bgp router-id 192.168.7.6
neighbor 192.168.7.5 remote-as 65000
neighbor 192.168.7.5 activate
exit-address-family

A few points did surface: you need to specify the neighbors (the IP addresses of the local device in the different VRFs) under the correct address families. You also need to specify a route distinguisher under the VRF as it is required for VRF-aware BGP. And maybe the most ironic: you need a bgp router-id set inside the VRF address-family so it differs from the other VRF (the highest interface IP address by default), otherwise the two ‘BGP peers’ will notice the duplicate router-id and it will not work. But after all of that, BGP comes up and routes are exchanged between the two VRFs! For the GRE tunnels towards the internet, the tunnel vrf command is required in the GRE tunnels so they use the correct routing table for routing over the internet.

So what makes this not production-worthy? The software-switching.

The ASIC can only do a set number of actions in a certain sequence without punting towards the switch CPU. Doing a layer 2 CAM table lookup or a layer 3 RIB lookup is one thing. But receiving a packet, have the RIB pointing it to a GRE tunnel, encapsulate, decapsulate and RIB lookup of another VRF is too much. It follows the expected steps in the code accordingly, the IOS software does not ‘see’ what the point is and does not take shortcuts. GRE headers are actually calculated for each packet traversing the ‘internal tunnel’ link. I’ve done a stress test and the CPU would max out at 100% at… 700 kBps, about 5,6 Mbps. So while this is a very interesting configuration and it gives an ideal situation to learn more, it’s just lab stuff.

So that’s the lesson, as stated in the beginning: how not to do it. Can you route between VRFs internally on a Cisco switch or router (not including ASR series)? Yes. Would you want to do it? No!

Disclaimer: the logs are taken from a production network but the values (VLAN ID, names) are randomized.

Recently, I encountered an issue on a Campus LAN while performing routine checks: spanning tree seemed to undergo regular changes.

The LAN in question uses five VLANs and RPVST+, a Cisco-only LAN. At first sight there was no issue:

BPDU-Trace1

On an access switch, one Root port towards the Root bridge, a few Designated ports (note the P2P Edge) where end devices connect, and an Alternative port in blocking with a peer-to-peer neighborship, which means BPDUs are received on this link.

There is a command that allows you to see more detail: ‘show spanning-tree detail’. However, the output from this command is overwhelming so it’s best to apply filters on it. After some experimenting, filtering on the keywords ‘from’,’executing’ and ‘changes’ seems to give the desired output:

BPDU-Trace2

This gives a clear indication of something happening in the LAN: VLAN 302 has had a spanning-tree event less than 2 hours ago. Compared to most other VLANs who did not change for almost a year, this means something changed recently. After checking the interface on which the event happened, I found a port towards a desk which did not have the BPDU Guard enabled, just Root Guard. It was revealed that someone regularly plugged in a switch to have more ports, which talked spanning-tree but with a default priority, not claiming root. As such, Root Guard was not triggered, but the third-party switch did participate in spanning-tree from time to time.

Also, as you notice in the image, VLAN 304 seemed to have had a recent event to, on the Alternative Port. After logging in on the next switch I got the following output:

BPDU-Trace3

Good part: we have a next interface to track. Bad news: it’s a stack port? Since it’s a stack of 3750 series switches, it has stack ports in use, but the switches should act as one logical unit in regards to management and spanning-tree, right? Well, this is true, but each switch still makes the spanning-tree calculation by itself, which means that it can receive a spanning-tree update of another switch in the stack through the stack port.

Okay, but can you still trace this? You would have to be able to look in another switch in the stack… And as it turns out, you can:

BPDU-Trace4

After checking which stack port connects to which switch in the CLI, you can hop to the next one with the ‘session’ command and return to the master switch simply by typing ‘exit’. On the stack member, again doing the ‘show spanning tree detail’ command shows the local port on which the most recent event happened. And again the same thing: no BPDU Guard enabled here.

 

Just a simple article about something I recently did in my home network. I wanted to prepare the network for a Squid proxy, and design it in such a way that the client devices did not require proxy settings. Having trouble placing it inline, I decided I could use WCCP. However, that requires separate VLANs.

This did pose a problem: my home router did not support any kind of routing and multiple networks beyond a simple hide NAT (PAT) behind the public IP address. Even static routes weren’t possible.

And again my fanless 3560-8PC helped me out. The 3560 can do layer 3 so you can configure it with the proper VLANs and use it as the default gateway on all VLANs. Then you add another VLAN towards the router and point a default route towards that router.

That solves half of the problem: packets get to the router and out to the internet. However, the router does not have a return route for the VLANs. But it does not need that: you can use Proxy ARP. As the router will use a /24 subnet, you can subnet all VLANs inside that /24, e.g. a few /26 and a /30 for the VLAN towards the router, as my home network will not grow beyond a dozen devices in total. Now the router will send an ARP request for each inside IP address, after which the layer 3 switch answers on behalf of the client device. The router will forward all data to the layer 3 switch, who knows all devices in the connected subnets.

ProxyARP

And problem solved. From the point of view of the router, there’s one device (MAC address, the layer 3 switch) in the entire subnet that uses a bunch of IP addresses.

And no FabricPath either. This one works without any active protocol involved, and no blocked links. Too good to be true? Of course!

LAN-NoSTP

Take the above example design: three switches connected by port channels. Let’s assume users connect to these switches with desktops.

Using a normal design, spanning tree would be configured (MST, RPVST+, you pick) and one of the three port-channel links would go into blocking. The root switch would be the one connecting to the rest of the network or a WAN uplink, assuming you set bridge priorities right.

Easy enough. And it would work. Any user in a VLAN would be able to reach another user on another switch in the same VLAN. They would always have to pass through the root switch though, either by being connected to it, or because spanning tree blocks the direct link between the non-root switches.

Disabling spanning-tree would make all links active. And a loop would definitely follow. However, wouldn’t it be nice if a switch would not forward a frame received from another switch to other switches? This would require some sort of split horizon, which VMware vSwitches already do: if a frame enters from a pNIC (physical NIC) it will not be sent out another pNIC again, preventing the vSwitch from becoming a transit switch. Turns out this split horizon functionality exists on a Cisco switch: ‘switchport protect’ on the interface, which will prevent any frames from being sent out that came in through another port with the same command.

Configuring it on the port channels on all three switches without disabling spanning tree proves the point: the two non-root switches can’t reach each other anymore because the root switch does not forward frames between the port channels. But disabling spanning tree after it creates a working situation: all switches can reach each other directly! No loops are formed because no switch forwards between the port channels.

Result: a working network with active-active links and optimal bandwidth usage. So what are the caveats? Well…

  • It doesn’t scale: you need a full mesh between switches. Three inter-switch links for three switches, six for four switches, ten for five switches,… After a few extra switches, you’ll run out of ports just for the uplinks.
  • Any link failure breaks the network: if the link between two switches is broken, those two switches will not be able to reach each other anymore. This is why my example uses port-channels: as long as one link is active it will work. But there will not be any failover to a path with better bandwidth.

Again a disclaimer, I don’t recommend it in any production environment. And I’m sure someone will (or already has) ignore(d) this.

IS-IS part II: areas and backbone.

Given the basics covered in part I, IS-IS configuration isn’t that hard. It already clearly shows some differences with OSPF, but it’s when using multiple areas that there is a clear distinction in logic.

OSPF-Areas

First a small recap of OSPF areas: you have a backbone area, area 0, to which all other areas must connect. A router can be in multiple areas, an interface can be in only one area for a given OSPF process. Routes between areas are known by default, but setting an area to stub can change this to just a default route.

ISIS-Areas

IS-IS is different: as you may have guessed by the ‘net’ command of part I, a router can only be part of one area. Area borders are between routers. An area is made up of routers with level 1 neighborships. A router with a level 2 neighborship towards another router is considered a backbone router. Since level 2 neighborships can be between routers in different areas (the second part of ‘net’ command can differ), these routers connect areas.

The moment a router has a level 2 neighborship and becomes a backbone router, it will automatically propagate a default route towards its level 1 neighbors. This gets flooded throughout the area. To reach another area, packets will be sent automatically towards the nearest backbone router. The Backbone router has a second topology table for level 2 that lists information of all subnets in all areas (which requires more memory). The packet will then be transported over the backbone to the appropriate area. For this reason, the backbone must be continuous: otherwise there would be multiple islands of routers propagating default routes.

From that point of view, the level 2 backbone becomes an overlay on top of the areas that connects everything: an extra ‘level’, likely the reason for the terminology. While this design works and is very scalable it may introduce suboptimal routing. Inter-area traffic will go to the nearest backbone router, but there may be other backbone routers in the area that can route the packets to the destination in a better way. For example, in the above image, the bottom router in the purple middle area may decide to follow the default route to the left backbone router for a packet destined for the right blue area.

Configuration is still straightforward:

Router(config)#interface GigabitEthernet0/1
Router(config-int)#ip address 10.0.2.5 255.255.255.252
Router(config-int)#ip router isis
Router(config-int)#isis circuit-type level-1
Router(config-int)#exit
Router(config)#interface GigabitEthernet0/2
Router(config-int)#ip address 10.0.3.1 255.255.255.252
Router(config-int)#ip router isis
Router(config-int)#isis circuit-type level-2-only
Router(config-int)#exit
Router(config)#interface GigabitEthernet0/3
Router(config-int)#ip address 10.0.2.9 255.255.255.252
Router(config-int)#ip router isis
Router(config-int)#
Router(config-int)#exit
Router(config)#router isis
Router(config-router)#log-adjacency-changes
Router(config-router)#net 49.0001.0000.0000.0008.00

This example configures a router for a level 1 neighborship on Gi0/1 (inside the area), a level 2 neighborship on Gi0/2 (between areas) and a level 1 & 2 neighborship on Gi0/3 (inside the area, but still backbone). Note the missing ‘is-type’ command in the routing process, which makes the router default to both a level 1 and level 2 router. A router in another area has a different area number in the net command:

Router(config)#interface GigabitEthernet0/2
Router(config-int)#ip address 10.0.3.2 255.255.255.252
Router(config-int)#ip router isis
Router(config-int)#isis circuit-type level-2-only
Router(config-int)#exit
Router(config)#router isis
Router(config-router)#log-adjacency-changes
Router(config-router)#net 49.0002.0000.0000.0009.00

Note that an IS-IS router is not required to have a level 1 neighborship. It is possible to have a ‘pure’ backbone router with only level 2 neighborships, which makes the router only use one topology table again, just like a level 1-only router.

The topology tables for both levels can be checked with show isis topology l1 and show isis topology l2. Same for the database, just replace the word ‘topology’ with ‘database’. The show clns is-neighbors and show isis neighbors commands both show all IS-IS neighbors and the level of the neighborship.

IS-IS, or Intermediate System to Intermediate System. Just like OSPF, it’s a link-state routing protocol. This article took me quite a bit of research, and things were confusing for me at first because I kept looking at it from an OSPF point of view. Now that I’ve cleared that up for myself, I’ll do my best to explain it here for people knowing OSPF but not IS-IS (which, I assume, will be the majority of readers here).

First some explanation about why one would want to use IS-IS in the first place. After all, both are link-state routing protocols and OSPF is much more familiar to most. However, there are a few key differences in design of the protocols. But the most important reason to choose IS-IS over OSPF is scalability. IS-IS scales to larger topologies compared to OSPF using the same resources. A general recommendation for the number of OSPF routers in an area is between 70 and 100 maximum, while IS-IS will do 150 routers in an area (of course, the number of uplinks, routes and type of routers will influence this number). The difference in multi-area design can also make IS-IS more suitable for some topologies (which I will explain in part II later on).

This part will focus on a single area and basic configuration. It is useful to know some historical facts which explain the difference in commands compared to OSPF.

  • Since IS-IS wasn’t designed with IP in mind but CLNS, it works directly on layer 2 with no IP headers. It uses flexible TLV (Type-Length-Value) fields in the PDUs it exchanges which makes it suitable for carrying routing information of just about any protocol. This is why it’s also used for IPv6 and even TRILL and FabricPath (which is actually nothing more than exchanging the location of MAC addresses by routing protocol).
  • IS-IS has a concept of areas but refers to it as ‘levels’. On a Cisco router the IS-IS routing protocol will try to form neighborships for both level 1 and level 2 by default. When using just one area, it’s best to configure the routing protocol to form neighborship of level 1 only (again, multi-area will be covered in part II).
  • A Network Entitity Title (NET) is used to identify a router. It is made up of four parts: the first byte is an Authority and Format Identifier (AFI),  next two bytes that define the area, followed by six bytes that act as a unique identifier (much like an OSPF router-id) and one byte for n-selector (NSEL). This NSEL is always set to zero for IS-IS for IP (non-zero values are used for actual data transport over CLNS, which likely isn’t used anywhere anymore). The AFI must be officially registered but 49 can be used for internal addressing.
  • As a consequence, the first six bytes (AFI and area ID) have to be the same for all IS-IS routers in an area, and the following six bytes have to be unique for each IS-IS router in an area.
  • For the unique ID part, several methods exist: you can use the system base MAC address, map an IP address to it, or simply start counting from 1 and up.

ISIS-NET

Given all the above, the basic IS-IS routing process can be configured as following:

Router(config)#router isis
Router(config-router)#log-adjacency-changes
Router(config-router)#is-type level-1
Router(config-router)#net 49.0001.0000.0000.0017.00

Unlike the other routing protocols, logging of adjacencies is not on by default on a Cisco router.

Now that the process is configured, interfaces must be added to it. That’s right, interfaces, no ‘network’ command to define subnets. This can be done in two ways:

  • Configuring an IP address on an interface, followed by the ‘ip router isis’ command will make the interface participate.
  • Configuring an IP address on an interface and defining that interface as passive in the router process will make IS-IS announce the subnet on the attached interface but not form any neighborships on it. The ‘ip router isis’ command is not required.

Router(config)#interface GigabitEthernet0/1
Router(config-int)#ip address 10.0.2.1 255.255.255.252
Router(config-int)#ip router isis
Router(config-int)#exit
Router(config)#interface Loopback0
Router(config-int)#ip address 10.0.10.14 255.255.255.255
Router(config-int)#exit
Router(config)#router isis
Router(config-router)#passive-interface Loopback0

And that’s it. Configure this on two adjacent routers and an IS-IS neighborship will form. You can check this using ‘show clns neighbors’ and ‘show isis neighbors’.

ISIS-Show

In upcoming parts, I’ll explain multi-area design and configuration and fine tuning of the default parameters. And for those interested, I’ve uploaded a capture of the IS-IS neighborship forming on Cloudshark.

On this blog I’ve often covered Private VLANs: how to configure them, work around them and deploy them in a larger network. Yet it’s rarely that you see an actual Private VLAN in a design. Part of the problem is covered in the article about deployment over multiple switches: you can’t connect a trunked device such as a firewall to it. Although the Nexus 7000 provides a solution, that doesn’t make it much easier (or cheaper).

Another important reason is that few are willing to take the risk to deploy a VLAN where hosts cannot communicate with each other, as this is usually the reason hosts are put in the same VLAN in the first place. There’s the hesitation because it would introduce complexity or limit scalability, as new servers later on may need to communicate in the same subnet after all.

So where would it be beneficial and with low risk to use a Private VLAN? Actually quite a few places.

E-commerce
AppFlowDMZ

Say you have an internet-facing business with e-commerce websites where anyone can log in, create an account, or do a purchase. A compromised e-commerce server in the DMZ means immediate access to the entire DMZ VLAN. This VLAN has the highest chance of being compromised from the internet, yet the servers in it rarely need to speak with each other. In fact, if properly designed, they will all connect to backend application and/or database servers that on their turn communicate with each other. This way the e-commerce data is synchronised without the DMZ servers requiring a connection to each other.

Stepping Stones
SteppingStones

Some environments have a VLAN with Stepping Stone servers where users can log on to with pre-installed tools to access confidential resources. Access from one Stepping Stone server to another is not needed here. Sometimes it’s even not desired as there may be a Stepping Stone per application, environment or third-party.

Out-of-Band
A modern rackserver has an out-of-band port to a dedicated chip in the server that can power off and on the server, and even install the OS remotely. For example, HP iLO. Typical here is that the out-of-band port never initiates connections but only receives connections for management, usually though the default gateway. This makes for a good Private VLAN deployment without issues.

Backup
BackupVLAN

Similar to out-of-band, some environments use a dedicated network card on all servers for backup. This introduces a security issue as it’s possible for two servers in different VLANs to communicate without a firewall in between. Again a Private VLAN can counter this. Somewhat unusual in the design is that it’s best to put the servers taking the backup in the promiscuous VLAN, so they can communicate with all servers and the backup VLAN default gateway, and put the default gateway in an isolated VLAN, preventing any other server from using it.

Campus – Wired guests
Similar to the Stepping Stones: guests can access the network through a firewall (the default gateway) but don’t need to access each others computers.

Campus – Wireless APs
In a WLAN deployment with a central controller (WLC), all the Lightweight APs do is connect to the controller using the subnet default gateway. Any other services such as DHCP and DNS will be through this default gateway as well.

Campus – Utilities
Utilities such as printers, camera’s, badge readers,… will likely only need the default gateway and not each other.

Where not to use PVLANs
This should give some nice examples already. But for last, a couple of places where not to use Private VLANs:

  • Routing VLANs: unless you want to troubleshoot neighborships not coming up.
  • VLANs with any kind of cluster in it: still doable with community VLANs for the cluster synchronisation, but usually better off in their own VLAN.
  • User VLANs, VOIP VLANs and the like: VOIP and videoconferencing may set up point-to-point streams.
  • Database server VLANs: not really clusters but they will often require access to each other.

Advantages of MPLS: an example.

While MPLS is already explained on this blog, I often still get questions regarding the advantages over normal routing. A clear example I’ve also already discussed, but besides VRF awareness and routing of overlapping IP ranges, there’s also the advantage of reduced resources required (and thus scalability).

WANEdgeDesign

Given the above design: two routers connecting towards ISPs using eBGP sessions. These in turn connect to two enterprise routers, and those two enterprise routers connect towards two backend routers closer to (or in) the network core. All routers run a dynamic routing protocol (e.g. OSPF) and see each other and their loopbacks. However, the two middle routers in the design don’t have the resources to run a full BGP table so the WAN edge routers have iBGP sessions with the backend routers near the network core.

If you configure this as described and don’t add any additional configuration, this design will not work. The iBGP sessions will come up and exchange routes, but those routes will list the WAN edge router as the next hop. Since this next hop is not on a directly connected subnet to the backend routers, the received routes will not be installed in the routing table. The enterprise routers would not have any idea what to do with the packets anyway.

Update January 17th, 2014: the real reason a route will not be installed in the routing table is the iBGP synchronisation feature, which requires the IGP to have learned the BGP routes through redistribution before using the route. Still, synchronisation can be turned off and the two enterprise routers would drop the packets they receive.

There are a few workarounds to make this work:

  • Just propagating a default route of course, but since the WAN edge routers are not directly connected to each other and do not have an iBGP session, this makes the eBGP sessions useless. Some flows will go through one router, some through the other. This is not related to the best AS path, but to the internal (OSPF) routing.
  • Tunneling over the middle enterprise routers, e.g. GRE tunnels from the WAN edge routers towards the backend routers. Will work but requires multiple tunnels with little scalability and more complex troubleshooting.
  • Replacing the middle enterprise routers by switches so it becomes layer 2 and the WAN edge and backend routers have a directly connected subnet. Again this will work but requires design changes and introduces an extra level of troubleshooting (spanning tree).

So what if MPLS is added to the mix? By adding MPLS to these 6 routers (‘mpls ip’ on the interfaces and you’re set), LDP sessions will form… After which the backend routers will install all BGP routes in their routing tables!

The reason? LDP will advertise a label for each prefix in the internal network (check with ‘show mpls ldp bindings’) and a label will be learned for the interfaces (loopback) of the WAN edge routers… After which the backend routers know they just have to send the packets towards the enterprise routers with the corresponding MPLS label.

And the enterprise routers? They have MPLS enabled on all interfaces and no longer use the routing table or FIB (Forwarding Information Base) for forwarding, but the LFIB (Label Forwarding Information Base). Since all packets received from the backend routers have a label which corresponds to the loopback of one of the WAN edge routers, they will forward the packet based on the label.

Result: the middle enterprise routers do not need to learn any external routes. This design is scalable and flexible, just adding a new router and configuring OSPF and MPLS will make it work. Since a full BGP table these days is well over 450,000 routes, the enterprise routers do not need to check a huge routing table for each packet which decreases resource usage (memory, CPU) and decreases latency and jitter.

I have to admit, this article will sound a bit like an advertisement. But given that Cisco has gotten enough attention on this blog already, it can only bring variation into the mix.

A short explanation of a series of different products offered by F5 Networks. Why? If you’re a returning reader to this blog and work in the network industry, chances are you’ll either have encountered one of these appliances already, or could use them (or another vendor’s equivalent of course).

F5-LTM

LTM
The Local Traffic Manager’s main function is load balancing. This means it can divide incoming connections over multiple servers.
Why you would want this:
A typical web server will scale up to a few hundred or thousand connections, depending on the hardware and services it is running and presenting. But there may be more connections needed than one server can handle. Load balancing allows for scalability.
Some extra goodies that come with it:

  • Load balancing method: of course you can choose how to divide the connections. Simply round-robin, weighted in favor of a better server that can handle more, always to the server with the least connections,…
  • SSL Offloading: the LTM can provide the encryption for HTTPS websites and forward the connections in plain HTTP to the web servers, so they don’t have to consume CPU time for encryption.
  • OneConnect: instead of simply forwarding each connection to the servers in the load balancing pool, the LTM can set up a TCP connection with each server and reuse it for every incoming connection, e.g. a new HTTP GET for each external connection over the same inbound connection. Just like SSL Offloading, it consumes fewer resources on the servers. (Not every website handles this well.)
  • Port translation: not really NAT but you can configure the LTM for listening on port 80 HTTP or 443 HTTPS while the servers have their webpage running on different ports.
  • Health checks: if one of the servers in the pool fails, the LTM can detect this and stop sending connections to the server. The service or website will stay up, it will just be able to accept fewer resources. You can even upgrade servers one by one without downtime for the website (but make sure to plan this properly).
  • IPv6 to IPv4 translation: your web servers and entire network does not have to be IPv6 capable. Just the network up to the LTM has to be.

F5-ASM

ASM
The Application Security Manager can be placed in front of servers (one server per external IP address) and functions as an IPS.
Why you would want this:
If you have a server reachable from the internet, it is vulnerable to attack. Simple as that. Even internal services can be attacked.
Some extra goodies that come with it:

  • SSL Offloading: the ASM can provide the encryption for HTTPS websites just like the LTM. The benefit here is that you can check for attack vectors inside the encrypted session.
  • Automated requests recognition: scanning tools can be recognized and prevented access to the website or service.
  • Geolocation blocks: it’s possible to block out entire countries with automatic lists of IP ranges. This way you can offer the service only where you want it, or stop certain untrusted regions from connecting.

GTM
The Global Traffic Manager is a DNS forwarding service that can handle many requests at the same time with some custom features.
Why you would want this:
This one isn’t useful if the service you’re offering isn’t spread out over multiple data centers in geographically different regions. If it is, it will help redirect traffic to the nearest data center and provide some DDoS resistance too.
Some extra goodies that come with it:

  • DNSSec: secured DNS support which prevents spoofing.
  • Location-based DNS: by matching the DNS request with a list of geographical IP allocations, the DNS reply will contain an A record (or AAAA record) that points to the nearest data center.
  • Caching: the GTM also caches DNS requests to respond faster.
  • DDoS proof: automated DNS floods are detected and prevented.

F5-APM

APM
The Access Policy Manager is a device that provides SSLVPN services.
Why you would want this:
The APM will connect remote devices with encryption to the corporate network with a lot of security options.
Some extra goodies that come with it:

  • SSLVPN: no technical knowledge required for the remote user and works over SSL (TCP 443) so there’s a low chance of firewalls blocking it.
  • SSO: Single Sign On support. Log on to the VPN and credentials for other services (e.g. Remote Desktop) are automatically supplied.
  • AAA: lots of different authentication options, local, Radius, third-party,…
  • Application publishing: instead of opening a tunnel, the APM can publish applications after the login page (e.g. Remote Desktop, Citrix) that open directly.

So what benefit would you have from knowing this? More than you think: many times when a network or service is designed, no attention is given to these components. Yet they can help scale out a service without resorting to complex solutions.