Another series of articles. So far in my blog, I’ve concentrated on how to get routed networks running with basic configuration. But at some point, you may want to refine the configuration to provide better security, better failover, less chance for unexpected issues, and if possible make things less CPU and memory intensive as well.
While I was recently designing and implementing a MPLS network, it got clear that using defaults everywhere wasn’t the best way to proceed. As visible in the MPLS-VPN article, several different protocols are used: BGP, OSPF and LDP. Each of these establishes a neighborship with the next-hop, all using different hello timers to detect issues: 60 seconds for BGP, 10 seconds for OSPF and 5 seconds for LDP.
First thing that comes to mind is synchronizing these timers, e.g. setting them all to 5 seconds and a 15 second dead-time. While this does improve failover, there’s three keepalives going over the link to check if the link works, and still several seconds of failover. It would be better to bind all these protocols to one common keepalive. UDLD comes to mind, but that’s to check fibers if they work in both directions, it needs seconds to detect a link failure, and only works between two adjacent layer 2 interfaces. The ideal solution would check layer 3 connectivity between two routing protocol neighbors, regardless of switched path in between. This would be useful for WAN links, where fiber signal (the laser) tends to stay active even if there’s a failure in the provider network.
Turns out this is possible: Bidirectional Forwarding Detection (BFD) can do this. BFD is an open-vendor protocol (RFC 5880) that establishes a session between two layer 3 devices and periodically sends hello packets or keepalives. If the packets are no longer received, the connection is considered down. Configuration is fairly straightforward:
Router(config-if)#bfd interval 50 min_rx 50 multiplier 3
The values used above are all minimum values. The first 50 for ‘interval’ is how much time in milliseconds there is between hello packets. The ‘mix_rx’ is the expected receive rate for hello packets. Documentation isn’t clear on this and I was unable to see a difference in reaction in my tests if this parameter was changed. The ‘multiplier’ value is how many hello packets kan be missed before flagging the connection as down. The above configuration will trigger a connection issue after 150 ms. The configuration needs to be applied on the remote interface as well, but that will not yet activate BFD. It needs to be attached to a routing process on both sides before it starts to function. It will take neighbors from those routing processes to communicate with. Below I’m listing the commands for OSPF, EIGRP and BGP:
Router(config)#router ospf 1
Router(config-router)#bfd all-interfaces
Router(config-router)#exit
Router(config)#router eigrp 65000
Router(config-router)#bfd all-interfaces
Router(config-router)#exit
Router(config)#router bgp 65000
Router(config-router)#neighbor fall-over bfd
Router(config-router)#exit
This makes the routing protocols much more responsive to link failures. For MPLS, the LDP session cannot be coupled with BFD on a Cisco device, but on a Juniper it’s possible. This is not mandatory as the no frames will be sent on the link anymore as soon as the routing protocol neighborships break and the routing table (well, the FIB) is updated.
Result: fast failover, relying on a dedicated protocol rather than some out-of-date default timers:
Router#show bfd neighbor
NeighAddr LD/RD RH/RS State Int
10.10.10.1 1/1 Up Up Fa0/1Jun 29 14:16:21.148: %OSPF-5-ADJCHG: Process 1, Nbr 10.10.10.1 on FastEthernet0/1 from FULL to DOWN, Neighbor Down: BFD node down
Not bad for a WAN line.