Tag Archive: Tracking


IPv4 anycast design.

Today, not a new protocol or device, but something else: an anycast design. In IPv6, an anycast address is an address in use by multiple devices at once. A packet destined for this address is delivered to just one of those devices, chosen at random, by load balancing, or by shortest distance (geographical location or routing metric). This article shows a design that achieves the same in IPv4. The anycast service used here is DNS, but it can be almost anything, ranging from DNS, DHCP, and NTP to session-based protocols like HTTP, although the design does not provide any stateful failover if a server fails.

[Figure: anycast design diagram]

The above design shows a headquarters (HQ) with two DNS servers, and a branch office with one local DNS server. In a ‘standard’ design, the three DNS servers would have three different IPs, e.g. 10.0.2.2 and 10.0.2.3 in the HQ, and 10.1.2.2 in the branch office. The users in the HQ would have 10.0.2.2 as primary DNS and 10.0.2.3 as secondary, with perhaps some subnets doing the reverse to balance the load a bit. The branch office users would use the local 10.1.2.2 DNS as primary, and one of the HQ DNS servers as backup.

Such a design works and has redundancy, but it has a few minor setbacks: if a DNS server fails, the users that have it configured as primary will first query the failed primary, wait for a timeout, and only then query the secondary. This takes time and slows the systems down. Also, a sudden failure of the branch office DNS may put a sudden high load on one of the HQ DNS servers, as most operating systems only have up to two DNS servers configured by default. The design can be enhanced by implementing anycast-like routing, which can be done in two ways: routed anycast and tracked anycast.

The routed anycast design requires the DNS servers to run a routing protocol, which is easiest to achieve with Unix/Linux operating systems. Each server has a loopback configured with an address like 10.10.10.10/32. Routing is enabled on the server and the address is advertised, for example by OSPF. Each server listens on the loopback for DNS requests, while management is done through the physical interface, so the individual DNS servers can still be told apart. The 10.10.10.10/32 is advertised by OSPF throughout the company. Even though it is advertised from multiple locations, each router only installs the path with the lowest metric, and thus the shortest distance.
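As an illustration, here is a minimal sketch of the server side, assuming the DNS server runs Linux with FRRouting as the routing daemon (the daemon choice, the OSPF area, and the LAN subnet are my assumptions, not part of the original design):

# Put the anycast service address on the loopback
ip addr add 10.10.10.10/32 dev lo

# /etc/frr/frr.conf: advertise the /32 and form an OSPF adjacency on the LAN
router ospf
 network 10.10.10.10/32 area 0
 network 10.0.2.0/24 area 0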

The tracked anycast design works without a routing protocol on the hosts, but they still need the loopback with the IP address and routing configured. This makes it accessible for any kind of server, including Windows. The routers are configured to track the state of the DNS servers using ping, or even actual DNS requests. These tracking objects then decide whether a static route towards the loopback address of the DNS server is kept in the routing table or not:

R1(config)#ip route 10.10.10.10 255.255.255.255 10.0.2.2 track 1
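For completeness, a sketch of what the tracking object behind that route could look like, using the ‘ip sla monitor’ syntax of router IOS (the probe and object numbers are arbitrary, and a simple ping is used here, although a DNS request probe would test the service more directly):

R1(config)#ip sla monitor 1
R1(config-sla-monitor)#type echo protocol ipIcmpEcho 10.0.2.2
R1(config-sla-monitor-echo)#frequency 5
R1(config-sla-monitor-echo)#exit
R1(config)#ip sla monitor schedule 1 start-time now life forever
R1(config)#track 1 rtr 1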

The route is then redistributed into the dynamic routing protocol. This design has several advantages:

  • You need just one IP address for the service, and since it's a /32, it can be an easy-to-remember address that is the same throughout the entire company.
  • You no longer need to figure out which server is closest to which subnet; the routing protocols do that for you.
  • If two servers are at an equal distance, they appear as two equal-cost routes in the routing table, and load balancing occurs automatically.
  • A failure of one of the servers is corrected automatically, and no hosts are left with unreachable servers configured.
  • Because the management IP differs from the IP used for the service, it allows for better security and easier-defined firewall rules.

The biggest drawbacks of the design are the extra configuration (though this is debatable, since the anycast IP is easy to use), and the fact that routing protocol convergence may make the service temporarily unreachable, which means fine-tuning of the timers is required.
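As a sketch of what such tuning could look like, assuming OSPF is the IGP (the interface and the exact values are examples only), the hello and dead intervals can be lowered so a failed path is detected faster:

R1(config)#interface FastEthernet0/1
R1(config-if)#ip ospf hello-interval 1
R1(config-if)#ip ospf dead-interval 3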


In an earlier post I described how to set up IP SLA to check a connection and how to bind it to a tracking object. Since then, I've found some differences between the IOS used on switches and the one used on routers. I used a hidden command, ‘track 2 rtr 20’, to make a tracking object out of the IP SLA, but on a router, this command is not hidden. Also, on a layer 3 switch the IP SLA commands all start with ‘ip sla’, whereas on a router it's ‘ip sla monitor’. It may also be that my router is an older (EOL) device while my switch is relatively new. I guess I'll have to know both to be sure.

I was also asked some interesting questions by Tomasz, so I decided to expand the topic in a new post, describing some more options for these tracking objects. He also asked if it could be used as legal proof in case of a dispute. I'm not sure about that in the case of my own ISP, but I think that on a corporate level, if enough details are mentioned in the contract, this may indeed be the case. After all, it's an objective measurement of the connection.

So besides simple logging, other things are possible. Take for example the following setup:

The two routers have HSRP configured for the user subnet. Both routers have preempt configured. The left router has priority 200, the right router 190. The result is that in a normal situation, the left router will be the active router.

You can bind a tracking object to HSRP, for example, to track the state of a connected interface. You can set it so that if an interface goes down, the priority of the router is decreased by 20. Okay, but is that really a safe solution? If both routers have a different number of connected interfaces, how do you determine which ones are important? And, say those interfaces stay up, but the problem occurs one hop away? It will go undetected: the link will stay up, packets will be sent through it, and dropped at the next hop.
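For reference, a sketch of that kind of interface tracking; the tracked interface and the decrement value are examples:

Router(config)#interface FastEthernet0/0
Router(config-if)#standby 1 track FastEthernet0/1 20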

Of course, you can also track the presence of certain routes in the routing table. This makes a lot more sense already: if the subnet can be reached, everything is fine; if it can't, something is wrong, and it's better that the other router takes the active role in that case. But it might still be insufficient for some types of data. There will be convergence by the routing protocols, which takes time. A network redesign or failure might change the advertised route into a more or less specific one (a bigger or smaller subnet mask). E.g. a route to 10.0.1.0/24 pointing to f0/1 is present in the routing table and is being tracked. Something fails in the network and the route is removed from the routing table, but a working default route pointing to f0/2 is still present. Despite the connectivity, the router will still lose its HSRP active status! And what if the route still works but suddenly experiences very high latency, while the other HSRP router is working fine?
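Route tracking uses a tracking object of its own type. A minimal sketch for the 10.0.1.0/24 example above (the object number is arbitrary):

Router(config)#track 10 ip route 10.0.1.0 255.255.255.0 reachability
Router(config-track)#exit
Router(config)#interface FastEthernet0/0
Router(config-if)#standby 1 track 10 decrement 20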

IP SLA is a perfect solution here. The only downside is that it takes a bit of CPU on the router and some bandwidth, but we're talking about just a few packets. For example, we configure the IP SLA to check an important production server (10.0.5.5) every 3 seconds using pings. If high latency (over 100 ms) is experienced or the ping fails, the HSRP priority will decrease by 20.

Router(config)#ip sla monitor 50
Router(config-sla-monitor)#type echo protocol ipIcmpEcho 10.0.5.5
Router(config-sla-monitor-echo)#frequency 3
Router(config-sla-monitor-echo)#timeout 100
Router(config-sla-monitor-echo)#exit
Router(config)#ip sla monitor schedule 50 start-time now life forever
Router(config)#track 5 rtr 50
Router(config-track)#exit
Router(config)#interface FastEthernet0/0
Router(config-if)#standby 1 preempt
Router(config-if)#standby 1 ip 192.168.0.1
Router(config-if)#standby 1 priority 200
Router(config-if)#standby 1 track 5 decrement 20
Router(config-if)#end

Note that you can track more than one object, so you can do this for multiple important points in the network, as shown in the sketch below. The router that can reach the most of these points with the least latency will have the active HSRP role. This ensures high availability of network services for users. Don't forget the preempt command; otherwise, the routers will not change active roles until one fails completely.
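Tracking a second point in the network just repeats the same pattern with another probe and tracking object (the target 10.0.6.6 and the numbers are made up for illustration):

Router(config)#ip sla monitor 51
Router(config-sla-monitor)#type echo protocol ipIcmpEcho 10.0.6.6
Router(config-sla-monitor-echo)#frequency 3
Router(config-sla-monitor-echo)#timeout 100
Router(config-sla-monitor-echo)#exit
Router(config)#ip sla monitor schedule 51 start-time now life forever
Router(config)#track 6 rtr 51
Router(config-track)#exit
Router(config)#interface FastEthernet0/0
Router(config-if)#standby 1 track 5 decrement 20
Router(config-if)#standby 1 track 6 decrement 20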

I(S)P SLA

Yesterday my internet connection went down, which has been happening often lately. Wanting to make sure it wasn't the fault of any configuration on my side, I did several ping tests and concluded everything was fine up to the ISP gateway router I have, but there was no connection beyond that. I called the ISP, who informed me that they saw some disconnects, but too few to make a case. I was asked to note down the time whenever a failure occurred, keep doing this for a while, and let them know if the situation became worse.

I suppose most readers have had a similar experience with an ISP at some point. I can't blame them for needing more data, but I would like a more stable connection. And I'm not home all the time to check the connection. But with a Cisco 3560 switch running 24/7 in the network, I have a dedicated device to track the internet connection status. For this, I will be using IP SLA. You can find a detailed guide for it on the Cisco website.

For a more detailed view of what is failing, I will be tracking two objects: an ICMP ping to the ISP gateway router at 192.168.0.1, and a DNS query to the ISP DNS server. This way I can tell whether the ISP device or the connection is failing. I opted for a DNS query because sometimes pings still work while websites don’t (I’m not sure why, perhaps related to the small MTU). But before configuring this, I’m going to set the time on my switch, because otherwise all logged failures are useless.

The time can be set locally using the ‘clock set’ command, but it would not be entirely accurate and the time would be lost after a switch reboot, so NTP is a better option. After finding a regional NTP server, I configure the following commands:

Switch(config)#clock timezone CET 1
Switch(config)#clock summer-time CEST recurring
Switch(config)#ntp server 193.190.230.66

The timezone (Central European Time, UTC+1) is needed because NTP uses Coordinated Universal Time (UTC). Cisco IOS supports summer time, so I add that too. A ‘show clock’ reveals the correct time. Now on to the configuration of the IP SLA:

Switch(config)#ip sla 20
Switch(config-ip-sla)#icmp-echo 192.168.0.1
Switch(config-ip-sla-echo)#frequency 60
Switch(config-ip-sla-echo)#timeout 100
Switch(config-ip-sla-echo)#exit
Switch(config)#ip sla 30
Switch(config-ip-sla)#dns google.com name-server 8.8.8.8
Switch(config-ip-sla-dns)#frequency 60
Switch(config-ip-sla-dns)#timeout 500
Switch(config-ip-sla-dns)#exit
In my actual configuration I used my ISP's DNS server; the one shown here is Google's. The frequency is set to 60 seconds, and a timeout in milliseconds is added, so the IP SLA can also be triggered by a slow connection. The configuration is now correct, but it still needs to be activated to work. Since I want it running all the time, I schedule it to run forever:

Switch(config)#ip sla schedule 20 start-time now life forever
Switch(config)#ip sla schedule 30 start-time now life forever

Once it is running, you can check the IP SLA state with ‘show ip sla statistics’. Normally one would set up SNMP monitoring to watch the state of the IP SLAs and gather alerts, but because that requires a computer to be powered on all the time, I'm going to bind it to the local syslog instead. Keep in mind that this is not good practice on a production switch: there's only a small logging buffer present, and the information is lost on reboot. To do this, I use a hidden command, not visible with ‘?’:

Switch(config)#track 2 rtr 20
Switch(config-track)#exit
Switch(config)#track 3 rtr 30
Switch(config-track)#end

The ‘rtr’ part turns the IP SLA into a tracked object. This generates a notification (a severity 5 syslog trap) every time an IP SLA changes state. Using the ‘show logging | include TRACK’ command, you can see when failures occurred. And now, we wait.

Update September 25, 2011: I didn't have to wait long; I had a random disconnect yesterday, triggering the DNS check IP SLA. The ping to the modem didn't fail, so the modem has had no downtime so far.