Yesterday my internet connection went down, which has been happening often lately. To make sure it wasn’t caused by my own configuration, I ran several ping tests and concluded that everything worked up to the ISP gateway router, but nothing beyond it. I called the ISP, who told me they saw some disconnects, but too few to make a case. They asked me to note down the time of each failure over a period of time, and to let them know if the situation got worse.

I suppose most readers have had a similar experience with an ISP at some point. I can’t blame them for wanting more data, but I would like a more stable connection, and I’m not home all the time to check it. However, with a Cisco 3560 switch running 24/7 on the network, I already have a dedicated device to track the internet connection status. For this I will use IP SLA; a detailed guide to it is available on the Cisco website.

For a more detailed view of what is failing, I will track two objects: an ICMP ping to the ISP gateway router at 192.168.0.1, and a DNS query to the ISP’s DNS server. This way I can tell whether the ISP device itself or the connection beyond it is failing. I chose a DNS query because sometimes pings still work while websites don’t (I’m not sure why; perhaps it is related to the small MTU). Before configuring this, though, I’m going to set the time on the switch, because without correct timestamps the logged failures are useless.

The time can be set locally with the ‘clock set’ command, but that would not be entirely accurate, and the time would be lost after a switch reboot, so NTP is the better option. After finding a regional NTP server, I configure the following commands:

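Switch(config)#clock timezone CET 1
Switch(config)#clock summer-time CEST recurring last Sun Mar 2:00 last Sun Oct 3:00
Switch(config)#ntp server 193.190.230.66

The timezone (Central European Time, UTC+1, hence the positive offset of 1) is needed because NTP distributes Coordinated Universal Time (UTC). Cisco IOS also supports daylight saving time; ‘recurring’ without explicit dates would apply the US rules, so the European dates (last Sunday of March to last Sunday of October) are given explicitly. A ‘show clock’ now reveals the correct time. Now to the configuration of the IP SLA: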

Switch(config)#ip sla 20
Switch(config-ip-sla)#icmp-echo 192.168.0.1
Switch(config-ip-sla-echo)#frequency 60
Switch(config-ip-sla-echo)#timeout 100
Switch(config-ip-sla-echo)#exit
Switch(config)#ip sla 30
Switch(config-ip-sla)#dns google.com name-server 8.8.8.8
Switch(config-ip-sla-dns)#frequency 60
Switch(config-ip-sla-dns)#timeout 500

In my own setup I used my ISP’s DNS server; the example above uses Google’s public resolver instead. The frequency is set to 60 seconds, and the timeout (in milliseconds) is lowered from the default, so that a slow response also counts as a failure. The operations are now configured, but they do not run until they are scheduled. Since I want them running all the time, I give them an unlimited lifetime:

Switch(config)#ip sla schedule 20 start-time now life forever
Switch(config)#ip sla schedule 30 start-time now life forever

Once they are running, you can check the IP SLA state with ‘show ip sla statistics’. Normally one would set up an SNMP manager to monitor the IP SLA state and collect alerts, but that requires a computer powered on all the time, so instead I’m going to send state changes to the switch’s local log. Keep in mind that this is not good practice on a production switch: the logging buffer is small and its contents are lost on reboot. To do this, I use a hidden command that is not listed by ‘?’:

Switch(config)#track 2 rtr 20 state
Switch(config-track)#exit
Switch(config)#track 3 rtr 30 state
Switch(config-track)#exit
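The ‘rtr’ syntax is hidden on newer IOS releases, where it has been replaced by ‘ip sla’. Assuming your image supports the newer form (check with ‘?’ first; I have not verified this on every 3560 release), the same tracking would look like this, where ‘state’ marks the object down whenever the operation does not return OK:

Switch(config)#track 2 ip sla 20 state
Switch(config-track)#exit
Switch(config)#track 3 ip sla 30 state
Switch(config-track)#exit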

The ‘rtr’ keyword (short for Response Time Reporter, the former name of IP SLA) turns each IP SLA operation into a tracked object, and the switch writes a syslog message every time a tracked object changes state. With the ‘show logging | include TRACK’ command you can see when failures occurred. And now, we wait.
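Two optional settings make that log more useful as evidence for the ISP. IOS log timestamps are typically in UTC (or switch uptime) by default, so the first command stamps messages in the local time zone configured earlier; the second enlarges the logging buffer and keeps messages up to the informational level, so older state changes are overwritten less quickly. The buffer size of 16384 bytes is only an example value; pick one that fits the free memory on your switch.

Switch(config)#service timestamps log datetime localtime show-timezone
Switch(config)#logging buffered 16384 informational

The buffer is still cleared on reboot, so this only stretches the limitation rather than removing it.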

Update September 25, 2011: I didn’t have to wait long. I had a random disconnect yesterday, which triggered the DNS IP SLA. The ping to the modem didn’t fail, so the modem itself has had no downtime so far.