Tag Archive: QoS


In Europe, the cheapest WAN links start around 2 Mbps these days. While this makes some WAN optimizations covered in Cisco’s QoS guides unnecessary, it’s good to know them and the effects of slower links on traffic.

Serialization delay
Putting a frame on the wire from a switch or router requires time. The amount of time is directly related to the line speed of the link. Note ‘line speed’: the actual negotiated speed on layer 1. For a 1 Gbps interface negotiated to 100 Mbps Full Duplex which is QoS rate-limited at 10 Mbps, the line speed is 100 Mbps. The formula is the following:

SerializationDelay

This means that if you have a 1514 bytes frame (standard MTU of 1500 bytes plus the layer 2 header of 14 bytes) and send it out of a 100 Mbps interface it will take (1,500*8)/(10^8)= 0.12 ms or 121 µs. If a small voice frame arrives in the egress queue of a switch or router it can incur op to 121 µs of additional latency if a 1514 bytes frame is just being sent out. Consequence: even under near perfect conditions and good QoS configuration where voice frames are given absolute priority over the network, there’s a possible jitter per hop. The higher the general bandwidth throughout the network, the lower the jitter, so latency-sensitive traffic does benefit from high bandwidth and fewer hops. Over a 10 GE interface that same frame would be serialized in just 1.21 µs per hop.

There are some consequences for WAN links: at slower speeds, the serialization delay increases rapidly. At 2 Mbps for a 1514 byte frame it’s 6 ms. At 64 kbps, it’s 190 ms. And in case you’re enabling jumbo frames: 9014 bytes over 10 Mbps is 7.2 ms.

Link Fragmentation and interleaving
Generally, at 768 kbps and below, jitter for voice becomes unacceptable. This is where Link Fragmentation and Interleaving (LFI). It works by splitting up large frames into smaller parts and putting low-latency packets between these parts.

LFI

Configuration of LFI on a Cisco router is as following:

Router(config)#interface Multilink1
Router(config-if)#ip address 192.0.2.1 255.255.255.0
Router(config-if)#ppp multilink
Router(config-if)#ppp multilink fragment delay 3
Router(config-if)#ppp multilink interleave
Router(config-if)#ppp multilink group 1
Router(config-if)#exit
Router(config)#interface Serial0/0
Router(config-if)#
Router(config-if)#encapsulation ppp
Router(config-if)#clock rate 768000
Router(config-if)#ppp multilink
Router(config-if)#ppp multilink group 1
Router(config-if)#exit

First, create a Multilink interface. It will serve as an overlay for the actual physical interface, as this is where the LFI will be configured on. The Multilink interface will have all configuration: IP address, service policies,… Except for the layer 1 configuration (notice the clock rate command on the serial interface).

The ‘ppp multilink interleave’ activates LFI. The ‘ppp multilink fragment delay 3’ means LFI will automatically try to split up large packets so no packet has to wait longer than 3 ms while another is serialized. On the serial interface, encapsulation has to be set to ppp first. Next, it becomes possible to associate the interface with a Multilink overlay interface using the ‘ppp multilink group’ command.

The configuration has to be done on both sides of the WAN link, of course. The other side needs to use PPP encapsulation as well, and needs to have LFI enabled to reassemble to split up packets.

This concludes the series of QoS articles on this blog. Up next, I’ll try out different attacks on a Catalyst switch and see how it reacts.

QoS part VII: wireless.

Back to the definition of QoS in part I: a mechanism to determine which packet to process next in case of unavailable resources. While unavailable resources usually referred to bandwidth, and ASIC throughput in case of some 6500 line cards, for wireless the limiting factor is transmit opportunities.

Contention Window
Wireless is a half-duplex medium, and as such uses CSMA/CA to check if the wireless frequency is occupied by another wireless node before transmitting a frame. Before sending, the wireless node waits a random period of time. The random period of time is between a fixed back off time and a Contention Window (CW). The time is expressed in ‘slots’, and I’m unable to find any documentation that explain what a slot is. It might be a microsecond, but nothing I found confirms this.

The CW has a minimum value and a maximum value. Before a frame is transmitted, the wireless device will wait a fixed back off time plus a random extra time between zero and the minimum contention window, CWmin. If transmit fails (due to a collision in the wireless half-duplex medium), the process will be retried, again with fixed back off time, but with a larger contention window. If transmission fails again, the process will continue until a set number of tries is reached, after which the frame is dropped. The CW will increase every time until a fixed maximum value is reached: CWmax.

WifiQoS-1

Using low CW values means the packet has low latency, but the chance for collisions increases. A higher CW means a higher average latency, and more possible jitter. CW is a number between 1 and 10. That number is not the number of slots, but it’s part of the formula 2^x-1 = slots. A CW of 3 is 7 slots (2^3-1). A CW of 5 is 31 (2^5-1). For example, frames transmitted with a fixed back off time of 5, a CWmin of 4 and a CWmax of 6 will have on average 12 slots of latency when there’s little traffic (5 + 15/2), and if traffic increases, it goes up and can reach 20 slots latency on average (5+31/2).

Traffic classes
This is where the wireless QoS comes into play: the 802.11e standard allows traffic to be split up into classes that each receive different back off times and CWmin and CWmax values. Contrary to switches and routers, it’s a fixed number of classes, with a fixed name. Ironically, voice is given a CoS value of 6 in the wireless world, not 5. The table gives these values:

WifiQoS-2

This table is directly from the Wikipedia page about 802.11e.

Configuration
The configuration on an Aironet is the same as other Cisco devices as far as classification goes, but the difference is in the output queueing with the contention window. Let’s assume we want to classify a RTP voice stream, UDP port 16384, as CoS 6, and everything else as CoS 0:

AP1142N(config)#ip access-list extended AL4-RTP
AP1142N(config-ext-nacl)#permit udp any any eq 16384
AP1142N(config-ext-nacl)#exit
AP1142N(config)#class-map CM-RTP
AP1142N(config-cmap)#match access-group name AL4-RTP
AP1142N(config-cmap)#exit
AP1142N(config)#policy-map PM-Wifi
AP1142N(config-pmap)#class CM-RTP
AP1142N(config-pmap-c)#set cos 6
AP1142N(config-pmap-c)#exit
AP1142N(config-pmap)#class class-default
AP1142N(config-pmap-c)#set cos 0
AP1142N(config-pmap-c)#exit
AP1142N(config-pmap)#exit
AP1142N(config)#interface Dot11Radio 0.10
AP1142N(config-subif)#service-policy output PM-Wifi

Note that the policy is applied on a subinterface in the output direction: it’s for sending only, and on a per-SSID basis. This is not the case for the actual CW configuration: it’s on the main interface because it’s how the antenna will work.

AP1142N(config)#interface Dot11Radio 0
AP1142N(config-if)#dot11 qos class best-effort local
AP1142N(config-if-qosclass)#fixed-slot 5
AP1142N(config-if-qosclass)#cw-min 3
AP1142N(config-if-qosclass)#cw-max 5
AP1142N(config-if-qosclass)#exit
AP1142N(config-if)#dot11 qos class voice local
AP1142N(config-if-qosclass)#fixed-slot 1
AP1142N(config-if-qosclass)#cw-min 1
AP1142N(config-if-qosclass)#cw-max 3

The classes have fixed names which can’t be changed. CoS 0 maps to the best-effort class, CoS 6 maps to the voice class. By giving lower values to the voice frames, these will, on average, experience less latency and be transmitted faster. They are also more likely to be transmitted because the back off timer will expire faster. Result: even when the wireless network is dealing with a lot of traffic, voice frames will be transmitted faster and with less jitter.

On to a bigger platform: the 6500 series, Cisco’s flagship Campus LAN switch. Unlike the previously discussed platform, the 6500 series uses the older Weighted Round Robin (WRR) queueing mechanism, and uses CoS internally to put packets in queues.

Queues and thresholds
The capabilities per port also differ per line card and unlike the 3560/3750 series, it uses multiple ASIC per line card.

Switch#show interfaces GigabitEthernet 1/2/1 capabilities | include tx|rx|ASIC
Flowcontrol:                  rx-(off,on,desired),tx-(off,on,desired)
QOS scheduling:           rx-(1q8t), tx-(1p3q8t)
QOS queueing mode:    rx-(cos), tx-(cos)
Ports-in-ASIC (Sub-port ASIC) : 1-24 (1-12)

The above output is of a WS-X6748-GE-TX line card. The ‘1p3q8t’ for egress (tx) means one fixed priority queue and three normal queues, each with eight thresholds. The fixed priority queue cannot be changed to a normal queue: if a packet is in the queue, it will be transmitted next.

QoS9

The ingress ‘1q8t’ means there is one ingress queue with eight thresholds. Unlike the 3560/3750, there is some oversubscription on the line card. It has two ASICs, one per 24 ports (the line card is 48 ports total). Each of these ASICs has a 20 Gbps connection to the 6500 backplane. If all 24 gigabit ports together on that part of the line card start receiving more than 20 Gbps of traffic, the ASIC and backplane connection will not be able to handle all the traffic. Granted, this is a rare event: 24 Gbps maximum throughput on a 20 Gbps capable ASIC is an oversubscription of 1.2 to 1. But in case this happens, the different thresholds can help decide which traffic to drop. However, to drop on ingress, the decision must be made on existing markings.  The ASIC does classification and remarking, and the ingress queue is before the ASIC. This is not a problem usually, since classification and marking is best done at the access layer and 6500s are best used for distribution and core layer.

Switch#show interfaces TengigabitEthernet 1/9/1 capabilities | include tx|rx|ASIC
Flowcontrol:                  rx-(off,on),tx-(off,on)
QOS scheduling:           rx-(1p7q2t), tx-(1p7q4t)
QOS queueing mode:    rx-(cos,dscp), tx-(cos,dscp)
Ports-in-ASIC (Sub-port ASIC) : 1-8 (1-4)

The WS-X6716-10GE, a 10 GE line card, has different queues, especially for ingress. This line card has a high oversubscription of 4:1 and one ASIC per eight ports, for a total of two ASIC for the 16-port line card. This means that, while eight ports can deliver up to 80 Gbps, the ASIC and backplane connection behind it are still just 20 Gbps. The ASIC is much more likely to get saturated, so ingress queueing becomes important here. The fixed priority queue allows some traffic to be handled by the ASIC in low latency, even when saturated.

I’m only going to explain these two line cards, the rest is similar. A full list with details per line card can be found here. The logic is similar to the 3560/3750 platform: configure the buffers and the thresholds, but this time for both ingress and egress. First the ingress queue on the gigabit interface. The ingress queue has no buffer sizing command, as this line card has only one ingress queue.

Switch(config)#interface Gi1/2/1
Switch(config-if)#rcv-queue threshold 1 65 70 75 80 85 90 95 100
Warning: rcv thresholds will not be applied in hardware.
To modify rcv thresholds in hardware, all of the interfaces below
must be put into ‘trust cos’ state:
Gi1/2/1 Gi1/2/2 Gi1/2/3 Gi1/2/4 Gi1/2/5 Gi1/2/6 Gi1/2/7 Gi1/2/8 Gi1/2/9 Gi1/2/10 Gi1/2/11 Gi1/2/12
Switch(config-if)#

That configures the eight thresholds for the first and only queue: threshold 1 at 65%, threshold 2 at 70%, and so on. Note the warning: for ingress queueing, existing cos markings have to be trusted. Also, remember that the 3560/3750 ingress and buffer allocation commands work switch-wide, because it has one ASIC per switch. The X6748 line card on a 6500 has two ASIC, which for QoS are sub-divided in two sub-ASIC per twelve ports. Applying a command that changes the ASIC QoS allocations means that the command will automatically apply to the other twelve interfaces as well.

Next, egress queueing. First configuring buffer allocations, next the thresholds for the first queue, similar to the ingress queue.

Switch(config-if)#wrr-queue queue-limit 70 20 10
Switch(config-if)#wrr-queue threshold 1 65 70 75 80 85 90 95 100
Switch(config-if)#exit

Again something special here: the buffer allocation command ‘wrr-queue queue-limit’ needs only three values despite four queues. This is because queue 4, the priority queue, is a strict priority queue: any packet entering it will be serviced next. This means that if a lot of traffic ends up in the priority queue, it can end up clogging the other queues because these will not be serviced anymore. The only way to counter this is to tightly control what ends up in that queue.

On to the 10 GE line card. First ingress, this time with a buffer command because there are multiple queues on ingress.

Switch(config)#interface Te1/9/1
Switch(config-if)#rcv-queue queue-limit 40 20 20 0 0 0 20
Warning: rcv queue-limit will not be applied in hardware.
To modify rcv queue-limit in hardware, all of the interfaces below
must be put into ‘trust cos’ state:
Te1/9/1 Te1/9/2 Te1/9/3 Te1/9/4
Switch(config-if)#
Switch(config-if)#rcv-queue threshold 2 80 100
Switch(config-if)#rcv-queue threshold 3 90 100
HW-QOS: Rx high threshold 2 is fixed at 100 percent

Propagating threshold configuration to:  Te1/9/1 Te1/9/2 Te1/9/3 Te1/9/4
Warning: rcv thresholds will not be applied in hardware.
To modify rcv thresholds in hardware, all of the interfaces below
must be put into ‘trust cos’ state:
Te1/9/1 Te1/9/2 Te1/9/3 Te1/9/4
Switch(config-if)#end

The 10 GE line card has one ASIC per eight ports, one sub-ASIC for QoS per four ports. It has two thresholds per queue. The rest isn’t any different from previous configurations.

Queue mappings
As mentioned already, internally the 6500 platform uses CoS to determine in which queue a packet (and thus flow) ends up, although some newer line cards can work with both CoS and DSCP. The mappings are again similar to previous configurations:

Switch(config)#interface Gi1/2/1
Switch(config-if)#rcv-queue cos-map 1 4 5
Propagating cos-map configuration to:  Gi1/2/1 Gi1/2/2 Gi1/2/3 Gi1/2/4 Gi1/2/5 Gi1/2/6 Gi1/2/7 Gi1/2/8 Gi1/2/9 Gi1/2/10 Gi1/2/11 Gi1/2/12
Warning: rcv cosmap will not be applied in hardware.
To modify rcv cosmap in hardware, all of the interfaces below
must be put into ‘trust cos’ state:
Gi1/2/1 Gi1/2/2 Gi1/2/3 Gi1/2/4 Gi1/2/5 Gi1/2/6 Gi1/2/7 Gi1/2/8 Gi1/2/9 Gi1/2/10 Gi1/2/11 Gi1/2/12
Switch(config-if)#wrr-queue cos-map 2 3 6
Propagating cos-map configuration to:  Gi1/2/1 Gi1/2/2 Gi1/2/3 Gi1/2/4 Gi1/2/5 Gi1/2/6 Gi1/2/7 Gi1/2/8 Gi1/2/9 Gi1/2/10 Gi1/2/11 Gi1/2/12
Switch(config-if)#priority-queue cos-map 1 1 5
Propagating cos-map configuration to:  Gi1/2/1 Gi1/2/2 Gi1/2/3 Gi1/2/4 Gi1/2/5 Gi1/2/6 Gi1/2/7 Gi1/2/8 Gi1/2/9 Gi1/2/10 Gi1/2/11 Gi1/2/12

The following things are configured here: CoS 5 is mapped to ingress queue 1, threshold 4. Next, CoS 6 is mapped to egress queue 2, threshold 3. And the third command is the mapping of CoS 5 to the first (and only) priority queue, first threshold.

DSCP to CoS mapping
Mapping CoS to a queue is okay, but what if you’re using DSCP for marking? And what if you have access ports on the 6500? CoS is part of the 802.1q header. For this you can do a DSCP to CoS mapping. For example, to map DSCP EF to CoS 5 and DSCP AF41 to CoS 3:

Switch(config)#mls qos map dscp-cos 46 to 5
Switch(config)#mls qos map dscp-cos 34 to 3

Now packets incoming or remarked on ingress as DSCP EF will be treated as CoS 5 in the queueing.

Bandwidth sharing & random-detect.
There are no shaping commands on the 6500 platform, only sharing of bandwidth. Again, only three values are possible for four queues as the priority queue will just take the bandwidth it needs. You can use shared weights using the ‘wrr-queue bandwidth’ command, but it’s easier to add the ‘percent’ keyword and let it total 100 for a more clear configuration:

Switch(config-if)#wrr-queue bandwidth percent 80 10 10

90% for the first queue, 10% for the two others.

The 6500 platform also supports random early detection in hardware, a function borrowed from routers. It can be activated for a non-priority queue, for example the second queue:

Switch(config-if)#wrr-queue random-detect 2

The thresholds for RED can be modified using the ‘wrr-queue random-detect min-threshold’ and ‘wrr-queue random-detect max-threshold’ commands. They configure the thresholds (eight for the gigabit line card) with a minimum value at which RED starts to work, and a maximum value at which RED starts to drop all packets entering the queue.

Show command
So far I haven’t listed a ‘show’ command. This is because everything you need to know about a certain port is all gathered in one command: ‘show queueing interface’. It’s a command with a very long output, showing the queue buffers, thresholds and drops for both ingress and egress.

The DSCP to CoS mapping is switch-wide, so this is still a separate command:

Switch#show mls qos maps dscp-cos
Dscp-cos map:                                  (dscp= d1d2)
d1:d2 0   1   2   3   4   5   6   7   8   9
——————————————————
0 :    00 00 00 00 00 00 00 00 01 01
1 :    01 01 01 01 01 01 02 02 02 02
2 :    02 02 02 02 03 03 03 03 03 03
3 :    03 03 04 04 03 04 04 04 04 04
4 :    05 05 05 05 05 05 05 05 06 06
5 :    06 06 06 06 06 06 07 07 07 07
6 :    07 07 07 07

Again, d1 is the first digit, d2 the second: for DSCP 46, d1 is 4, d2 is 6.

While in part IV a router used software queues, this is not the case on a switch:

Switch(config-if)#service-policy output PM-Optimize
Warning: Assigning a policy map to the output side of an interface not supported

Why not? Because a switch forwards frames with ASICs, not with the CPU. And that means queueing is done in hardware too. And because the hardware contains a fixed number of queues, configuration is not done with a policy-map, but commands to manipulate these queues directly.

There are both ingress and egress queues, but this article will only explain egress queues, as ingress queueing has little relevance on a 3560/3750 platform. Also, I will only talk about DSCP values and ignore CoS, as this platform can use DSCP end-to-end. This article’s intent is to get a basic understanding of QoS on this platform. For a more detailed approach, this document in the Cisco Support community has proven very useful for me.

Queues and thresholds
The number of egress queues can be checked on a per-port basis:

Switch#show interfaces FastEthernet 0/1 capabilities | include tx|rx
Flowcontrol:              rx-(off,on,desired),tx-(none)
QoS scheduling:        rx-(not configurable on per port basis),
.                                tx-(4q3t) (3t: Two configurable values and one fixed.)

Notice the ‘4q3t’ number: this means the port supports for queues, each with three thresholds. Although the value can be checked on a per-port basis, the 3560/3750 series uses one ASIC for all it’s ports, so the number of queues and thresholds is the same on all ports.

QoS6

The four queues are hard-coded: no more, no less. A queue can be left unused, but no extra queues can be allocated. The thresholds are used for tail drops (the dropping of a frame when the queue is full) and allow to differentiate between traffic flows inside a queue.

An example: the third queue has thresholds at 80%, 90% and 100% (The third threshold is always 100% and can’t be changed). You put packets with DSCP value AF31, AF32 and AF33 in the third queue, but on different thresholds: AF31 on 3, AF32 on 2, AF33 on 1. The consequence is that packets with these DSCP values are put into the queue until the queue is 80% full (the first threshold). At that point, frames of DSCP AF33 are dropped, while the other two are still placed in the queue. If the queue reaches 90%, packets with AF32 are dropped as well, so the remaining 10% of the queue can only be filled with AF31-marked packets.

Each queue also has a buffer: the buffer size determines the amount packets a queue can hold. The allocation of these buffers can be checked:

Wolfberry#show mls qos queue-set 1
Queueset: 1
Queue     :       1       2       3       4
———————————————-
buffers   :        25      25      25      25
threshold1:    100     200    100    100
threshold2:    100     200    100    100
reserved  :      50      50      50      50
maximum   :  400     400    400    400

QoS7

A little explanation: everything is percentages here. The ‘buffers’ line indicates how the buffers are allocated: by default 25% for each queue. The exact amount can’t be found in the data sheets and supposedly depends on the exact type of switch.

The ‘reserved’ line means how much of those buffers are actually guaranteed to the queue. By default 50% of 25%, so 12.5% of the buffer pool is actually reserved for one queue. The other 50% of the total buffer pool can be used for any of the four queues that needs it. If all queues are filled and need it, it ends up at 25-25-25-25 again.

The other three lines are relative to the reserved value. The default first threshold of 100% means traffic set in threshold 1 of queue 1 will be dropped as soon as the queue is filled to its reserved value, 50% of the 25% allocated of the pool. The second threshold in queue 2, default 200%, means the queue will fill up to its allocated value, 100% of the 25% of the buffer pool. The maximum is the implicit third threshold, and is the maximum amount of buffer space that queue can use.

These values can all be changed with ‘mls qos queue-set output’ command. For example, let’s allocate more buffers to the second queue, as the intention is to use this for TCP traffic later on. Also let’s change other parameters: the queue will receive 50% of the buffer pool, 60% of this allocation will be reserved, and thresholds will be at 60% (100% of the reserved value), 90% (150% of the reserved value) and 120% (200% of the reserved value). Queue 1 receives 26% of the buffer pool, queue 3 & 4 each 12%. The thresholds for queue 1 will also change to 80% (160% of the reserved value) and 90% (180% of the reserved value) and 100% (200% of the reserved value).

Switch(config)#mls qos queue-set output 1 buffers 26 50 12 12
Switch(config)#mls qos queue-set output 1 threshold 2 100 150 60 200
Switch(config)#mls qos queue-set output 1 threshold 1 160 180 50 200
Switch(config)#exit
Switch#show mls qos queue-set 1
Queueset: 1
Queue     :       1       2       3       4
———————————————-
buffers   :         26      50      12      12
threshold1:     160     100    100    100
threshold2:     180     150    100    100
reserved  :       50      60      50      50
maximum   :   200     200    400    400

QoS8

Queue mappings
Now that the queues have been properly defined, how do you put packets in them? Well, assuming you’ve marked them as explained in part III, all packets have a DSCP marking. The switch automatically put packets with a certain marking into a certain queue according to the DSCP-to-output-queue table:

Switch#show mls qos maps dscp-output-q
Dscp-outputq-threshold map:
d1 :d2    0       1        2         3         4         5         6         7         8         9
————————————————————
0 :    02-01 02-01 02-01 02-01 02-01 02-01 02-01 02-01 02-01 02-01
1 :    02-01 02-01 02-01 02-01 02-01 02-01 03-01 03-01 03-01 03-01
2 :    03-01 03-01 03-01 03-01 03-01 03-01 03-01 03-01 03-01 03-01
3 :    03-01 03-01 04-01 04-01 04-01 04-01 04-01 04-01 04-01 04-01
4 :    01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01 04-01 04-01
5 :    04-01 04-01 04-01 04-01 04-01 04-01 04-01 04-01 04-01 04-01
6 :    04-01 04-01 04-01 04-01

Again an explanation: this table explains which DSCP value maps to which queue and threshold. For example, DSCP 0 (first row, first column) maps to queue 2, threshold 1 (02-01). DSCP 46 (fourth row, sixth column) maps to queue 1, threshold 1 (01-01). To map DSCP values to a certain queue and threshold, use the ‘mls qos srr-queue output dscp-map’ command:

Switch(config)#mls qos srr-queue output dscp-map queue 2 threshold 2 34

This will map packets with DSCP value 34 to queue 2, threshold 2, for example.

Bandwidth sharing and shaping
So queues are properly configured, packets are correctly put into the queues according to DSCP values… Just one more thing: what do the queues do? This is where the Shaped Round Robin mechanism comes into play. It’s one of the few egress QoS configurations that is done on a per-port basis on the 3560/3750 platform. There are two commands: ‘srr-queue bandwidth shape’ and ‘srr-queue bandwidth share’, followed by four values for the four queues.

The ‘shape’ command polices: it gives bandwidth to a queue, but at the same time limits that queue to that bandwidth. Ironically it’s an inverted scale: 25, for example, means 1 in 25 packets, or 4%. 5 means 1 in 5 packets, or 20% bandwidth. If a zero is used, that queue is not shaped. The ‘share’ command does not limit bandwidth and gives it in a relative scale: if the total of the queues is 20 and queue 1 has value 5, that’s 25% bandwidth. If the total is 50 and queue 1 has value 5, that’s 10% bandwidth.

Switch(config-if)#srr-queue bandwidth share 10 150 30 20
Switch(config-if)#srr-queue bandwidth shape 20 0 0 0

The above gives 5% bandwidth to the first queue (one in 20 packets). The other three queues receive 75%, 15% and 10% bandwidth respectively. 150+30+20 is 200 (10 is not counted here, because this queue is already shaped). 150 of 200 is 75%, 30 of 200 is 15%, 20 of 200 is 10%. How the shaped and shared queues are counted together is not clear, after all, 105% of bandwidth is allocated now. But it would require all queues to be filled at the same time to reach this situation.

Low latency queuing
And finally, the 3560/3750 allows for one priority egress queue. If a packet is placed in this queue, it will be sent out next, regardless of what is in the other queues, until it reaches the maximum allowed bandwidth. This makes it ideal for voice and other low-latency traffic. By design, the priority queue has to be the first queue, so the command doesn’t have a number in it:

Switch(config-if)#priority-queue out

Sounds logical, right? You’re correct: absolutely not. Unfortunately, Cisco uses a different type of value system for each QoS command, and the only way getting a feeling for it is trying it out. I hope this does help understand the workings of QoS in hardware. In the next article, we’ll review another platform.

Assuming you’ve marked packets on ingress as detailed in part III, it’s now time to continue to the actual prioritization. First a router: a router, e.g. a 2800 platform, forwards packets using the CPU and uses software queues for prioritization. This means packets are stored in RAM while they are queued, and the router configuration defines how many queues are used and which ones are given priority.

QoS3

This queueing in RAM means that you can customize the number of queues. By default, there is only one queue, using the simple First-in First-out (FIFO) method, but if there is needs for different treatment for other traffic classes, new queues can be allocated. The queues can also be given different parameters. While there’s a large array of commands in a policy-map possible, for basic QoS on ethernet, three commands will do: ‘bandwidth’, ‘police’ and ‘priority’.

Bandwidth
The bandwidth parameter defines what amount of bandwidth a queue is guaranteed. It is configured in Kbps. It does not set a limit: if the interface is not congested, the queue will receive all the bandwidth it needs. But in case of congestion, the bandwidth of the queue will not drop below this configured value.

Router(config)#class-map CM-FTP
Router(config-cmap)#match dscp af12
Router(config-cmap)#exit
Router(config)#policy-map PM-Optimize
Router(config-pmap)#class CM-FTP
Router(config-pmap-c)#bandwidth 10000
Router(config-pmap-c)#exit
Router(config-pmap)#exit
Router(config)#interface Eth0/0
Router(config-if)#service-policy output PM-Optimize
I/f FastEthernet0/0 class CM-FTP requested bandwidth 10000 (kbps), available only 7500 (kbps)

The configuration and error message above does show a weak point: you can easily misjudge the amount of bandwidth available. For this, the ‘bandwidth percent’ command makes it easier. Also, while it’s a 10 Mbps interface, it shows only 7.5 Mbps of available bandwidth. The reason for this is that 75% of the interface bandwidth is used for QoS calculations, and the rest is reserved for control traffic (OSPF, CDP,…). The ‘max-reserved bandwidth’ command on the interface can change this, and a modern high speed interface will have enough with a few percent for control traffic.

Router(config)#policy-map PM-Optimize
Router(config-pmap)#class CM-FTP
Router(config-pmap-c)#bandwidth percent 50
Router(config-pmap-c)#exit
Router(config-pmap)#exit
Router(config)#interface Eth0/0
Router(config-if)#max-reserved bandwidth 90

The above would guarantee a bandwidth of 4.5 Mbps for the class CM-FTP: 90% of the 10 Mbps interface is 9 Mbps, and 50% of that.

Police
The bandwidth guarantee for the ‘police’ command is the same as with the ‘bandwidth’ command. The only difference is that it is a maximum at the same time: even if there is no congestion on the link, bandwidth for the queue will still be limited. It is configured in increments of 8000 bits (no Kbps): configuring ‘police 16200’ will actually configure ‘police 16000’. This can be useful: if there is no congestion, available bandwidth is divided evenly over the queues, except the ones that use policing.

Router(config)#class-map CM-Fixed
Router(config-cmap)#match dscp af13
Router(config-cmap)#exit
Router(config)#policy-map PM-Optimize
Router(config-pmap)#class CM-Fixed
Router(config-pmap-c)#police 32000

Priority
The ‘priority’ command is nearly equal to the bandwidth command. Also measured in Kbps, also a minimum guarantee of bandwidth. The difference is that this queue will always be serviced first, resulting in low-latency queueing. Even if packets are dropped due to congestion, the ones going through will have spent the least amount of time in a queue.

Router(config)#class-map CM-Voice
Router(config-cmap)#match dscp ef
Router(config-cmap)#exit
Router(config)#policy-map PM-Optimize
Router(config-pmap)#class CM-Voice
Router(config-pmap-c)#police 32000

TCP optimization
So far, mainly latency-sensitive traffic like UDP voice has been given priority. But it doesn’t mean optimizations for TCP aren’t possible: a protocol such as FTP or any other TCP protocol that uses windowing starts behaving in a typical pattern on a congested link: windowing up until the point of congestion, losing frames and rewindowing to a smaller value, after which the process starts again.

QoS4

If multiple similar TCP connections are on a link, they tend to converge. When congestion occurs, the queue fills up, packets are eventually dropped and many TCP connections rewindow to a lower value at the same time. The consequence is that the link is suddenly only partially used. It would be better if rewindowing for each flow happens at different times, so there are no sudden drops in total bandwidth usage. This can be achieved by using Random Early Detect (RED): by dropping some packets before the queues are full, some flows will rewindow before the link is 100% full, avoiding further problems. RED starts working after the queue has been filled after a certain percentage, and will only drop one in every x number of packets. A complete explanation of RED would take another article, but a simple and effective starting point is the following configuration:

Router(config)#policy-map PM-Optimize
Router(config-pmap)#class CM-TCP
Router(config-pmap-c)#random-detect dscp-based

The ‘dcsp-based’ parameter is optional, but will cause the router to follow the DSCP markings as explained in part II: AF11 has a lower drop probability than AF13, so packets with value AF13 will be dropped more often compared to AF11.

QoS5

The result is more even distribution of bandwidth, and overall better throughput.

One last command can also help TCP: ‘queue-limit’. While the queue length for a priority queue is best set low, TCP traffic is usually tolerant of latency. It’s better to have it in a queue then have it being dropped.

Router(config)#policy-map PM-Optimize
Router(config-pmap)#class CM-TCP
Router(config-pmap-c)#queue-limit 100

A larger queue in combination with RED allows for a good throughput even with congestion. The default queue length is 64.

So that’s the basics for QoS on a router. Up next: different switch platforms, which all have their own different QoS mechanisms.

In part I and part II it should be obvious that markings are a core concept in QoS. While not really mandatory for QoS to function properly, it makes things consistent: a certain type of traffic is marked once, and all devices further in the network can identify a traffic class based on the marking. Without the markings, each network device has to figure out traffic classes on its own.

For this reason, a trust boundary is needed: the imaginary border in the network where traffic classes are identified and marked. The remaining part of the network trusts those markings. But where to place this boundary? Rule of thumb: as close to the source as possible, where you still control the device. Usually this means, on a switchport where an end user connects, the marking is done on the switchport. But the interpretation doesn’t have to be that tight:

  • QoS settings on Windows can be controlled in a Group Policy Object (GPO), so QoS markings can’t be abused by end users. Of course this is best combined with a secure access policy such as 802.1x to prevent connection of any computer not part of the domain.
  • FCoE places a default marking. Best to trust the markings from the server if it’s under your company’s control. If you happen to deploy FCoE in combination with Nexus 2000 FEX’es, you’ll have no choice because the 2000 units can’t do any marking (although this might change in the future, who knows).
  • If you don’t control the network devices, best practice is to not trust it. This means it’s possible you’ll have to do marking on WAN routers or switch uplinks to third parties.

Classification is usually done using an ingress service-policy. It’s possible to do it on egress and place markings as they leave the network device, but you’ll have to do classification on ingress anyway if you want to do QoS on the local device. There are multiple ways to classify a frame or packet, but the most useful by far is the use of extended ACLs. It’s possible classify anything that an extended ACL can match: source, destination IP address or network, source, destination ports, even existing markings such as DSCP markings, and combinations of them.

A common QoS policy is made up of three object types: the ACL matching traffic, the class-map classifying traffic matched by the ACLs, and the service-policy creating a policy based on the class-maps. For example, say that you want to mark voice traffic with DSCP EF, and you’re using a SIP phone with RTP. Session Initiation Protocol is a protocol for voice signalling, and Real-time Transport Protocol is a protocol for voice payload. SIP has a default UDP port of 5060, RTP usually starts at UDP 16384 and sometimes uses higher ports too.

First of all: know your application. We’re going to place markings here so a traffic flow will get priority treatment. Voice payload needs this: low latency, no packet loss. Voice signalling, on the other hand, doesn’t need this: it can tolerate delay. It can’t tolerate packet loss, so it can be given a marking to indicate these packets shouldn’t be dropped. DSCP AF31 is a good one for this. Also important is to be as specific as possible, and know the direction of traffic. This determines the ACL, and by being specific, you avoid other applications from ending up in the same queue as well. A simple ACL will then look like this:

Router(config)#ip access-list extended AL4-RTP
Router(config-ext-nacl)#10 permit udp any any eq 16384
Router(config-ext-nacl)#exit
Router(config)#ip access-list extended AL4-SIP
Router(config-ext-nacl)#10 permit udp any any eq 5060
Router(config-ext-nacl)#exit

Next is the class-map. Usually the class-map and the ACL have a one-to-one mapping, making it superfluous at first sight. But by using a class-map, you can add multiple criteria to define a traffic flow. There are two types of class-maps: the ‘match-any’ that require  a packet to match all criteria, and ‘match-any’ (the default) that just need one of the criteria to match. This allows one class-map to reference both an IPv4 and IPv6 ACL, for example.

Router(config)#class-map match-any Class-RTP
Router(config-cmap)#match access-group name AL4-RTP
Router(config-cmap)#exit
Router(config)#class-map match-any Class-SIP
Router(config-cmap)#match access-group name AL4-SIP
Router(config-cmap)#exit

So the class-map defines a class of traffic. Now everything needs to be put together to classify incoming (or outgoing) traffic and place markings. This is done with the policy-map:

Router(config)#policy-map Mark-Voice
Router(config-pmap)#class Class-RTP
Router(config-pmap-c)#set dscp ef
Router(config-pmap-c)#exit
Router(config-pmap)#class Class-SIP
Router(config-pmap-c)#set dscp af31
Router(config-pmap-c)#exit
Router(config-pmap)#class class-default
Router(config-pmap-c)#set dscp default

It’s self-explanatory for the most part: each class receives a DSCP value. The class-default is everything not defined before, DSCP default means no marking. Note that the command ‘set ip dscp’ also exists, but without the ‘ip’ parameter it applies to both IPv4 and IPv6.

Now that all the objects (ACL, class-map, policy-map) are connected together, it just needs to be applied to an interface:

Router(config)#interface GigabitEthernet0/1
Router(config-if)#service-policy input Mark-Voice
Router(config-if)#exit

And that’s it. Incoming traffic is now being marked. That doesn’t mean anything is happening with it so far: up next, more explanation about mappings and queues.

QoS part II: common traffic classes.

Before I continue to the configuration of QoS, first some short guidelines. Markings determine traffic classes, as explained in part I. It is up to the network device to decide what to do with a certain class of traffic. For consistency, some general guidelines. Note that there is nothing stopping you from using other values than the ones recommended, but using these values will make troubleshooting easier.

Layer 2 – CoS
The CoS field is three bits, so that means there are eight CoS values possible: 0 to 7. By default the CoS marking is zero for most frames. Wikipedia has a page with commonly used traffic classes. The most important ones, depending on the environment, are CoS 5 for voice payload and CoS 3 for FCoE.

Layer 3 – IP Precedence
The IP Precedence field is the first three bits of the DSCP field described below. These days it’s not used anymore by itself but as part of the DSCP field. Nevertheless, it still has a one-to-one mapping to the CoS values. So the guideline is the same as for CoS: IPP 5 is commonly used for Voice. FCoE doesn’t have an IP header, so no IP precedence or DSCP value. By default the marking in this field is also zero, with one notable exception: router generated control packets such as OSPF and HSRP hellos are marked with IPP 6 by default.

Layer 3 – DSCP
Differentiated Services Code Point has six bits, so 64 combinations. It is more complex than IPP and CoS, but the first three bits still mean the same: they indicate a priority for the traffic class. The last three bits are commonly used to mark drop probability. Priority is used for latency mostly: the higher priority a packet is treated, the less time it spends in a queue, and the lower the latency. Drop probability used to differentiate between traffic classes in case a queue is full and packets must be dropped. A full explanation can be found on Wikipedia again, but here are some general guidelines to help understanding the logic:

  • As far as drop probability goes (the last three bits), often only the first two are used in practice.
  • The commonly used DSCP values have a name. Most of these are called Assured Forwarding, with two numbers. The first number indicates the priority, the second drop probability. Because the last bit isn’t used in this naming, AF11 stands for ‘001 01 0’ or DSCP value 10. It is meant to indicate a low priority, low drop probability.
  • Priority in assured forwarding is counted 1 to 4, drop probability 1 to 3. AF43, ‘101 11 0’, means high priority, high drop probability. AF12, ‘001 10 0’ means lower priority, medium drop probability, and so on.
  • Voice traffic is usually given the value Expedited Forwarding (EF), DSCP value 46: ‘101 11 0’. Why this value? It makes sense if you think about it: the first three bits are ‘110’ or 5, which means IPP 5 and maps to CoS 5, the value used for voice on layer 2. The last three bits are ‘110’, or high drop probability. Since voice traffic needs to be as real-time as possible, queueing it is of little use. So in case of a filled queue, which would mean a lot of latency, it is better to drop it in favor of packets that can tolerate more delay.

So what are the AF classes used for? Answer: any application that fits the description. FTP, for example, is very tolerant of latency and can tolerate some packet loss, so AF12 could be a good DSCP value. An interactive application such as Remote Desktop Protocol will perform better in a low latency environment, and preferably less packet loss, so AF41 could be used. Of course, this all depends on the needs.

Also, the QoS mechanisms work by differentiating between traffic flows: if you mark every application in the network with DSCP EF, it is the same as not marking it at all, because everything will be treated the same again.

QoS part I: introduction.

A new series of articles! This time, the challenging topic of Quality of Service.

What is QoS?
QoS is a mechanism in network devices that determines which packet to send next in case of link congestion… No, wait, that’s still not general enough. QoS is a mechanism in network devices that determines which packet to process next in case of unavailable resources. Why the more general definition? Well, while it’s true that QoS used in case of link congestion, it can also be used to determine which packet must be sent to an ASIC or CPU next in case of incoming queueing. But let’s first concentrate on link congestion.

When is QoS applied?
QoS starts doing most action only when there’s actually congestion on a link. If there’s no congestion, and in the QoS default configuration as well, First-in-First-out (FIFO) is applied: the first packet to arrive through a network device at an outgoing (egress) interface will be transmitted first.

QoS1

Easiest example is two incoming flows on 1 Gbps links that both need to go out of a switch on a third 1 Gbps link. As long as both flows combined stay under 1 Gbps, there’s no problem. But as soon as both flows use more than 1 Gbps of bandwidth combined (e.g. two 600 Mbps flows for a total of 1.2 Gbps), the outgoing link will become congested and packets will be dropped.

First of all: QoS doesn’t stop link congestion and will not stop most packets from being dropped. QoS will mostly help determine which packet exactly will be dropped, and preferably this is a packet that is not as critical as one that will still be transmitted.

Second: QoS actually does some things already, even when not experiencing congestion: packets and frames will be marked with values indicating their priority, or how critical they are. Which brings us to the next part:

QoS fields
A frame, on layer 2, and a packet, on layer 3, can be marked with a value. For layer 2 frames, this marking is only possible in the 802.1q header, so untagged frames don’t carry any markings. A VLAN tag has 4 bytes or 32 bits: 16 bits for Ethertype (this way the frame signals it carries a 802.1q header) and 12 bits for the VLAN ID. This means 4 bits remain, of which 3 are used for 802.1p priority signalling or Class of Service (CoS). Three bits means eight classes.

QoS2

On layer 3, the required field is always present for IP traffic. It’s a 6-bit field in the IP header. Originally only three bits were used, it was called IP Precedence and usually it was mapped 1-to-1 to the 802.1p field on layer 2. Extensions where added and the entire 6-bit field became used as a Differentiated Services Code Point (DSCP). There are up to 64 DSCP values possible. The first three bits are the old IP Precedence field and often still mapped 1-to-1 to the CoS value.

QoS field can be marked on end devices themselves by software (e.g. Voice software), or by network devices based on matching criteria, usually ACLs that match port numbers. Having these markings doesn’t do anything by itself: it just differentiates between different classes of traffic. It is up to the network device to decide what to do with a certain class of traffic. A switch can be configured to give packets with a CoS value of 5 priority, however, the next switch can be configured to give CoS 5 only a fixed small amount of bandwidth. The configuration is done per device, and differs per platform (more about that in upcoming articles).

Also, why use CoS if it’s only present in 802.1q headers and can’t traverse any layer 3 hops? Reason: it’s the lowest common denominator, as not all traffic is IP traffic. FCoE, for example, is best placed in its own class, and doesn’t use IP.

Not just for link congestion.
I already mentioned it, QoS is not only for link congestion. Some platforms, most notable the Ciso Nexus series mainly do their QoS on ingress, and place packets in a queue before they are being handled by an internal ASIC. If that ASIC becomes congested, QoS takes place. The egress queues are mainly there in case two different ASICs send a packet to the same physical output queue, resulting in FIFO behavior.

I hope this gave a basic insight. In upcoming articles, I’ll explain how to configure the marking and apply actual prioritization.