I’m going to take a risk here and weigh in on a discussion that has been active for years among data center network engineers: eliminating Spanning-Tree Protocol from the data center.
For readers with little data center experience: why would one want to get rid of STP? Well, a data center usually has a lot of layer 2 domains: many VLANs, many servers, and therefore many switches. Those switches require redundant uplinks to the core network, and often to each other too. With STP you can achieve that redundancy, but at the cost of putting links in blocking state. Okay, MST and PVST+ allow you to assign root switches per VLAN or group of VLANs and do some load-balancing across the links, but the result can still be inefficient switching, where a frame passes through more switches than it needs to. Port-channels do use all links, but a port-channel can only be configured between two (logical) devices, and its hash-based load-balancing is not always even.
If you’re still not convinced, take the example above. The red and blue lines are two connections. While the blue connection does take the shortest path, the red one does not, because STP puts some links in blocking state to prevent loops. You can change the root bridge, but choosing a root bridge that is not in the center of the network makes things worse.
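To make that concrete, here is a small sketch of my own (it uses the networkx library and an invented five-switch topology, so nothing here comes from a real network) comparing the path a frame takes over an STP-style tree with the shortest path the physical topology actually offers:

```python
# Illustration only: compare the path over an STP-style tree with the true
# shortest path in the full topology. Switch names are made up.
import networkx as nx

topology = nx.Graph()
topology.add_edges_from([
    ("core", "sw1"), ("core", "sw2"),   # redundant uplinks to the core
    ("sw1", "sw2"),                     # inter-switch link
    ("sw1", "sw3"), ("sw2", "sw4"),     # access switches
    ("sw3", "sw4"),                     # direct link between access switches
])

# STP roughly builds a tree of shortest paths towards the root bridge and
# blocks every link that is not part of that tree.
stp_tree = nx.bfs_tree(topology, source="core").to_undirected()

print(nx.shortest_path(stp_tree, "sw3", "sw4"))  # ['sw3', 'sw1', 'core', 'sw2', 'sw4']
print(nx.shortest_path(topology, "sw3", "sw4"))  # ['sw3', 'sw4']
```

The path over the spanning tree detours all the way up to the root side of the network even though a direct link exists; that direct link is exactly the kind of capacity STP leaves idle.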
Outside the data center there is less need for improved switching: smaller networks such as SOHO and medium-sized companies don’t have large layer 2 topologies, so STP is sufficient, and an ISP, while running a large infrastructure, usually relies on layer 3 protocols for redundancy, because it cares about routing traffic between networks rather than switching between large numbers of hosts.
There are several technologies being developed to improve switching, and most are based on the proposed TRILL standard: TRansparent Interconnection of Lots of Links. Switches run IS-IS, a link-state protocol that can run directly on layer 2, which in this role carries no IPv4 or IPv6 routes but MAC address locations: which switch each MAC address can be reached through.
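As a mental model (my own simplification, not actual protocol code; the switch names, ports, and MAC addresses are invented), the result of that IS-IS exchange is essentially a forwarding table keyed on MAC addresses instead of IP prefixes:

```python
# Conceptual sketch: 'routing' on MAC addresses instead of IP prefixes.

# Learned via the layer 2 link-state protocol: which edge switch each
# end-host MAC address sits behind.
mac_table = {
    "00:1a:2b:3c:4d:5e": "rbridge-3",
    "00:1a:2b:3c:4d:5f": "rbridge-7",
}

# Computed from the same link-state topology: the next hop towards each switch.
next_hop_to_switch = {
    "rbridge-3": "port-1",
    "rbridge-7": "port-4",
}

def forward(dst_mac: str) -> str:
    """Pick the outgoing port by looking up the egress switch for the MAC."""
    egress_switch = mac_table[dst_mac]
    return next_hop_to_switch[egress_switch]

print(forward("00:1a:2b:3c:4d:5e"))  # -> port-1
```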
Frames are encapsulated with a TRILL header (which has its own hop count to prevent switching loops) and are sent to the switch that is closest to the destination MAC address. If the entire layer 2 topology runs TRILL, that means the frame is sent to the switch directly connected to the destination MAC address. The TRILL header is then removed and the frame continues on as a normal Ethernet frame. The entire original frame is transported as payload behind the TRILL header; even the 802.1q VLAN tag is left unchanged. Basically, it is like ‘routing frames’. For broadcasts and multicasts, a multicast tree is calculated so frames can be replicated without causing loops, and Reverse-Path Forwarding (RPF) checks are included to further reduce potential switching loops. Equal-cost multipath is also supported, so load-balancing over multiple links within the same VLAN is possible.
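A rough sketch of that encapsulation step (the field layout loosely follows the TRILL proposal: a 6-byte header with a 6-bit hop count plus 16-bit egress and ingress switch nicknames; the helper names and values below are my own):

```python
# Illustration of TRILL-style encapsulation and hop-count handling.
import struct

def trill_encapsulate(inner_frame: bytes, ingress: int, egress: int,
                      hop_count: int = 0x3F) -> bytes:
    """Wrap a complete Ethernet frame (802.1q tag and all) behind a TRILL header."""
    # First 16 bits: version (2), reserved (2), multi-destination flag (1),
    # options length (5), hop count (6).
    first_word = (0 << 14) | (0 << 12) | (0 << 11) | (0 << 6) | (hop_count & 0x3F)
    trill_header = struct.pack("!HHH", first_word, egress, ingress)
    return trill_header + inner_frame

def next_hop_forward(trill_pdu: bytes) -> bytes:
    """At every TRILL hop: decrement the hop count, drop the frame at zero."""
    first_word, egress, ingress = struct.unpack("!HHH", trill_pdu[:6])
    hop_count = first_word & 0x3F
    if hop_count == 0:
        raise ValueError("hop count exhausted, frame dropped (loop protection)")
    first_word = (first_word & 0xFFC0) | (hop_count - 1)
    return struct.pack("!HHH", first_word, egress, ingress) + trill_pdu[6:]
```

The hop count works just like a TTL: every TRILL hop decrements it, so a looping frame eventually gets dropped instead of circulating forever.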
While TRILL is a proposed IETF standard, 802.1aq SPB, or Shortest Path Bridging, is an IEEE standard. It works very similarly and also uses IS-IS, with multicast trees for broadcasts and unknown unicasts, as well as RPF checks. The major difference is in the encapsulation: either 802.1ah MAC-in-MAC or 802.1ad QinQ is used. MAC-in-MAC encapsulates the frame in another ‘frame’ with the source and destination MAC addresses of switches, while QinQ manipulates VLAN tags to define an optimal path. The actual workings are quite complex but can be studied on Wikipedia.
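For comparison, a minimal sketch of the MAC-in-MAC idea (the layout is loosely based on 802.1ah: a backbone Ethernet header, a backbone VLAN tag, and a 24-bit service identifier in front of the untouched customer frame; the addresses and values are made up):

```python
# Illustration of 802.1ah-style MAC-in-MAC encapsulation.
import struct

def mac_in_mac_encapsulate(customer_frame: bytes,
                           backbone_dst: bytes, backbone_src: bytes,
                           b_vid: int, i_sid: int) -> bytes:
    """Wrap a customer frame in a backbone Ethernet header between switches."""
    b_tag = struct.pack("!HH", 0x88A8, b_vid & 0x0FFF)    # backbone VLAN tag
    i_tag = struct.pack("!HI", 0x88E7, i_sid & 0xFFFFFF)  # 24-bit service ID
    return backbone_dst + backbone_src + b_tag + i_tag + customer_frame
```

The customer frame, including its own MAC addresses and VLAN tag, rides along unmodified; only the backbone switches look at the outer header.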
Of course, new protocols in the networking world always come in vendor-specific flavors too, and naturally Cisco is part of it. Cisco uses FabricPath, a custom TRILL implementation that can already be configured on the Nexus 5500 and 7000 series. Cisco also claims that all switches capable of FabricPath today will be able to support TRILL once it becomes an official standard.
Brocade, originally a SAN vendor but now with lots of products in the data center network market, has VCS Fabric Technology, which is also a custom TRILL implementation. While some suggest it does not scale as well as competing products, it does come with auto-configuration features that make it easy to deploy.
Juniper also has a vendor-specific way to move away from STP: QFabric. It’s not TRILL-based, making it less likely to allow multi-vendor interoperability on existing hardware in the future. A QFabric works like one giant managed switch: a large chassis made up of multiple physical devices, in some ways like the Nexus 2000 FEXes, which act as line cards for a parent Nexus 5000 or 7000. Multiple QFabrics can be connected and appear to each other as a single switch, allowing the design to scale out very well. For an in-depth explanation, EtherealMind covers it very well.
Which one is best? Time will tell: all of these technologies are relatively new (TRILL isn’t even an official standard yet) and not widely deployed. It also depends on the infrastructure you have in mind, your budget, and vendor preference.