Tag Archive: Data Center


I have to admit, this article will sound a bit like an advertisement. But given that Cisco has gotten enough attention on this blog already, it can only bring variation into the mix.

This is a short explanation of a series of products offered by F5 Networks. Why? If you’re a returning reader of this blog and work in the network industry, chances are you’ve either encountered one of these appliances already, or could use them (or another vendor’s equivalent, of course).

LTM
The Local Traffic Manager’s main function is load balancing. This means it can divide incoming connections over multiple servers.
Why you would want this:
A typical web server will scale up to a few hundred or thousand connections, depending on the hardware and services it is running and presenting. But there may be more connections needed than one server can handle. Load balancing allows for scalability.
Some extra goodies that come with it:

  • Load balancing method: of course you can choose how to divide the connections. Simply round-robin, weighted in favor of a better server that can handle more, always to the server with the least connections,…
  • SSL Offloading: the LTM can provide the encryption for HTTPS websites and forward the connections in plain HTTP to the web servers, so they don’t have to consume CPU time for encryption.
  • OneConnect: instead of simply forwarding each connection to the servers in the load balancing pool, the LTM can set up a TCP connection with each server and reuse it for many incoming connections, e.g. the HTTP GET of each new external connection is sent over the same existing server-side connection. Just like SSL Offloading, this consumes fewer resources on the servers. (Not every website handles this well.)
  • Port translation: not really NAT but you can configure the LTM for listening on port 80 HTTP or 443 HTTPS while the servers have their webpage running on different ports.
  • Health checks: if one of the servers in the pool fails, the LTM can detect this and stop sending connections to that server. The service or website will stay up, it will just be able to accept fewer connections. You can even upgrade servers one by one without downtime for the website (but make sure to plan this properly).
  • IPv6 to IPv4 translation: your web servers and the rest of your network do not have to be IPv6 capable. Just the network up to the LTM has to be.
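To make the load balancing methods from the list above a bit more concrete, here is a minimal Python sketch of round-robin, weighted round-robin and least-connections selection. The pool, the server names and the weights are made up for illustration; this is not how the LTM implements it internally, only the general idea.

```python
from itertools import cycle

# Hypothetical pool: server name -> weight (a higher weight means a stronger server).
POOL = {"web1": 1, "web2": 1, "web3": 2}


def round_robin(pool):
    """Yield servers in a fixed rotating order."""
    return cycle(sorted(pool))


def weighted_round_robin(pool):
    """Yield servers in rotation, repeating each one 'weight' times per round."""
    expanded = [name for name in sorted(pool) for _ in range(pool[name])]
    return cycle(expanded)


def least_connections(active):
    """Pick the server that currently has the fewest open connections."""
    return min(sorted(active), key=lambda name: active[name])


rr = round_robin(POOL)
first_four = [next(rr) for _ in range(4)]

wrr = weighted_round_robin(POOL)
first_weighted = [next(wrr) for _ in range(4)]

# Assumed snapshot of open connections per server.
chosen = least_connections({"web1": 12, "web2": 7, "web3": 9})
```

In this sketch web3 gets two out of every four connections with the weighted method, and the least-connections method picks web2 because it has the fewest open connections at that moment.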

ASM
The Application Security Manager can be placed in front of servers (one server per external IP address) and functions as an IPS for web applications, often called a web application firewall.
Why you would want this:
If you have a server reachable from the internet, it is vulnerable to attack. Simple as that. Even internal services can be attacked.
Some extra goodies that come with it:

  • SSL Offloading: the ASM can provide the encryption for HTTPS websites just like the LTM. The benefit here is that you can check for attack vectors inside the encrypted session.
  • Automated request recognition: scanning tools can be recognized and denied access to the website or service.
  • Geolocation blocks: it’s possible to block out entire countries with automatic lists of IP ranges. This way you can offer the service only where you want it, or stop certain untrusted regions from connecting.

GTM
The Global Traffic Manager is a DNS forwarding service that can handle many requests at the same time with some custom features.
Why you would want this:
This one isn’t useful if the service you’re offering isn’t spread out over multiple data centers in geographically different regions. If it is, it will help redirect traffic to the nearest data center and provide some DDoS resistance too.
Some extra goodies that come with it:

  • DNSSEC: signed DNS replies, which helps prevent spoofing.
  • Location-based DNS: by matching the DNS request with a list of geographical IP allocations, the DNS reply will contain an A record (or AAAA record) that points to the nearest data center.
  • Caching: the GTM also caches DNS requests to respond faster.
  • DDoS protection: automated DNS floods are detected and mitigated.
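The location-based DNS idea from the list above can be sketched in a few lines of Python. The prefix table, the site addresses (taken from the documentation ranges) and the function name are assumptions for illustration only; a real GTM uses a full geolocation database and many more criteria.

```python
import ipaddress

# Hypothetical table: client prefix -> address of the nearest data center.
# All addresses come from documentation ranges, not from real sites.
GEO_TABLE = [
    (ipaddress.ip_network("198.51.100.0/24"), "192.0.2.10"),  # assumed "Europe" site
    (ipaddress.ip_network("203.0.113.0/24"), "192.0.2.20"),   # assumed "Asia" site
]
DEFAULT_SITE = "192.0.2.10"


def resolve(client_ip):
    """Return the A record to hand out for a given client (resolver) address."""
    addr = ipaddress.ip_address(client_ip)
    for prefix, site in GEO_TABLE:
        if addr in prefix:
            return site
    return DEFAULT_SITE  # unknown location: fall back to a default data center
```

A request from 203.0.113.55 would get the second site in the reply, while a request from an unlisted range falls back to the default.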

APM
The Access Policy Manager is a device that provides SSLVPN services.
Why you would want this:
The APM will connect remote devices with encryption to the corporate network with a lot of security options.
Some extra goodies that come with it:

  • SSLVPN: no technical knowledge required for the remote user and works over SSL (TCP 443) so there’s a low chance of firewalls blocking it.
  • SSO: Single Sign On support. Log on to the VPN and credentials for other services (e.g. Remote Desktop) are automatically supplied.
  • AAA: lots of different authentication options, local, Radius, third-party,…
  • Application publishing: instead of opening a tunnel, the APM can publish applications after the login page (e.g. Remote Desktop, Citrix) that open directly.

So what benefit would you have from knowing this? More than you think: many times when a network or service is designed, no attention is given to these components. Yet they can help scale out a service without resorting to complex solutions.

I’m going to take a risk here and challenge myself to a discussion that has been active for years among data center network engineers: eliminating Spanning-Tree Protocol from the data center.

To the readers with little data center experience: why would one want to get rid of STP? Well, a data center usually has a lot of layer 2 domains: many VLANs, many servers. That translates to many switches. Those switches require redundant uplinks to the core network, and maybe each other too. Using STP, you can achieve such redundancy, but at the cost of putting links in blocking state. Okay, MST and PVST+ allow you to assign root switches per VLAN or group of VLANs and do some load-balancing across the links, but it can still result in inefficient switching where a frame goes through more switches than it needs to. Port-channels use all links, but they are not perfect and can only be configured between two (logical) devices.

STP Network

If you’re still not convinced, take the above example. The red and blue lines are two connections. While the blue connection does take the shortest path, the red one does not. This is because STP puts some links in blocking state to prevent loops. You can change the root bridge, but choosing a root bridge that is not in the center of the network makes things worse.
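The effect can be reproduced with a small Python sketch. The topology here is a hypothetical ring of four switches (not the one from the picture), and the blocked link is simply assumed to be the one STP would block with switch A as root.

```python
from collections import deque

# Hypothetical ring of four switches.
LINKS = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
BLOCKED = [("C", "D")]  # assumed to be put in blocking state by STP, root = A


def hops(links, src, dst):
    """Breadth-first search: number of links on the shortest path, or None."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


stp_links = [link for link in LINKS if link not in BLOCKED]
with_stp = hops(stp_links, "C", "D")  # has to go C -> B -> A -> D
shortest = hops(LINKS, "C", "D")      # could go C -> D directly
```

With the blocked link, a frame from C to D crosses three links instead of one, even though the two switches are directly connected.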

Outside of the data center, there is less need for improved switching: smaller networks like SOHO or medium size companies don’t have large layer 2 topologies so STP is sufficient, and an ISP, while having a large infrastructure, often uses layer 3 protocols for redundancy because the number of hosts doesn’t matter, only routing data does.

There are several technologies being developed to improve switching and most are based on the proposed TRILL standard: TRansparent Interconnect of Lots of Links. Switches use a link-state protocol that can run on layer 2, IS-IS, which does not carry any IPv4 or IPv6 routes in this implementation, but MAC address locations.

TRILL header

Frames are encapsulated with a TRILL header (which has its own hop count to prevent switch loops) and are sent to the switch that is closest to the destination MAC address. If the entire layer 2 topology runs TRILL, that means it will be sent to the switch that is directly connected to the destination MAC address. The TRILL header is then removed to allow normal transportation of frames again. The entire frame is transported as payload behind the TRILL header; even the 802.1q VLAN tag is left unchanged. Basically, it is like ‘routing frames’. For broadcasts and multicasts, a multicast tree is calculated to allow the copying of frames without causing loops. Reverse-Path Forwarding (RPF) checks are also included to further reduce potential switching loops. Also, equal-cost multipath is supported, so load-balancing over multiple links within the same VLAN is possible.

While TRILL is a proposed IETF standard, 802.1aq SPB or Shortest Path Bridging is an IEEE standard. It works very similarly and also uses IS-IS with multicast trees for broadcasts and unknown unicasts, as well as RPF checks. The major difference is in the encapsulation: either 802.1ah MAC-in-MAC or 802.1ad QinQ is used. MAC-in-MAC encapsulates the frame in another ‘frame’ with source and destination MAC addresses of switches, while QinQ manipulates VLAN tags to define an optimal path. The actual workings are quite complex but can be studied on Wikipedia.

Of course, new protocols in the networking world always come as vendor-specific too, and naturally Cisco is part of it. Cisco uses FabricPath, which is a custom TRILL implementation that can already be configured on the Nexus 5500 and 7000 series. Cisco also claims that all switches capable of FabricPath today will be able to support TRILL too once it’s an official standard.

Brocade, originally a SAN vendor with lots of products in the data center network market now, has VCS Fabric Technology, which also is a custom TRILL implementation. While some suggest it does not scale that well compared to other products, it does come with some auto-configuration features that make it easy to deploy.

Juniper also has a vendor-specific protocol to move away from STP: QFabric. It’s not TRILL-based, making it less likely to allow multi-vendor support on existing hardware in the future. A QFabric works like one giant managed switch: it’s like a large chassis made up of multiple physical devices, in some aspects like the Nexus 2000 FEXes, which act as line cards for a parent 5000 or 7000. Multiple QFabrics can be connected and appear to each other as one switch, allowing it to scale out very well. For an in-depth explanation, see EtherealMind, who has explained it very well.

Which one is best? The future will tell: all these technologies are relatively new (TRILL isn’t even official yet), and not widely deployed yet. It also depends on what infrastructure you have in mind, the budget, and vendor preference.

I’ve written about VXLAN before: it’s a proposed technology to tunnel frames over an existing IP network, allowing for much more than the 4096 VLAN limit. When writing that article, an RFC draft was proposed, which expires this month.

Coincidentally or not, Cisco has just released some new switching products, among which a new version of the Nexus 1000V, which claims to support VXLAN. Given the recent release of IBM’s 5000V virtual switch for VMware products, we’re seeing a lot of innovation in this market segment lately, and it will surely not be the last. As I have yet to test a NX1000V, I’m unsure what VXLAN support means in real life, how it will impact network topologies, and what issues may arise. Two things stand out very clearly to me: VXLAN (or any other tunneling over IP) introduces an extra layer of complexity in the network, but at the same time, it allows you to be more flexible with existing layer 2 and layer 3 boundaries, as VXLAN does not require virtual machines to be in the same (physical) VLAN for broadcast-related things, like vMotion for example.

I do have doubts that at this point in time there is a lot of interest towards these products. vSphere and competitors are delivered with a vSwitch present, so it’s less likely to be invested in: ‘There already is a switch, why place a new one?’. But the market is maturing and eventually, vSwitch functionality will become important for any data center.

Also, last but not least, special thanks to Ivan Pepelnjak and Scott Lowe. They both have excellent blogs with plenty of data center related topics, and I often read about new technologies first on their blogs before anywhere else.

If you’ve had little or no real-world experience inside a data center or large switched infrastructure, the Cisco Nexus series of switches is something you probably haven’t encountered so far. Yet, they are rather different from ‘standard’ Cisco Catalyst switches like the 3560/2960/3750 series, which are most commonly used these days in certification training and most business environments. Since I’ve been able to get my hands on them, I’ll share my experiences with the reader. I’ll be focusing on the 5000 and 2000 series, as these show a clear design difference with the Catalyst series.

Nexus

A Nexus 2000 is also called a fabric extender, or FEX. The idea is that they extend the switching fabric of a Nexus 5000 or 7000 (the 7000 is a chassis). A FEX has no management interface, but instead has to be connected to a Nexus 5000 or 7000, after which it becomes a logical part of that parent switch. A 32-port Nexus 5000 with ten 48-port Nexus 2000s attached will list a whopping 512 ports (32 + 10 × 48) under ‘show interface brief’, not counting any VLAN interfaces. All interfaces will show as ‘ethernet’, no matter their link speed, so no guessing ‘was it f0/1 or g0/1’ here.

The connection from FEX to parent switch is made via an SFP module with fiber, or a Cisco twinax cable, which is an Ethernet-like copper cable with the SFP already attached to it on both sides. Depending on the FEX model, there are two or four SFP uplinks possible, just like most Catalyst switches.

Twinax

The 5000 series has 32 to 96 1/10 Gbps SFP ports. These ports cannot negotiate any lower than 1Gbps, so 10 or 100 Mbps is not an option. As the parent switch, it is supposed to provide uplinks to other parts of the network, or uplinks to the FEX’s, so high bandwidth is needed. The actual links to the servers are meant to be patched on the FEX’s, which have 24 to 48 100/1000 Mbps ports. 10 Mbps is not possible here. (Frankly, who still uses that?)

An interesting feature is that you can use two 5000s or 7000s together as one logical switch when setting up port aggregation, as long as they have a direct connection between themselves for control. So by using an uplink to another switch or FEX on one Nexus, and using a second uplink on the second Nexus, you can create an EtherChannel, without any of the links getting blocked by STP and without causing a loop. The link between the two Nexus switches will keep information synchronized. This is called a virtual Port Channel or vPC.

Also, they don’t run the classic Cisco IOS, but use NX-OS. While this originally evolved from a different line of operating software, the basic commands are the same as in IOS. Some things are somewhat different, e.g. SPAN or port mirroring requires additional commands. As a reminder, a SPAN port is configured on a Catalyst switch like this:

switch(config)#monitor session 1 source interface g0/4
switch(config)#monitor session 1 destination interface g0/5

The above will copy all traffic from interface g0/4 to g0/5. If you connect a capturing device on port g0/5 (e.g. a computer with Wireshark running), you can see the traffic. A Nexus works differently:

switch(config)#monitor session 1 source interface e111/1/20
switch(config)#monitor session 1 destination interface e1/5
switch(config)#interface e1/5
switch(config-if)#switchport monitor
switch(config-if)#exit
switch(config)#no monitor session 1 shut

By explicitly configuring the switchport as a monitoring interface, there’s less confusion: in the Catalyst series the monitoring switchport can have an entirely different configuration, but that configuration stops taking effect as soon as the port becomes a SPAN destination. The monitor session doesn’t start by default, hence the last command. Since you’re working in a multi-gigabit environment, this is an understandable choice.

Using NX-OS has another reason, of course. The Nexus series can run FCoE natively. For more information, read this first. By combining this with servers that have converged network adapters (CNAs) and connecting the Nexus to a SAN, it’s possible to run both storage and IP-based communication through the same physical network.

These are the main reasons Cisco is having success with these lines of switches: they’re very redundant (vPC, dual power supplies, dual fans,…), they provide both LAN and SAN functionality, and have high throughput rates (1/10 Gbps, sub-millisecond switching from server through FEX to parent switch). They are mostly used in an environment that needs large layer 2 domains, like data centers. I’ve also heard of implementations for an access layer design towards many end users, which would work and provide great redundancy, but since these switches weren’t designed with that in mind, they lack PoE capabilities often needed for IP Phones and access points.

Ethernet (and TCP/IP) is the number one used technology these days for communication between devices. But for storage, the dominant technology in a datacenter often is Fibre Channel (abbreviated to FC).

Fibre Channel
FC is a network standard to allow hosts (servers) to communicate with storage devices. By itself, it’s completely separate from Ethernet. A storage network switch is not the same as an Ethernet network switch. There is one notable exception to this rule: the Cisco Nexus 5548UP and 5596UP have switchports that can be run in either Ethernet mode, or Fibre Channel mode, but not both modes at the same time. There’s also no communication between both types of ports possible, as the protocols are incompatible.

One name you’ll hear when talking about storage networking is Brocade: the most prominent vendor of storage networking hardware. Also, a bit of information about the name Fibre Channel: originally, FC’s only transport medium was fiber, but these days twisted pair copper wire is also possible. That’s the opposite of Ethernet, which originally ran only on copper wires and now can be used on fiber as well.

Reliability
Another important difference to remember between Ethernet and FC is the reliability: FC was designed with perfect reliability in mind. Not a single frame may be lost, and frames must be delivered in order, just like they would be from a locally attached storage device. FC switches even signal to other devices when they’re getting congested, so these devices stop sending frames, instead of having frames dropped. This is in contrast to Ethernet, which will just start dropping frames when congested, relying on upper layers (like TCP) to make sure everything keeps working.
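In FC this ‘stop sending instead of dropping’ behaviour is based on credits: a sender may only transmit while the receiver has advertised free buffers. The toy Python model below shows the principle only; the class and method names are made up and it ignores everything else a real FC port does.

```python
class CreditedLink:
    """Toy model of a sender that may only transmit while it holds credits."""

    def __init__(self, credits):
        self.credits = credits  # free receive buffers advertised by the peer
        self.delivered = []     # frames that reached the receiver, in order
        self.waiting = []       # frames held back by the sender

    def send(self, frame):
        if self.credits == 0:
            self.waiting.append(frame)  # hold the frame, never drop it
            return False
        self.credits -= 1
        self.delivered.append(frame)
        return True

    def receiver_ready(self):
        """The receiver freed a buffer: return one credit, flush a held frame."""
        self.credits += 1
        if self.waiting:
            self.send(self.waiting.pop(0))


link = CreditedLink(credits=2)
results = [link.send(f) for f in ("f1", "f2", "f3")]
link.receiver_ready()
```

The third frame is held until a credit comes back, and it is then delivered after the first two, so nothing is lost and the order is kept.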

SAN versus NAS
Some people think a Storage Area Network, or SAN, is similar to a Network Attached Storage disk, or NAS. This is not true: a NAS provides access to files, a SAN provides access to raw storage. SAN storage also doesn’t show up as a network drive in the operating system but as a locally attached drive, and it is treated that way too.

Layers and command set
Wikipedia mentions that Fibre Channel does not follow the OSI model. That’s true, but not completely: a FC frame can be divided into layers. The biggest difference is that layers 5 to 7 of the OSI model are missing, as FC is raw storage data transport and not related to a particular application. I’ll quote the layers from Wikipedia:

  • FC4: Protocol Mapping layer for protocols such as SCSI.
  • FC3: Common Services layer, a thin layer for encryption or RAID redundancy algorithms.
  • FC2: Network layer, consists of the core of Fibre Channel, and defines the main protocols.
  • FC1: Data Link layer, which implements line coding of signals.
  • FC0: PHY, includes cabling, connectors etc.

On FC4, SCSI or Small Computer System Interface is commonly used. SCSI is a command set to communicate with storage devices. It’s the same command set used between a computer and a locally attached SCSI drive (like a SAS drive). FC2 is the network layer and somewhat relates to OSI layer 2 and 3. A SAN is one flat network, best compared to a layer 2 subnet. There are discussions about whether FC is switching or routing, but it’s a bit of both really. Personally, I use the term ‘Fibre Channel switching’ because it’s a flat network. On the other hand, FSPF or Fibre Channel Shortest Path First, is commonly referred to as a routing protocol. Also, FC doesn’t use MAC addresses, but World Wide Names (WWNs) to identify source and destination nodes, which are hexadecimal numbers just like MACs.

Bandwidth
FC speeds aren’t in multiples of 10 like Ethernet, but double with each implementation: there’s 1GFC, 2GFC, 4GFC, 8GFC and 16GFC. The ‘G’ stands for Gigabit, as you need high bandwidth for storage. A FC adapter is not like an Ethernet NIC: it doesn’t have an IP, and it will not be treated as a NIC by the operating system, but more like a storage device (which it is).

Fibre Channel over Ethernet
When data centers started to grow, this gave some scalability problems when implementing redundancy. Redundancy meant two Ethernet NICs, but also two FC adapters for storage, giving a total of four connections per server. For this reason, Fibre Channel over Ethernet (or FCoE) was developed. FCoE uses Ethernet frames (up to OSI layer 2) and sets FC on top of that (from FC2 and up). The result is a converged network that can transport both device communications and storage blocks.

For this to work you’ll need a Converged Network Adapter (CNA) and switches capable of FCoE. It’s theoretically possible to use a normal NIC and let software calculate the FCoE frames, but few, if any, of these implementations exist. Also, I haven’t found any sources claiming a standard Ethernet switch will or will not work. Most likely they’ll work, but given the unreliable nature of Ethernet, you’ll run into serious problems once congestion occurs, as SCSI does not recover well from lost or out-of-order-delivered frames (most likely your operating system will crash or get corrupted). A FCoE enabled switch, like the Cisco Nexus series for example, provides lossless Ethernet techniques to handle this, and can use FC signalling to prevent congestion.

Fibre Channel over IP
So that’s FCoE, but as this doesn’t use IP, it’s still a flat network. For WAN links, there are other standards too, that can span multiple hops and don’t have distance limitations like native FC. It’s possible to run FC on top of IP, using FCIP or iFCP (Internet Fibre Channel Protocol). Neither seems to be commonly used.

iSCSI
One of the more widely used techniques for converged storage networking is iSCSI, which is running SCSI on top of TCP (using ports 860 and 3260). This doesn’t really involve any FC formatting anymore in any part of the frame, so it’s less overhead than FCIP and iFCP, which also run on TCP but then still require FC headers. TCP counters the unreliability of Ethernet, allowing for reliable frame delivery and sequence numbering to prevent out-of-order-delivery. iSCSI also doesn’t require specialized networking gear, allowing for normal Ethernet network equipment. You can even implement QoS and basic firewalling matching on TCP port numbers.

Storage space
SCSI uses Logical Unit Numbers (or LUNs) to differentiate between different (virtual) partitions on a storage device. This means that you can have a large SAN server with several TB of storage, divided into many different LUNs, one for each server. Servers then communicate using SCSI (over any of the above technologies) using LUNs to address their part of the storage. This way, servers do not interfere with each other’s storage. Most modern operating systems have support for iSCSI. VMware’s ESXi and vSphere even implement this on the hypervisor level, making the storage disks appear completely local to the virtual machines.

IP over Fibre Channel
Internet Protocol over Fibre Channel (IPFC) exists too, but it doesn’t seem to be used a lot. Good documentation and drivers are hard to find, so why go through all the trouble? Most companies already have a working Ethernet infrastructure and Ethernet is usually less expensive. This is also another reason why iSCSI is popular: some claim that buying 10 Gigabit-Ethernet switches and NICs is less expensive than buying 8GFC switches and adapters, and the increased overhead of TCP and iSCSI is less than the speed gain from 8 Gbps to 10 Gbps.

This was quite a write-up, but I hope to have cleared up the basic differences and similarities between these two technologies. Anything to add? Let me know in the comments.

Virtual switching plays an important role in the data center, so I’m going to give a brief overview of the different products. What is virtual switching? Well, a physical server these days usually has a hypervisor as operating system, which has only one function: virtualizing other operating systems to virtual machines that are running on top of the hypervisor. These virtual machines can be Windows, Linux, Solaris, or even other operating systems. These virtual machines need network connectivity. For that, they share one or more physical network interface cards on the server, commonly called a pNIC. To regulate this network traffic, a virtual switch, called a vSwitch, runs in software on the hypervisor and connects these pNICs with the virtual network interface cards of the virtual machines, called vNICs. So it looks like this:

Virtual Network

The blue parts are done in software, only the last part, the pNIC, is physical.

There are three big players in the hypervisor market: Citrix with XenServer, Microsoft with Hyper-V and VMware with ESXi or vSphere. Each has their own implementation of a virtual switch.
Apart from that, Cisco has a Nexus 1000 virtual switch.

Citrix Xenserver
I have no experience with XenServer and so far I’ve found little information on it. A virtual switch that can be used is Open vSwitch, an open source product which runs on Xen and VirtualBox. I’m not sure if this is the only virtual switch that XenServer supports. Open vSwitch supports a variety of features you would expect from a switch: trunking, 802.1q VLAN tags, link aggregation (LACP), tunneling protocols, SwitchPort ANalyser (SPAN), IPv6, basic QoS. I could not find anything in regard to Spanning Tree Protocol support, so I’m uncertain what will happen if a loop is created to a server with multiple pNICs and no link aggregation configured.

Microsoft’s Hyper-V
Again, I have little real world experience with Hyper-V, and details are not clear, but the virtual switch supports the mandatory 802.1q VLAN tags and trunking. Advanced spanning-tree support is missing as far as I can tell: you can’t manipulate it. I’ve found no information on link aggregation support. It’s a very simple switch compared to the other products. There’s one advantage though: you can run the Routing and Remote Access role on the Windows Server and do layer 3 routing for the VMs, which offers some possibilities for NAT and separate subnets without the need of a separate router. It’s a shame Microsoft decided to no longer support OSPF on Windows Server 2008, as this might have been a great addition to it, making a vRouter possible. RIPv2 should still work.

VMware’s ESXi and vSphere
The vSwitch developed by VMware is, in my opinion, very good for basic deployment. It supports 802.1q VLAN tags and trunking. It does not support spanning-tree but incoming spanning-tree frames are discarded instead of forwarded. Any frames entering through the pNICs that have the source MAC of one of the virtual machines are dropped. Broadcasts are sent out through only one pNIC. These mechanisms prevent loops from forming in the network. Link aggregation is present but only a static EtherChannel can be formed, which requires some additional planning. QoS is not supported, and no layer 3 functions either.
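Two of the loop-prevention rules described above can be written down as a short Python sketch. The MAC addresses of the virtual machines and the function name are hypothetical, and this is only my reading of the behaviour, not VMware’s actual code.

```python
# Hypothetical MAC addresses of the vNICs on this host.
LOCAL_VM_MACS = {"00:50:56:aa:00:01", "00:50:56:aa:00:02"}
# Destination address used by IEEE spanning-tree BPDUs.
STP_BPDU_DST = "01:80:c2:00:00:00"


def accept_from_pnic(src_mac, dst_mac):
    """Decide whether a frame arriving on a physical uplink is processed."""
    if dst_mac.lower() == STP_BPDU_DST:
        return False  # spanning-tree frames are discarded, not forwarded
    if src_mac.lower() in LOCAL_VM_MACS:
        return False  # a local VM's own frame came back in: it looped outside
    return True
```

A frame from an unknown outside host is accepted, while a BPDU or a frame that carries the source MAC of a local virtual machine is dropped before it can circulate.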

Nexus 1000 virtual switch
I’m adding the NX1000V to this list, as it is currently one of the few products on the market that can be used as a vSwitch instead of the default hypervisor vSwitch. Currently there’s only support for vSphere, but Cisco announced that there will be support for Windows Server 8, too.
The NX1000V is supposed to support anything that’s possible with a physical Nexus switch. So compared to the default vSwitch used, it will add support for LACP, QoS, Private VLANs, access control lists, SNMP, SPAN, and so on.

With the ongoing virtualisation of data centers, virtual switching is an emerging market. For those of you interested in it, it’s worth looking into.

VLAN limit and the VXLAN proposal.

Today I stumbled across a nice RFC draft which proposes a new kind of network topology in data centers (thanks to Omar Sultan for the link on his blog). It’s four days old (at the time of writing) and is proposed by some major players in the data center market: it mentions Cisco, Red Hat, Citrix and VMware among others.

It proposes the use of VXLANs, or Virtual eXtensible Local Area Networks, which is basically a tunneling method to transport frames over an existing Layer 3 network. Personally, after reading through it, the first thing that came to mind was that this was another way to solve the large layer 2 domain problem that exists in data centers, in direct competition with TRILL, Cisco’s FabricPath, Juniper’s QFabric, and some other (mostly immature) protocols.

But then I realised it is so much more than that. It comes with 24 identifier bits instead of the 12 bits used with VLANs: an upgrade from 4,096 VLANs to 16.7 million VXLANs. Aside from this it also solves another problem: switch CAM tables would no longer need to keep track of all virtual MAC addresses used by VMs, but only the endpoints, which at first sight seem to be the physical servers only. (I don’t think this is a big problem yet. The draft claims ‘hundreds of VMs on a physical server’, which I find hard to believe, but with the increase of RAM and cores on servers this may soon become reality in every average data center.) It also seems to have efficient mechanisms proposed for Layer 2 to Layer 3 address mapping and multicast traffic. Since it creates a Layer 2 tunnel, it would allow for different Layer 3 protocols as well.
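The numbers are easy to check, and the 24-bit identifier can be shown in a short Python sketch. It assumes the 8-byte header layout as I understand it from the VXLAN specification (8 flag bits, 24 reserved bits, the 24-bit identifier, 8 reserved bits); the function names are made up.

```python
import struct

VLAN_IDS = 2 ** 12   # 4,096 possible VLAN IDs
VXLAN_IDS = 2 ** 24  # 16,777,216 possible VXLAN identifiers


def vxlan_header(vni):
    """Pack an 8-byte header: flags, 24 reserved bits, VNI, 8 reserved bits."""
    if not 0 <= vni < VXLAN_IDS:
        raise ValueError("the identifier must fit in 24 bits")
    flags = 0x08  # assumed 'I' flag: the identifier field is valid
    return struct.pack("!II", flags << 24, vni << 8)


def vni_of(header):
    """Extract the 24-bit identifier from a packed header."""
    _, second_word = struct.unpack("!II", header)
    return second_word >> 8


header = vxlan_header(5000)
```

An identifier like 5000 would not fit in a 12-bit VLAN tag, but it fits in the VXLAN header with plenty of room to spare.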

Yet I still see some unsolved problems. What about QoS? Different VMs may need different QoS classifications. I also noticed the use of UDP, which I understand because this does not have the overhead of TCP, but I don’t feel comfortable sending important data on a best-effort basis. There is also no explanation what impact it will have on link MTU, though this is only a minor issue.
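The MTU impact can at least be estimated. The Python sketch below assumes an outer IPv4 header without options, no 802.1q tag on the inner frame, and an 8-byte VXLAN header; with IPv6 or extra tags the numbers would be different.

```python
IPV4_HEADER = 20     # outer IPv4 header, no options (assumption)
UDP_HEADER = 8       # outer UDP header
VXLAN_HEADER = 8     # VXLAN header
INNER_ETHERNET = 14  # MAC header of the tunneled frame, untagged (assumption)

OVERHEAD = IPV4_HEADER + UDP_HEADER + VXLAN_HEADER + INNER_ETHERNET


def inner_mtu(link_mtu):
    """Largest IP packet a VM can send without the tunnel exceeding link_mtu."""
    return link_mtu - OVERHEAD


def required_link_mtu(vm_mtu):
    """Link MTU the transport network needs so VMs can keep using vm_mtu."""
    return vm_mtu + OVERHEAD
```

Under these assumptions the tunnel costs 50 bytes: on a standard 1500-byte link the VMs are left with 1450 bytes, or the transport network has to carry 1550 bytes to leave the VM MTU at 1500.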

In any case, it’s an interesting draft, and time will tell…