09 February 2010



Redundancy: Choosing the Right Option for Net Designs

Redundancy is an easy topic to discuss but a challenging concept to implement in networking designs. Here's a look at common redundancy techniques designers can implement in equipment and end-node devices.

By T. Sridhar, FutureSoft
CommsDesign
Jul 27, 2004
 
The word redundancy is thrown around often in today's communications equipment sector. And, for the most part, this term is used to describe systems that deliver the coveted 99.999 percent (five nines) up time.
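
As a rough illustration of what five nines means in practice, the arithmetic can be sketched in Python. The function name and the 365-day year are assumptions made for this example only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% up time allows about "
              f"{allowed_downtime_minutes(target):.2f} minutes of downtime per year")
```

Under these assumptions, five nines leaves a little over five minutes of downtime per year.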

But, while easy to talk about, redundancy is a complex topic that requires designers to look at all elements of the design process. For example, when building a networking system, designers must implement redundancy techniques in power supplies, switch fabrics, control planes, line cards, and more.

In this article, we'll detail the various types of redundancy in the network and outline the key considerations in each of these redundancy schemes. For simplicity, we will focus on Layer 2 Ethernet networks with IP as the default Layer 3 protocol.

Redundancy Defined
The basic premise of redundancy is the use of additional components to take over for the active component when the active component fails. In a network, redundancy can be specified at two levels: end-node and network-level redundancy.

End-node redundancy involves the use of multiple network adapters to connect to the same network, so that operation can continue even if one of the adapters fails. End-node redundancy is often used in server environments where a loss of connectivity can lead to significant disruption of operations.

Network-level redundancy involves implementing a network with redundant links and network equipment (e.g. routers and switches) as well as the use of protocol techniques like spanning tree protocol (STP) and routing protocol features like equal cost multipath (ECMP).

Network-level redundancy includes link-level and equipment-level redundancy. Link-level redundancy is the use of multiple links between two devices in the network so that when one of the links fails, the other can take over. Equipment-level redundancy involves the use of additional nodes within the network to provide alternate paths between two communicating end nodes.

With both equipment- and link-level redundancy, path switchover takes place when any of the network components (node, link) on the active path fails.

Load Balancing
Below, we're going to take a detailed look at end-node and network-level redundancy. But, before doing that, let's take a quick look at the impact of load balancing.

A key consideration with redundancy is whether the alternate or redundant element (component, link or path) is used during normal operation. Consider a link-level redundancy scenario with two links — Link 1 and Link 2 — connecting two switches. During normal operation, packets may be sent only over Link 1 while Link 2 is in standby mode, waiting to take over if Link 1 fails. This is an inefficient utilization of network resources since Link 2 is idle during normal operation, i.e. only 50% of the link resources are being used.

An alternative is to use both Link 1 and Link 2 during normal operation so that one of the links can take over if the other fails. Also called the load-balancing approach, this scheme has two advantages. First, the capacity of the connection between the two nodes is doubled, since both links can carry traffic between the nodes. Second, there is no designated primary or backup for redundancy, so the design is simpler.

With node-level redundancy, it is possible to use load balancing by sending the traffic to a different next hop so that it follows an alternate path to the destination. Consider Figure 1 where the destination can be reached via two paths from Router A. One is through Router B and the other through Router C.


Figure 1: Network with multiple paths between routers.

In Figure 1, paths to end node 2 through both Routers B and C exist in Router A's forwarding table. One approach to load balancing: Router A can send a packet through Router B if the IP address is odd and through Router C if the address is even.
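
The odd/even rule can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes the two candidate next hops held by Router A are Routers B and C, as in Figure 1.

```python
import ipaddress


def pick_next_hop(ip: str, next_hops=("Router B", "Router C")) -> str:
    """Choose a next hop from the parity of the address's numeric value."""
    address_value = int(ipaddress.ip_address(ip))
    # Odd addresses use the first next hop, even addresses the second.
    return next_hops[0] if address_value % 2 == 1 else next_hops[1]


if __name__ == "__main__":
    for address in ("10.0.0.1", "10.0.0.2"):
        print(address, "->", pick_next_hop(address))
```

Because the choice depends only on the address, all packets for a given address follow the same path.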

End-Node Redundancy
Figure 2 shows an end node attached via two separate links to the same switch. If one of the links or switch ports fails, the data can be sent over the other port. To ensure that this happens seamlessly, the most common technique is for the higher-layer software to be completely unaware of load balancing and switchover.


Figure 2: End node connected via two links to a switch.

In Figure 2, assume that the redundancy is implemented without load balancing. Link 1 is the active link while Link 2 is the standby link. Also assume that the media access control (MAC) addresses of the end node on these links are MAC1 and MAC2. Additionally, assume that the node has a "common" link MAC address for all transmissions from the end node to other nodes. This MAC address (CMAC) will be used as the source MAC address for Ethernet frames originated by this end node destined to other end nodes. Thus, this is the only MAC address known to the switches connected to the end node, as well as the operating system and higher layer software.

The advantage of the approach discussed above is that if one of the links fails, the network never needs to learn a new MAC address. The active link or preferred link transfers the packets while the backup link is in the hot standby mode.

Periodically, the end node transmits keep-alive MAC frames on Link 1 with MAC1 as the source MAC address and MAC2 as the destination MAC address. When the end node receives this keep-alive frame on Link 2 (forwarded by the switch), it knows that Link 1 is active (as is the connected port on the switch) and continues to keep Link 2 in standby mode. If it does not receive a specified number of keep-alives within a certain period, the end node causes Link 2 to transition to active mode. At this time, it sends a frame with CMAC as the source MAC address, so that the switch changes its address mapping table entry to indicate that CMAC is reachable via Link 2.
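
The switchover logic described above can be modeled as a small state machine. The sketch below is a simplification under stated assumptions: the class and field names are hypothetical, the threshold of three missed keep-alives is an arbitrary value (the article gives none), and the "certain period" is reduced to a count of consecutive missed intervals.

```python
from dataclasses import dataclass, field


@dataclass
class RedundantNode:
    """Hypothetical model of the end node in Figure 2 (names are illustrative)."""
    missed_limit: int = 3          # assumed threshold; the article gives no value
    active_link: str = "Link 1"
    missed: int = 0
    sent_frames: list = field(default_factory=list)

    def on_interval(self, keepalive_seen_on_link2: bool) -> None:
        """Call once per keep-alive interval with whether the frame came back."""
        if self.active_link != "Link 1":
            return  # already failed over; nothing left to monitor in this sketch
        if keepalive_seen_on_link2:
            self.missed = 0  # Link 1 and its switch port are still working
            return
        self.missed += 1
        if self.missed >= self.missed_limit:
            self.active_link = "Link 2"
            # A frame sourced from CMAC lets the switch relearn CMAC on Link 2.
            self.sent_frames.append({"link": "Link 2", "src": "CMAC"})


if __name__ == "__main__":
    node = RedundantNode()
    for seen in (True, False, False, False):
        node.on_interval(seen)
    print(node.active_link, node.sent_frames)
```

Resetting the counter whenever a keep-alive returns means that only consecutive losses trigger the switchover in this model.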

When the end node is connected to two separate switches, traffic can be switched through the backup link to the second switch if the link port, the link, or the switch port fails. The two switches are connected to each other, so if the frames start arriving on Port 2 of Switch B, Switch A will also learn of this connectivity via Switch B, and traffic can continue without disruption. Note that load balancing is not possible if the ports/links are attached to two separate switches.

Network Redundancy
When considering network redundancy, we can evaluate redundancy from both a Layer 2 (Ethernet) and Layer 3 (IP routing and forwarding) perspective. The following sections discuss the various schemes used to implement Layer 2 and 3 network redundancy.

Layer 2: Link Aggregation
If load balancing is used in conjunction with link-level redundancy, it is typically via link aggregation (LA). Figure 3 shows three links that appear as a single link to the switch so that the same MAC address can be used across all three links. A link aggregation process on the end node and the switch provides the control signaling and packet ordering for the links, so that the applications are unaware of the link aggregation. If one of the links fails, the others continue to operate, so the aggregated link simply appears as a lower-speed link.
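
One way an aggregation process might spread traffic while keeping each conversation in order is to hash the address pair onto one of the operational member links. The sketch below assumes that approach; the article does not specify a distribution method, and the class and method names are hypothetical.

```python
import zlib


class AggregatedLink:
    """Illustrative model of several physical links presented as one."""

    def __init__(self, links):
        self.links = list(links)   # all member links, in a fixed order
        self.up = set(self.links)  # members that are currently operational

    def fail(self, link) -> None:
        """Mark a member link as failed; the rest keep carrying traffic."""
        self.up.discard(link)

    def select(self, src_mac: str, dst_mac: str) -> str:
        """Pick a member link for a conversation; same pair, same link."""
        candidates = [link for link in self.links if link in self.up]
        if not candidates:
            raise RuntimeError("all member links are down")
        key = f"{src_mac}-{dst_mac}".encode()
        return candidates[zlib.crc32(key) % len(candidates)]


if __name__ == "__main__":
    group = AggregatedLink(["Link 1", "Link 2", "Link 3"])
    chosen = group.select("MAC-A", "MAC-B")
    group.fail(chosen)
    print(chosen, "failed; traffic now uses", group.select("MAC-A", "MAC-B"))
```

Frames of one conversation stay on one link, so they cannot overtake each other, and a failed member only reduces the available capacity.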


Figure 3: Diagram showing a typical link aggregation scheme.

Layer 2: Switch and Link Redundancy
The most common scheme for network-level redundancy in Ethernet switches is to use multiple switches and/or multiple links between switches. When multiple switches operate in parallel, one switch can take over when the primary fails. However, due to the nature of MAC-based learning and forwarding, this can cause loops. Thus, we need to disable one of the switches during "normal" operation and let it take over when the active switch fails.

The IEEE 802.1D standard specifies the spanning tree protocol (STP) between switches for implementing redundancy and avoiding loops. STP can also be utilized when multiple redundant links are used between switches. Note that via STP, one of the switches will be in a "standby" state taking over when the other switch fails.
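
The effect of STP, a loop-free tree with the remaining links held in reserve, can be illustrated with a much-simplified sketch. It is not the 802.1D algorithm itself: real bridges exchange configuration messages and weigh path costs, whereas this example assumes integer bridge IDs, equal-cost links, at most one link per pair of switches, and a plain breadth-first search from the root. All names are hypothetical.

```python
from collections import deque


def spanning_tree(bridges, links):
    """Return (forwarding_links, blocked_links) for an undirected topology."""
    root = min(bridges)  # the lowest bridge ID wins the root election
    neighbors = {bridge: [] for bridge in bridges}
    for a, b in links:
        neighbors[a].append(b)
        neighbors[b].append(a)

    visited = {root}
    forwarding = set()
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for nxt in sorted(neighbors[current]):
            if nxt not in visited:
                visited.add(nxt)
                forwarding.add(frozenset((current, nxt)))
                queue.append(nxt)

    # Every link that is not part of the tree is kept in a blocked state.
    blocked = {frozenset(link) for link in links} - forwarding
    return forwarding, blocked


if __name__ == "__main__":
    tree, standby = spanning_tree([1, 2, 3], [(1, 2), (2, 3), (1, 3)])
    print("forwarding:", sorted(sorted(link) for link in tree))
    print("blocked:", sorted(sorted(link) for link in standby))
```

Rerunning the function after removing a failed link or switch from the inputs shows how a previously blocked link is brought into service.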

Rapid STP (RSTP) has been specified in the IEEE 802.1w standard as an improvement to IEEE 802.1D STP. This protocol allows faster recovery from failures and is a preferred protocol for newer switches. A key feature of RSTP is that it can "step down" to a standard STP mode of operation with switches that only support STP.

One issue with STP is the lack of load balancing since traffic cannot be sent through the standby switch. However, what if you could send the traffic for one set of virtual LANs (VLANs) through Switch 1 and the traffic for another set of VLANs through Switch 2? In that case, Switch 1 would be the active switch for VLAN set 1 while Switch 2 would be the standby switch. The situation would be reversed for VLAN set 2.

The enhancement to STP to help realize this is specified in the IEEE 802.1s multiple spanning tree protocol (MSTP). It provides for the STP to include VLAN related information. For example, designers could have a network with 10 VLANs of which 3 belong to one spanning tree and 7 to another. This provides a better utilization of redundant bridges than the standard STP and enables a degree of load balancing in the network.
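
The 10-VLAN example can be written down as a simple mapping. The sketch below is illustrative only: the VLAN IDs, instance numbers, and switch names are assumptions, and it models just the outcome (which switch is active for a VLAN), not the protocol exchange.

```python
# Hypothetical assignment of 10 VLANs to two spanning tree instances.
INSTANCE_VLANS = {
    1: range(1, 4),    # VLANs 1-3 belong to the first spanning tree
    2: range(4, 11),   # VLANs 4-10 belong to the second spanning tree
}

# Preferred switch first, standby second, per instance (assumed names).
INSTANCE_SWITCHES = {
    1: ["Switch 1", "Switch 2"],
    2: ["Switch 2", "Switch 1"],
}


def active_switch(vlan: int, failed=frozenset()) -> str:
    """Return the switch that forwards traffic for a VLAN, skipping failed ones."""
    for instance, vlans in INSTANCE_VLANS.items():
        if vlan in vlans:
            for switch in INSTANCE_SWITCHES[instance]:
                if switch not in failed:
                    return switch
            raise RuntimeError("no switch available for this VLAN")
    raise KeyError(f"VLAN {vlan} is not mapped to any instance")


if __name__ == "__main__":
    print(active_switch(2), active_switch(7))
    print(active_switch(2, failed={"Switch 1"}))
```

During normal operation both switches carry traffic, each for its own VLAN set, and either can absorb the other's VLANs after a failure.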

Layer 3: VRRP
End nodes are frequently configured with a default router which they always use to communicate with nodes on other networks. This default router information can be provided via DHCP or static configuration. When end nodes need to communicate with end nodes outside their own network, they send the packets to the default router, which in turn forwards the packets towards the final destination.

One drawback to this approach is that the default router is a single point of failure. Moreover, there is no automatic way for end nodes to move to another router without reconfiguration, a non-trivial task when there are a large number of end nodes in the network.

The virtual router redundancy protocol (VRRP) helps nodes recover from the outage of the default router in a transparent manner. VRRP specifies an election protocol that dynamically assigns the default router responsibility to a specific router (the master) among a group of routers. If the default router fails, one of the other routers in the group takes over with the same default router IP address and MAC address, so that end nodes see no disruption (Figure 4).


Figure 4: Diagram showing a network implementing VRRP.

The common MAC address used by all the routers in the VRRP group is known as a virtual MAC address. Since the end nodes keep the same IP address to (virtual) MAC address mapping, they will not notice that a backup has taken over from the master. VRRP can also provide load balancing: for example, the same router that acts as a backup for one group of nodes can be the master for another group.
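
The outcome of a VRRP election can be sketched as follows. Details not given in the article are taken from the VRRP specification (RFC 3768) as understood here: the highest priority wins, ties go to the higher primary IP address, and the IPv4 virtual MAC address has the form 00-00-5E-00-01-{VRID}. The class and function names are hypothetical, and the sketch computes only the result, not the advertisement and timer exchange that produces it.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class VrrpRouter:
    name: str
    priority: int      # 1-254 for backups, 255 for the address owner
    primary_ip: str
    alive: bool = True


def virtual_mac(vrid: int) -> str:
    """Virtual MAC address shared by every router in the group."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    return f"00-00-5E-00-01-{vrid:02X}"


def elect_master(routers):
    """Return the live router with the highest (priority, primary IP), or None."""
    candidates = [router for router in routers if router.alive]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (r.priority, int(ipaddress.ip_address(r.primary_ip))),
    )


if __name__ == "__main__":
    group = [
        VrrpRouter("Router 1", 200, "192.168.1.2"),
        VrrpRouter("Router 2", 100, "192.168.1.3"),
    ]
    print(elect_master(group).name, virtual_mac(1))
    after_failure = [VrrpRouter("Router 1", 200, "192.168.1.2", alive=False), group[1]]
    print(elect_master(after_failure).name, virtual_mac(1))
```

The virtual MAC address depends only on the group's VRID, which is why end nodes see the same address before and after the backup takes over.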

Layer 3: Router Multipath
Redundancy and resiliency are built into a Layer 3 network topology via the routing protocols. A protocol like RIP or OSPF recovers from a failure in the network (a node or link going down) via a recalculation of routes to destinations that were previously reachable through the failed node or link. When the new routing tables are built up after this recalculation, forwarding of traffic can take place. Thus, packets will resume their flow between source and destination after this recovery.
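
The recalculation step can be illustrated with a shortest-path computation that is rerun after a node is removed. This is a sketch only: the topology loosely follows the description of Figure 1, the link costs and the node name "N2" (standing for end node 2) are assumptions, and a real protocol such as OSPF also has to detect the failure and flood the change before it can recompute.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; graph maps node -> {neighbor: cost}."""
    queue = [(0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, link_cost in graph.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None  # target is unreachable


def without_node(graph, failed):
    """Return a copy of the topology with a failed node and its links removed."""
    return {
        node: {nbr: cost for nbr, cost in nbrs.items() if nbr != failed}
        for node, nbrs in graph.items()
        if node != failed
    }


# Assumed topology and costs: A reaches N2 through either B or C.
TOPOLOGY = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "N2": 1},
    "C": {"A": 2, "N2": 1},
    "N2": {"B": 1, "C": 1},
}

if __name__ == "__main__":
    print(shortest_path(TOPOLOGY, "A", "N2"))
    print(shortest_path(without_node(TOPOLOGY, "B"), "A", "N2"))
```

Traffic from A follows the path through B until B fails, after which the recomputed route goes through C at a higher cost.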

To avoid disruption of forwarding while the recalculation is taking place, the Layer 3 switch (i.e. router) can use an alternate forwarding path as shown in Figure 1. This is the route through Router C which shows up as an alternate route. The calculation would have been performed earlier by the routing protocol and the information maintained in the routing table as an alternate route.

Most routing protocols do not permit use of this alternate route during normal operation. An exception is OSPF, which permits equal-cost multipath (ECMP) routing, that is, forwarding of packets over multiple paths to the same destination as long as the paths have the same cost.
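
A minimal sketch of ECMP forwarding follows. It assumes the routing protocol has already produced a list of candidate routes with costs for one destination, and it assumes a per-flow hash to pick among the equal-cost next hops; the article does not say how traffic is divided, and the function names and cost values are hypothetical.

```python
import zlib


def equal_cost_next_hops(routes):
    """routes: list of (next_hop, cost) pairs for one destination."""
    best = min(cost for _, cost in routes)
    # Only routes that tie for the lowest cost may share the traffic.
    return sorted(hop for hop, cost in routes if cost == best)


def forward(routes, src_ip: str, dst_ip: str) -> str:
    """Pick one equal-cost next hop; a given flow always gets the same one."""
    hops = equal_cost_next_hops(routes)
    flow_key = f"{src_ip}>{dst_ip}".encode()
    return hops[zlib.crc32(flow_key) % len(hops)]


if __name__ == "__main__":
    routes = [("Router B", 10), ("Router C", 10), ("Router D", 20)]
    print(equal_cost_next_hops(routes))
    print(forward(routes, "10.0.0.5", "172.16.0.9"))
```

The higher-cost route through the hypothetical Router D is never used while an equal-cost pair is available, which matches the same-cost condition stated above.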

Wrap Up
Table 1 summarizes the various types of redundancy used in networks along with their key features.

This article focused on implementing redundancy in networks. End-node redundancy is typically implemented on servers, while link-level and equipment-level redundancy are used to build resilient networks. Backup network nodes (e.g. in VRRP environments) and alternate paths (e.g. in RSTP, MSTP and OSPF environments) are some of the building blocks used to build redundancy into networks. Network operators can use the methods described here to ensure reliable end-to-end communication.

About the Author
T. Sridhar is CTO and vice president of engineering at FutureSoft where his work includes software architecture design for communications systems. He has an MSEE from the University of Texas at Austin and a BE in Electronics and Communications from the College of Engineering, Guindy, Chennai, India. Sridhar can be reached at sridhar@futsoft.com.



