Build High Availability into Your IP Network: Part 2

This, the second of a two-part series, will examine the software design factors that are critical to achieving the much-touted "five-nines" availability of IP networks.

By Purnam A. Sheth
CommsDesign
Jan 03, 2003

Demand for high-availability Internet Protocol routers is being spurred by the need to reinforce the resilience of wide-area network edge aggregation devices, which usually serve large volumes of users and therefore can have a far-reaching impact on network service levels. Users require very high uptime for their mission-critical traffic, as well as for emerging real-time applications, such as voice-over-IP, now joining the network.

Systems designers, therefore, are challenged to deliver system uptime equivalent to, or better than, the reliability of the public switched telephone network (PSTN). This requirement for "five-nines" availability (a system that is processing and forwarding packets at least 99.999 percent of the time) is more pressing at the network edge, where redundant user access links are scarce. In the core, redundant systems connected by meshed links can leverage IP's inherent automatic rerouting capabilities in the event of a failure. This is a critical difference between IP and the circuit-switched PSTN.
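To make the five-nines target concrete, the short sketch below converts an availability percentage into an annual downtime budget. Five nines leaves roughly 5.26 minutes of total outage per year. The sketch assumes an average year of 365.25 days.

```python
# Convert an availability target into the maximum downtime it allows per year.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumes an average year of 365.25 days


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the annual outage budget, in minutes, for a given availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for nines, pct in [(3, 99.9), (4, 99.99), (5, 99.999)]:
    print(f"{nines} nines ({pct}%): {downtime_minutes_per_year(pct):.2f} minutes/year")
```

Each additional nine cuts the budget by a factor of ten, which is why five nines leaves essentially no room for a slow, manual recovery.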

Part 1 of this article explored several of the design aspects important to improving router availability, or the percentage of time a router is actually processing and forwarding packets. As noted, a high-availability design requires balancing variables such as system cost, complexity and network service-level goals. Now, in Part 2, we will take a closer look at the software design factors in the high-availability equation.

A highly redundant hardware platform coupled with fast software-recovery techniques boosts the availability of an IP router. System availability therefore depends on two primary measurements: 1) the router's mean time between failures (MTBF), the average time the device operates before a failure occurs; and 2) the router's mean time to recover (MTTR) from an outage, the average time during which the system is not processing and forwarding packets. Availability, expressed as a percentage, is MTBF divided by the sum of MTBF and MTTR, multiplied by 100 percent: Availability = MTBF / (MTBF + MTTR) × 100%.
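The availability formula can be evaluated directly, and solving it for MTTR shows how short recovery must be for a given MTBF. This is a minimal sketch; the MTBF values used in the usage note are hypothetical, not figures from the article.

```python
def availability_pct(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR) x 100 percent."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100


def max_mttr_hours(mtbf_hours: float, target_pct: float) -> float:
    """Largest MTTR that still meets target_pct.

    Solving A = MTBF / (MTBF + MTTR) for MTTR gives MTTR = MTBF * (1 - A) / A."""
    a = target_pct / 100
    return mtbf_hours * (1 - a) / a
```

With a hypothetical MTBF of 100,000 hours, a one-hour MTTR just meets five nines; if MTBF is only 10,000 hours, MTTR must fall to about 6 minutes. Shortening recovery is often easier than stretching MTBF, which is the motivation for the software techniques that follow.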

Achieving 99.999 percent uptime or higher is quickly becoming a router design goal. A basic requirement for any highly available IP router is a design that includes completely redundant hardware components. Redundancy allows a router that experiences a failure to switch over to a backup component and recover. In addition to a redundant hardware platform, a software architecture that provides fast or seamless recovery when a switchover occurs is required for lowering MTTR.

Recovery Requirements
To reduce MTTR, router software must be able either to pass a certain amount of router configuration and link-state information to a standby component or to recover that information from another source. The following types of information must be available on the standby to facilitate graceful system recovery:

  • Static router-configuration information
  • Link-state information
  • Protocol-specific information
  • Routing database (or the system must be able to reconstruct the routing database using information from peer routers)
  • Other dynamic system information (for example, the Simple Network Management Protocol sysUpTime value must keep increasing monotonically, even after a switchover)
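One way to keep sysUpTime monotonic across a switchover is to derive it from a checkpointed system start time rather than from the boot time of whichever RP happens to be active. The sketch below assumes both RPs read a common, synchronized clock; the class and field names are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UptimeCheckpoint:
    # Hypothetical checkpoint record: when the *system* came up (not this RP),
    # copied from the active RP to the standby.
    system_start: float


class RouteProcessor:
    def __init__(self, clock: Callable[[], float] = time.time):
        # Assumption: both RPs read the same synchronized (e.g. chassis-wide) clock.
        self.clock = clock
        self.checkpoint: Optional[UptimeCheckpoint] = None

    def boot_as_first_active(self) -> UptimeCheckpoint:
        """Record the system start time when the chassis first comes up."""
        self.checkpoint = UptimeCheckpoint(self.clock())
        return self.checkpoint

    def receive_checkpoint(self, cp: UptimeCheckpoint) -> None:
        """Standby side: store the start time checkpointed by the active RP."""
        self.checkpoint = cp

    def sys_up_time(self) -> int:
        """sysUpTime in TimeTicks (hundredths of a second), measured from the
        checkpointed system start so a switchover does not reset it."""
        if self.checkpoint is None:
            raise RuntimeError("no uptime checkpoint received yet")
        elapsed = self.clock() - self.checkpoint.system_start
        return int(elapsed * 100) % 2**32  # TimeTicks is 32-bit; wraps after ~497 days
```

A standby that received the checkpoint reports the same, still-increasing value after it takes over instead of restarting from zero, so management stations do not mistake a switchover for a reboot.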

Several key characteristics of system behavior must also be preserved to minimize the impact of a route processor (RP) failure:

  • Physical-layer connections must continue to operate independently of the RP.
  • Layer 2 connections must not time out during the recovery process of the now-active RP.
  • Layer 3 routing protocols must not time out and cause routing flaps.
  • The software must continue to forward packets using the last known forwarding table on the line cards.
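The last requirement implies that line cards hold their own copy of the forwarding table and do not flush it when the RP fails. The following hypothetical sketch shows that behavior with a simple longest-prefix-match lookup; the LineCard class and the interface names are illustrative only.

```python
import ipaddress


class LineCard:
    """Hypothetical line card that keeps its own copy of the forwarding table (FIB)."""

    def __init__(self):
        self.fib = {}          # ip_network -> egress interface
        self.rp_active = True  # whether a route processor is currently in control

    def install_fib(self, routes):
        """Download a new table from the active RP (prefix string -> interface)."""
        self.fib = {ipaddress.ip_network(prefix): nh for prefix, nh in routes.items()}

    def rp_failed(self):
        # Deliberately keep self.fib: forwarding continues on the last known table.
        self.rp_active = False

    def lookup(self, destination):
        """Longest-prefix match against whatever table is installed."""
        addr = ipaddress.ip_address(destination)
        best_net, best_nh = None, None
        for net, nh in self.fib.items():
            if net.version == addr.version and addr in net:
                if best_net is None or net.prefixlen > best_net.prefixlen:
                    best_net, best_nh = net, nh
        return best_nh
```

A real line card would use hardware or a trie rather than a linear scan; the point here is only that an RP failure leaves the table in place.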

To help meet the last two goals, the Internet Engineering Task Force (IETF) has built extensions to routing protocols such as the Border Gateway Protocol (BGP) that minimize the duration and reach of an outage associated with a failed RP (see the "Protocol extensions" section below). BGP is a particularly strong candidate for these high-availability protocol extensions, because it is deployed at the network edge.

The edge is the network segment that benefits most from highly available systems, because it is where many access circuits terminate and traffic is aggregated for forwarding across the wide-area network (WAN). In addition, for cost reasons, consumers and small offices commonly run a single circuit to the edge of a service provider network, leaving no alternative path for routing around a failed device.

Data Syncing Between RPs
There are several options for checkpointing, or synchronizing, large volumes of information between active and standby RPs. The design choice, again, requires balancing uptime requirements with cost, complexity, processing load and other considerations. Let's look at three of these synchronization options: message replay, full synchronization and partial synchronization.

Message replay. In this instance, every message or event generated by the active RP is replayed on the standby. The advantage of this type of synchronization is that it creates and propagates state to the standby deterministically, because the standby is effectively running the same (or nearly the same) software as the active RP. However, synchronization is difficult to maintain and guarantee in a system using message replay. Worse, a code path that causes the active RP to fail will also crash the standby, because the standby follows the same code path with the same input.
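The shared-fault weakness of message replay can be shown in a few lines. In this hypothetical sketch both RPs run identical handler code with a deliberate bug; an event that crashes the active RP is delivered to the standby after switchover and fails there in exactly the same way.

```python
class ReplicaRP:
    """Hypothetical RP whose state is driven entirely by replayed events."""

    def __init__(self, name):
        self.name = name
        self.routes = {}

    def handle(self, event):
        kind, prefix, next_hop = event
        if kind == "add":
            self.routes[prefix] = next_hop
        elif kind == "withdraw":
            # Deliberate bug for illustration: raises KeyError for an unknown prefix.
            del self.routes[prefix]
        else:
            raise ValueError(f"unknown event {kind!r}")


def process(active, standby, event):
    """Apply an event on the active RP and replay it on the standby.

    Returns whichever RP is active afterward."""
    try:
        active.handle(event)
    except Exception:
        # The active RP "crashes": switch over and deliver the same event to the
        # standby, which runs the same code path and fails the same way.
        standby.handle(event)
        return standby
    standby.handle(event)  # normal case: replay keeps the standby in step
    return active
```

The same determinism that keeps the two RPs in step is what makes the failure repeatable on both.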

Full synchronization. Here, every piece of data, including megabytes of routing database information, is synchronized to the standby RP. Having a complete database of full-state information available on the standby avoids "black holes" in forwarding during switchover. In this way, full synchronization speeds system recovery and boosts availability.

On the minus side, full synchronization requires high data and transaction rates. Protocol state such as Transmission Control Protocol (TCP) sequence numbers, which change with every segment sent and acknowledged, must be continually maintained and updated on the standby, a challenge that is difficult to handle in a deterministic manner.

Because fully state-synchronized systems, such as those described in the previous two scenarios, generate large amounts of overhead and messaging, these designs do not scale easily. A typical service provider edge system, for example, supports 20,000 Point-to-Point Protocol sessions and 200,000 BGP routes with 600 peers. With this load, continually synchronizing state information generates massive volumes of internal messaging, and that overhead grows nonlinearly as the volume of messages increases. As a result, such systems might require specialized hardware and very high internal bus bandwidth.

Partial synchronization. Using this option, the active and standby RPs synchronize selected information: enough to maintain all Layer 1 and Layer 2 sessions, continue forwarding packets and recover the routing database from adjacent nodes. With less data to synchronize, system consistency is easier to achieve. This approach is also less processor-intensive, requires no specialized hardware and is simpler to implement. As such, it scales well as the number of routes, interfaces and sessions increases.

Note, though, that while switchover is occurring and the routing database is being rebuilt, the system continues to forward packets using the last forwarding database available. Because that database may not reflect routing changes that occur during the rebuild, this option carries some potential for black holes.
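The difference between partial and full synchronization comes down to which parts of the RP's state are copied. In this minimal sketch, which assumes a hypothetical RPState record, partial synchronization copies only configuration, Layer 2 session state and the forwarding table, leaving the routing database (RIB) and TCP state to be rebuilt after switchover; selecting every field gives full synchronization.

```python
from dataclasses import dataclass, field, fields


@dataclass
class RPState:
    """Hypothetical grouping of the state an RP holds."""
    config: dict = field(default_factory=dict)        # static router configuration
    l2_sessions: dict = field(default_factory=dict)   # e.g. PPP session id -> parameters
    fib: dict = field(default_factory=dict)           # prefix -> next hop (forwarding table)
    rib: dict = field(default_factory=dict)           # per-peer BGP routing tables
    tcp_state: dict = field(default_factory=dict)     # per-connection sequence numbers, etc.


# Partial synchronization: just enough to keep Layer 1/2 up and keep forwarding.
# The RIB and TCP state are rebuilt after switchover (the RIB from peers).
PARTIAL_SYNC = ("config", "l2_sessions", "fib")
FULL_SYNC = tuple(f.name for f in fields(RPState))


def checkpoint(active: RPState, selected=PARTIAL_SYNC) -> RPState:
    """Build the standby's copy of the selected fields (one level deep)."""
    standby = RPState()
    for name in selected:
        setattr(standby, name, dict(getattr(active, name)))
    return standby
```

In the edge-router example above, the RIB holding 200,000 routes is the largest and most frequently changing item, which is why leaving it out of the checkpoint is what makes partial synchronization scale.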

Protocol Extensions
While the standby RP is becoming active following an outage, the routing processes might not be fully functional, or there might be a period of time during which packet forwarding is not operational. To prevent adjacent routers from declaring the failed router out of service and removing it from their routing tables and forwarding databases, the IETF has developed a set of routing protocol extensions. These extensions, when running in both the failed router and its peers, prevent routing flaps when a router is temporarily unable to exchange routing information but continues to forward packets while it recovers. The first of these extensions to become an IETF Internet-Draft is "A Graceful Restart Mechanism for BGP," better known as "BGP restart."

BGP at the edge. One reason BGP was among the first routing protocols targeted for high-availability extensions is that it is designed to carry a very large number of routes compared with other routing protocols. As a result, convergence following a BGP software failure usually takes longer than with other routing protocols, producing a longer outage. In addition, BGP is typically deployed at the WAN edge, between the domains of different network operators. Because BGP advertises IP routes across multiple domains, the impact of a failed BGP process can propagate across two or more networks rather than being confined to a single domain.

With BGP graceful restart enabled on an edge device and its peers, the data plane can continue to process and forward packets even if the control plane, which is responsible for determining best paths, fails. By also reducing routing flaps, graceful restart stabilizes the network and reduces the consumption of control-plane resources.

High-availability extensions like BGP graceful restart are also in development for other routing protocols, such as Intermediate System-to-Intermediate System (IS-IS) and Open Shortest Path First (OSPF).

How Graceful Restart Works
The software extensions must be deployed on the router that has experienced a failure, as well as that router's BGP peers. The peers help the system regain lost routing information and also help isolate failures from the rest of the network. The peers isolate failures by holding off propagating new network information for a short period of time while the router with a failed RP (hardware or software) recovers.

Graceful restart begins when the initial BGP connection between the edge router and its peers is established (Figure 1). In their initial exchange of BGP "Open" messages, the restarting router and its peers signal to one another that they understand BGP graceful restart. At that time, the edge router also gives its peers a list of the IP-based protocols for which it can maintain forwarding state across a BGP restart (for example, IPv4, IPv6, IP multicast and multiprotocol label switching).
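The capability advertised in the Open message can be sketched as a byte encoder. This sketch assumes the field layout later standardized in RFC 4724: capability code 64; a 4-bit flags field whose top bit is the Restart State bit; a 12-bit restart time in seconds; then one AFI/SAFI/flags entry per address family, where the top flag bit means forwarding state was preserved. The draft the article describes may differ in detail, and the function name is hypothetical.

```python
import struct

GRACEFUL_RESTART_CAP_CODE = 64  # capability code assigned by RFC 4724

# Common (AFI, SAFI) values: AFI 1 = IPv4, 2 = IPv6; SAFI 1 = unicast,
# 2 = multicast, 4 = MPLS-labeled routes.


def encode_graceful_restart_cap(restart_time_s, families, restart_state=False):
    """Build the Graceful Restart capability (code, length, value) for a BGP OPEN.

    restart_time_s: seconds the peer should wait for the session to return (12 bits).
    families: iterable of (afi, safi, forwarding_preserved) tuples.
    restart_state: the R bit, set when the sender's BGP software has restarted.
    """
    if not 0 <= restart_time_s <= 0xFFF:
        raise ValueError("restart time must fit in 12 bits")
    flags_and_time = (0x8000 if restart_state else 0) | restart_time_s
    value = struct.pack("!H", flags_and_time)
    for afi, safi, forwarding_preserved in families:
        value += struct.pack("!HBB", afi, safi, 0x80 if forwarding_preserved else 0)
    return struct.pack("!BB", GRACEFUL_RESTART_CAP_CODE, len(value)) + value
```

For example, `encode_graceful_restart_cap(120, [(1, 1, True)])` yields the bytes 40 06 00 78 00 01 01 80: IPv4 unicast with forwarding state preserved and a 120-second restart time. Passing `restart_state=True` sets the top bit of the flags, which is how a router that has just restarted announces that fact in its new Open message.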

When the router's BGP software fails and restarts, the TCP connection to the peer router is typically lost. Normally, this would cause the peer router to clear all routes associated with the restarting router. With BGP graceful restart enabled, however, the peer router marks those routes as "stale" but continues to use them to forward packets, on the expectation that the restarting router will re-establish the BGP session shortly. Likewise, the restarting router continues forwarding packets.
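The peer's side of this behavior can be sketched as a small route table. In this simplified, hypothetical sketch, routes from the restarting neighbor are kept and marked stale when the session drops, still used for forwarding, refreshed as updates arrive, and purged if still stale once the neighbor's End-of-RIB marker arrives or a timer expires. A single timer stands in for the separate restart and stale-route timers a real implementation would use.

```python
import time


class StaleRouteTable:
    """Peer-side view of routes learned from one graceful-restart-capable neighbor."""

    def __init__(self, restart_time_s, clock=time.monotonic):
        self.routes = {}        # prefix -> [next_hop, stale_flag]
        self.restart_time_s = restart_time_s
        self.clock = clock
        self.deadline = None

    def learn(self, prefix, next_hop):
        """Route received (or re-received after the session returns): not stale."""
        self.routes[prefix] = [next_hop, False]

    def session_lost(self):
        """Keep every route, mark it stale and start the timer instead of flushing."""
        for entry in self.routes.values():
            entry[1] = True
        self.deadline = self.clock() + self.restart_time_s

    def next_hop(self, prefix):
        entry = self.routes.get(prefix)
        return entry[0] if entry else None   # stale routes still forward traffic

    def end_of_rib_received(self):
        self._purge_stale()                  # anything not refreshed is really gone

    def tick(self):
        if self.deadline is not None and self.clock() >= self.deadline:
            self._purge_stale()              # neighbor did not recover in time

    def _purge_stale(self):
        self.routes = {p: e for p, e in self.routes.items() if not e[1]}
        self.deadline = None
```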

When the restarting router opens the new BGP session, it again advertises the graceful restart capability to its peers. This time, however, it sets a flag in that capability to let each peer know that its BGP software has restarted.

While continuing to forward packets, the peer router sends the restarting router any relevant BGP routing information base (RIB) updates. The peer signals that it has finished sending them with an "End-of-RIB" (EOR) marker, which for IPv4 unicast routes is simply an empty BGP "Update" message.
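For IPv4 unicast, the EOR marker is a minimum-length Update message: the 16-byte all-ones marker, a 2-byte length of 23, the Update type code 2, and zero-length withdrawn-routes and path-attribute fields. The sketch below builds and recognizes it; the function names are hypothetical. For other address families the marker is instead an Update carrying only an empty MP_UNREACH_NLRI attribute, which this sketch does not cover.

```python
import struct

BGP_UPDATE = 2  # BGP message type code for Update


def end_of_rib_ipv4_unicast() -> bytes:
    """IPv4 unicast End-of-RIB: an Update with no withdrawn routes, no path attributes and no NLRI."""
    body = struct.pack("!HH", 0, 0)   # withdrawn-routes length = 0, path-attribute length = 0
    length = 16 + 2 + 1 + len(body)   # marker + length field + type + body = 23 bytes
    return b"\xff" * 16 + struct.pack("!HB", length, BGP_UPDATE) + body


def is_end_of_rib_ipv4_unicast(msg: bytes) -> bool:
    """True if msg is exactly the 23-byte empty Update."""
    return (len(msg) == 23 and msg[:16] == b"\xff" * 16
            and msg[18] == BGP_UPDATE and msg[19:23] == b"\x00" * 4)
```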

EOR markers help speed network convergence, because once the restarting router has received the markers from all peers, it knows it can begin best-path selection again using the new routing information. Similarly, the restarting router then sends any updates to its peer routers and uses the EOR marker to indicate completion of the process. Without the EOR marker, the restarting router would not know when the update was complete and, as a result, might wait longer than necessary to return to normal operations.
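The restarting router's use of EOR can be sketched as deferring best-path selection until the marker has arrived from every peer, with a configurable upper bound on the wait so that one silent peer cannot stall recovery indefinitely. This is a hypothetical sketch: the class name and deferral parameter are illustrative, and shortest AS-path length stands in for the full BGP decision process.

```python
class RestartingSpeaker:
    """Defer best-path selection until all peers send EOR or a deadline passes."""

    def __init__(self, peers, deferral_s, now):
        self.pending_eor = set(peers)
        self.deadline = now + deferral_s
        self.candidates = {}   # prefix -> {peer: as_path_len}
        self.best = None       # prefix -> chosen peer, once selection has run

    def update(self, peer, prefix, as_path_len):
        self.candidates.setdefault(prefix, {})[peer] = as_path_len

    def end_of_rib(self, peer, now):
        self.pending_eor.discard(peer)
        self._maybe_select(now)

    def tick(self, now):
        self._maybe_select(now)

    def _maybe_select(self, now):
        if self.best is not None:
            return
        if self.pending_eor and now < self.deadline:
            return  # still waiting; selecting now could pick a worse path
        # Stand-in for the BGP decision process: shortest AS path,
        # ties broken by peer name so the result is deterministic.
        self.best = {
            prefix: min(paths, key=lambda peer: (paths[peer], peer))
            for prefix, paths in self.candidates.items()
        }
```

Selecting only after the last EOR avoids choosing a path from the first peer to respond and then churning when a better path arrives moments later.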

Planned Outages
In addition to designing routers so they recover gracefully and with minimal service impact during unplanned outages, designers must consider system availability during planned downtime. Planned outages usually occur when customers upgrade or downgrade their router software on RPs and line cards. The primary goal in preparing a router for high availability during a planned outage is to avoid any perception by end users that performance or availability has degraded. To achieve this, Layer 2 sessions must stay up and packets must continue to flow during the time when the router is not functioning fully. Several methods can achieve this; however, some general software architectural rules for implementing such a system apply:

  • Multiple versions of the software must be able to run on the system concurrently.
  • Multiple concurrent software versions must be able to exchange messages and data.
  • Because message and data formats might change from one software version to the next, messages and data must be versioned, and the new and old software versions must each be able to interpret both new and old formats. These are critical elements for enabling a "hot" software upgrade, achieved by reloading a standby RP with new software and then, at some point, switching over to that RP from the active one.
  • Line cards and RPs must be able to reload with the new version of software while maintaining Layer 1 and 2 connections and continuing to forward packets. This requires some separation of the control plane from the data (or forwarding) plane so that while the control software is being reloaded, the forwarding software/hardware can continue to function. Layer 3 can recover using the graceful restart mechanism described earlier.
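Message versioning for a hot upgrade can be sketched as a chain of per-version converters, so that an RP running the old image and one running the new image can each translate checkpoint messages into the format it understands. This is a minimal sketch; the message type and the version 2 "vlan" field are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CheckpointMsg:
    version: int
    payload: dict


# Hypothetical schema change: version 2 adds a "vlan" field to a session record.
def upgrade_1_to_2(payload):
    return {**payload, "vlan": None}     # an old sender never set it


def downgrade_2_to_1(payload):
    return {k: v for k, v in payload.items() if k != "vlan"}


UPGRADES = {1: upgrade_1_to_2}      # from-version -> converter to from-version + 1
DOWNGRADES = {2: downgrade_2_to_1}  # from-version -> converter to from-version - 1


def convert(msg: CheckpointMsg, target_version: int) -> CheckpointMsg:
    """Translate a message one version step at a time into the receiver's format.

    Raises KeyError if no converter exists for a required step."""
    version, payload = msg.version, dict(msg.payload)
    while version < target_version:
        payload = UPGRADES[version](payload)
        version += 1
    while version > target_version:
        payload = DOWNGRADES[version](payload)
        version -= 1
    return CheckpointMsg(version, payload)
```

During an upgrade, the standby running the new image would convert checkpoints from the old active up to its own version; the downgrade path matters when an upgrade is rolled back.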

In systems that separate the control plane from the data plane, one set of functions can be swapped out while the other continues as usual. This separation must let customers load new images and configurations onto line cards while the system continues to forward packets based on the best-path information it currently has. By combining these software upgrade/downgrade techniques with the switchover capabilities used for unplanned outages, a seamless planned software change can be achieved with minimal impact on network service levels.

Summary and Checklist
To succeed with their high-availability designs, router architects must combine redundant hardware components, including the system RP, with seamless switchover and synchronization capabilities. The method and frequency of synchronization selected should take into account MTTR goals as well as the amount of processing and bandwidth consumption required for the level of synchronization desired between active and standby RPs. To ensure system availability during planned outages, as new software is loaded onto RPs and line cards, designers must ensure that new and old versions can run concurrently, that they can exchange messages and data, and that packets can be forwarded while changes take place. Whether a planned or unplanned outage is at hand, a true high-availability design will keep Layer 1 and 2 connections live and packets flowing while the system gracefully recovers, ideally with no perceptible impact on network users. A quick checklist of high-availability components and functions follows.

High-Availability Design Checklist

  • Dual route processors (active and standby)
  • Redundant, hot-swappable power supplies, chassis, line cards and fans
  • Centralized, distributed or midplane architecture
  • Must cover all software and hardware failures and recover seamlessly
  • Graceful routing protocol restart mechanisms
  • Partial or full synchronization between route processors to preserve state (Layers 1 and 2 must stay live upon switchover)
  • Separate control and data planes
  • Support for planned upgrades and downgrades with minimal impact
  • Balance of features, cost and expected system and network service levels

About the Author
Purnam A. Sheth (pasheth@cisco.com), the director of IOS software engineering at Cisco Systems Inc., holds a bachelor's degree in math from the University of Waterloo, Ontario. He has more than 15 years' experience developing highly available data-networking products.




All materials on this site Copyright © 2010 EE Times Group, a Division of United Business Media LLC All rights reserved.