As service providers grapple with a critical need to reduce capital and operations costs associated with upgrading and managing separate networks, they must also support and extend revenue-producing ATM/frame relay services today while evolving to future IP-based networks and services. All of these market factors have led a number of vendors to develop a new class of service edge routers aimed at transitioning carriers to a single, multi-service network. These service edge routers sit in the service provider point of presence at the edge of the service provider core, aggregating diverse metro and customer-facing ports for transport across the long-haul IP/MPLS backbone.
Newer service-edge routers present a unique development challenge for design engineers. Overall, these will not be point solutions optimized for one or two tasks. Rather, they need to support ATM, frame relay, OC-n, Gigabit Ethernet, and more.
Part 1 of this series discusses the key requirements driving architecture decisions for service-edge routing platforms, including Internet-class routing, layer 2 traffic support, flexibility vs. performance, and the corresponding use of ASICs and network processors, accounting and classification. In Part 2, which will appear online next week, we'll take a deeper look at the network management components. Let's start the discussion with a brief history.
Brief History
For a better understanding of service-edge router requirements, it is useful to revisit the history of router development within enterprise and core networks. Initial LAN enterprise routers were nothing more than embedded computers with special software and I/O ports. All routing computations and packet forwarding occurred in software with little hardware support.
This approach worked well initially, as bandwidth requirements were relatively low. These platforms were inherently flexible, as new or revised protocols could be quickly supported with a software update. To scale the enterprise LAN, these software-based routers were often combined with hardware-accelerated layer 2 switches. These layer 2 switches typically used ASICs to provide the needed throughput, but lacked the intelligence required for layer 3 routing.
In the mid-90s, the switch/router combo described above began to break down as service-edge systems started supporting ever more users and protocols. Designers met some of these demands with further additions to the software.
In addition to an ever-growing code base, several other problems plagued the switch/router combo. First, systems became more fragile. With one large code "loop" performing all tasks, any change to one section of code would often break the functionality or performance of a dozen or more seemingly unrelated features. Additionally, these systems proved practically unmanageable while requests for additional features, scalability and performance continued unabated. The complexity of these systems exceeded the ability of engineering organizations to maintain the code base. The lack of partitioning in the code and the lack of modern operating system principles (protected memory and threads, to name a few) meant that no one person or group could change one section without understanding the impact of the change on the remainder of the system.
One obvious approach to solving this dilemma was to design a purpose-built router with specialized hardware support to guarantee the system could meet both feature and performance requirements. These routers were designed with custom ASICs to achieve line-rate throughput within a very large capacity router. Some level of flexibility is achieved via microcoded machines inside of these ASICs. However, unless all foreseeable requirements are known in advance, there is a serious concern that these devices will not be able to cope with the rapidly changing edge market.
A service-edge router requires flexibility beyond any of these solutions. IP is everywhere, yet the current state of IP routing technology is immature at best. A service-edge router must support all current requirements, yet be flexible enough to support unforeseen future requirements. With this groundwork set, let's now explore some of the requirements that a service-edge router must support.
Edge vs. Core: Determining the differences
Edge router design begins with requirements similar to those of core routers, while adding a range of complex requirements. The big difference is that while core routers are designed with one purpose in mind--to move traffic as quickly as possible--the requirements at the edge are diverse.
The core router market is relatively mature with well-defined, narrow requirements, while the service-edge router is a new device that must support continually evolving requirements. Service-edge routers must aggregate a range of network and traffic types and support a diverse range of requirements. They must carry layer 2 traffic (which requires interoperability with frame relay and ATM devices) with the same level of service guarantees as layer 2 devices, and they must also support new, evolving IP-based services (which means the edge router architecture itself must be highly flexible to support future requirements). This need for both performance and flexibility is the key architectural challenge for any vendor designing service-edge routers.
First Things First: Internet-Class Routing
Any carrier-class service-edge router must include routing as robust as that found in widely deployed Internet core routers. Routing intelligence is required to know where packets are coming from and where they are going, to provide multiple service levels and service level guarantees to customers, and to shape and groom bandwidth and combine multiple services on a single infrastructure.
Though the use of the term "Internet-class routing" is prevalent in the industry, what does it really mean to be an Internet-class router? It means complete support for routing protocols, including BGP, OSPF, IS-IS and RIPv2, as well as MPLS signaling (RSVP-TE, LDP) and multicast (PIM-SM, MBGP, IGMPv2). Support for a large number of routing peers and adjacencies is also imperative. Yet developing Internet-class routing code is a non-trivial task. In fact, only a handful of core router vendors have done so successfully.
A number of third-party vendors now offer routing code, so designers might assume that simply taking this software and using it for routing functionality would suffice. That is not the case. Although third-party routing code is often robust enough for enterprise routing gear in its native state, the most important routing considerations at the edge are scalability and the ability to interoperate with existing Internet core routers (primarily from Cisco and Juniper), and meeting them requires significant in-house development.
Standards and Interoperability
It is not enough for designers to architect their product according to published standards specifications. More than one vendor has diligently "designed to spec" only to find that their product would not interoperate with other products.
Testing with other devices must begin at the earliest stages to ensure that an edge router can communicate with core routers so that traffic can be intelligently transported across the IP/MPLS core. Service-edge router vendors (and third-party routing code developers) should begin interoperability testing with routing code from other vendors and within service provider labs as early as possible, and this process should be continued throughout the development process.
Just as interoperability testing is critical for building a working, carrier-class device, so is the presence of rigorous quality assurance (QA) processes throughout the development process. Given the complexity of design and the interworking of software components, it is not uncommon for vendors to devote a significant portion of development to QA and to integrate significant intellectual property in the development of test scripts and test automation. Third-party scripts are useful in testing, but vendors should also be prepared to devote a significant amount of resources to custom testing as well.
Flexibility vs. Performance
The edge market is in its infancy. Requirements are rapidly changing and it seems that every service provider has its own ideas regarding how a particular service should be supported.
Service and interface flexibility is a key attribute for any edge device. Features implemented in software are flexible by design; however, overall system performance is often limited by the lack of appropriate hardware support. To meet the flexibility and the performance requirements, today's edge routers must partition the problem and rely on ASICs, reprogrammable logic, and network processors. How much of each a vendor puts into the mix depends upon its development philosophy.
One approach assumes vendors know what the packet will contain and requires them to design for that consideration today and in the future (the crystal ball method). The second assumes that packets are treated as general entities, so that no matter what the packet contains it will be treated appropriately based on the level of traffic management, the QoS applied to it, etc. This gives vendors a bit more flexibility so that if a new service gains market acceptance, they can support this. Yet designing with this level of flexibility and maintaining performance levels requires a carefully thought out, distributed approach to product design.
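To make the second approach more concrete, here is a minimal sketch in Python of a table-driven classifier that treats every packet as a generic set of header fields. It is an illustration only, not any vendor's implementation: the `Rule` and `Classifier` names, the field names, and the class labels are all hypothetical. The one real-world value used is DSCP 46, the standard Expedited Forwarding code point.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A packet is modeled as a plain dict of header fields. All field names
# are hypothetical and chosen only for illustration.
Packet = Dict[str, object]


@dataclass
class Rule:
    name: str                         # label for the service being matched
    match: Callable[[Packet], bool]   # predicate over generic header fields
    qos_class: str                    # treatment applied when the rule matches


class Classifier:
    """First-match rule table; unmatched packets get a default class."""

    def __init__(self, default_class: str = "best-effort") -> None:
        self.rules: List[Rule] = []
        self.default_class = default_class

    def add_rule(self, rule: Rule) -> None:
        # Supporting a new service means adding a table entry,
        # not redesigning the forwarding path.
        self.rules.append(rule)

    def classify(self, pkt: Packet) -> str:
        for rule in self.rules:
            if rule.match(pkt):
                return rule.qos_class
        return self.default_class


classifier = Classifier()
classifier.add_rule(Rule(
    "atm-cbr",
    lambda p: p.get("l2") == "atm" and p.get("service") == "cbr",
    "guaranteed",
))
classifier.add_rule(Rule("voice", lambda p: p.get("dscp") == 46, "low-latency"))

print(classifier.classify({"l2": "atm", "service": "cbr"}))
print(classifier.classify({"l2": "ethernet", "dscp": 0}))
```

The point of the sketch is the design choice rather than the code itself: because the classifier makes no assumption about what a packet contains, a service that gains market acceptance later can be supported by adding a rule. In a real edge router this table would live in network processors or classification hardware on each card, not in software.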
Reprogrammable hardware in the form of network processors and FPGAs can significantly extend the capability and life of an edge product, but often the designer may overlook other important aspects of a flexible design such as memory table size and support for in-field upgrades of reprogrammable hardware. The requirements should be considered up front and, if necessary, larger memories utilized, or at least the circuit board designed such that it will allow larger memories to be populated in the future.
Let's explore the issues associated with ASICs, network processors, and FPGAs. We'll start with ASICs.
ASICs are best integrated into the design where specialized, well-defined tasks are required and vendors have a high degree of certainty that requirements will not change. ASICs are highly utilized in core routers given their need for speed and their ability to perform a small number of specialized tasks.
Given the long development cycle, designers should carefully consider where and how to employ an ASIC in a system architecture. An error in ASIC design could delay an entire product for months.
An enormous amount of care has to be taken when architecting an ASIC, taking current and future requirements into consideration when deciding which particular features will be supported. It is these future requirements that cause the most problems. If one or more is left out, the design can become obsolete almost before the system is released to market as the requirements at the edge are changing that frequently. A vendor may be able to work around this problem by adding support for the new feature in software; however, this technique can have a serious effect on overall performance.
Network Processors
Although network processors have a legacy of association with higher cost and lower performance than ASICs, today's network processors have come a long way. The latest generation offers high performance and the flexibility to change the key architectural components required to support evolving requirements without significant cost. Almost all service-edge routers include network processors in their design due to their ability to speed time to market and to extend the product's useful lifetime.
Though today's network processors have evolved significantly, there are still limitations that product vendors must be aware of and that manufacturers looking to increase vendor adoption should address. Network processors are designed for general-purpose use. This jack-of-all-trades approach forces vendors to engage in significant customization.
A lack of standards also plagues network processors. Without standard interfaces, vendors must invest their own time to make a network processor communicate effectively with other device-level components.
FPGAs
FPGAs have previously been used within routers with varying levels of success. Initial enterprise routers utilized low-speed FPGAs and general-purpose processors, while core routers use them primarily for "glue" logic between custom ASICs and other third-party silicon.
Service-edge routers may rely on FPGAs and network processors (which are technically programmable ASICs), rather than a predominantly ASIC-based solution to support high levels of flexibility. As more of the functionality may be contained within the FPGAs, it is important that they are field upgradeable by the customer with the same ease with which the overall system software can be upgraded.
Accounting and Classification
The ability to count packets at wire speed is an absolute requirement as service providers look to offer new billing models or more accurately track their internal cost of service delivery. This capability also provides the flexibility for service providers to bill based on usage, offer destination-based billing models, offer different service models for retail and wholesale customers or bundle transit and peering services (in the case of an ISP who today typically offers those services on separate links).
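As a rough illustration of destination-based accounting, the Python sketch below keeps per-customer packet and byte counters keyed by a destination "zone" found with a longest-prefix match. Everything in it is an assumption made for the example: the `DestinationAccounting` class, the zone names, the customer label, and the prefixes (taken from the address ranges reserved for documentation). A real router would do this in hardware on each forwarding card.

```python
import ipaddress
from collections import defaultdict


class DestinationAccounting:
    """Per-customer counters keyed by destination zone (hypothetical model)."""

    def __init__(self, default_zone="transit"):
        self.zones = []                   # list of (network, zone name)
        self.default_zone = default_zone
        self.packets = defaultdict(int)   # (customer, zone) -> packet count
        self.bytes = defaultdict(int)     # (customer, zone) -> byte count

    def add_zone(self, prefix, zone):
        self.zones.append((ipaddress.ip_network(prefix), zone))
        # Keep longest prefixes first so the most specific match wins.
        self.zones.sort(key=lambda entry: entry[0].prefixlen, reverse=True)

    def zone_for(self, dst):
        addr = ipaddress.ip_address(dst)
        for network, zone in self.zones:
            if addr in network:
                return zone
        return self.default_zone

    def count(self, customer, dst, length):
        # Every packet is counted; nothing is sampled.
        key = (customer, self.zone_for(dst))
        self.packets[key] += 1
        self.bytes[key] += length


acct = DestinationAccounting()
acct.add_zone("198.51.100.0/24", "peering")
acct.add_zone("198.51.100.128/25", "on-net")

acct.count("cust-a", "198.51.100.200", 1500)  # falls in the more specific /25
acct.count("cust-a", "198.51.100.10", 400)    # falls only in the /24
acct.count("cust-a", "203.0.113.5", 64)       # no zone matches

for key in sorted(acct.bytes):
    print(key, acct.packets[key], acct.bytes[key])
```

With counters broken out this way, a provider could price on-net, peering, and transit traffic differently even when all three share a link.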
Wire-speed accounting and classification are best done in distributed hardware, as carriers are increasingly demanding that vendors count each packet and not rely on sampling, which can result in inaccuracies.
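A small Python sketch shows why sampling can mislead a billing system. It compares exact per-customer byte counts with a deterministic 1-in-N sampler that scales each sampled packet by N. The traffic pattern is deliberately contrived and the function and customer names are hypothetical. Real samplers are usually randomized, so their error is statistical, not total, but the risk for low-volume customers is the same in kind.

```python
from collections import Counter


def exact_counts(packets):
    """Count every packet's bytes per customer."""
    counts = Counter()
    for customer, length in packets:
        counts[customer] += length
    return counts


def sampled_counts(packets, n):
    """Look at every n-th packet only and scale its size by n."""
    counts = Counter()
    for index, (customer, length) in enumerate(packets):
        if index % n == 0:
            counts[customer] += length * n
    return counts


# Contrived traffic: customer "a" sends nine 100-byte packets for every
# single 1500-byte packet from customer "b", repeated 100 times.
traffic = ([("a", 100)] * 9 + [("b", 1500)]) * 100

exact = exact_counts(traffic)
sampled = sampled_counts(traffic, 10)

print("exact:  ", dict(exact))    # a: 90000 bytes, b: 150000 bytes
print("sampled:", dict(sampled))  # a: 100000 bytes, b is never sampled
```

Because the sampling period happens to line up with the traffic pattern, customer "b" is never observed and would be billed nothing, while customer "a" is overcounted. Counting each packet removes this class of error entirely.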
A distributed architecture is required to offer scalable routing and granular traffic management support. These two benefits are achieved through the application of distributed design on both the control plane and the data plane. With the two planes separated, a heavy load on one cannot impact the performance of the other.
A distributed control plane allows slave processors to perform low-level control functions, relieving the central control processor of this significant burden. The demands on a service-edge router's control plane are dramatic. It must scale to handle tens of thousands of logical interfaces and layer 2 circuits while simultaneously supporting the massive forwarding tables required to implement layer 3 VPNs. This can only be achieved by distributing the control plane to processors dedicated to tasks such as slow-path forwarding, lower-layer protocol support (e.g., PPP LCP and ARP), interface management, counter collection and data path setup. Ideally, the slave processors are integrated on distributed forwarding and switching modules so that control plane capacity scales as data plane capacity is added.
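The division of labor described above can be sketched in a few lines of Python. This is a single-process toy model, not a real control plane: the class names, the task labels, and the interface names are all hypothetical, and a production system would run these handlers on separate processors connected by some form of remote procedure call.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Low-level tasks that a processor on the forwarding module can handle
# for its own interfaces (labels are illustrative).
LOCAL_TASKS = {"arp", "ppp-lcp", "interface-management", "counter-collection"}


@dataclass
class Processor:
    name: str
    handled: List[str] = field(default_factory=list)

    def handle(self, task: str, interface: str) -> None:
        self.handled.append(f"{task}@{interface}")


class ControlPlane:
    def __init__(self) -> None:
        self.central = Processor("central")
        self.cards: Dict[int, Processor] = {}

    def add_card(self, slot: int) -> None:
        # Each forwarding module brings its own control processor, so
        # control capacity grows as data plane capacity is added.
        self.cards[slot] = Processor(f"card-{slot}")

    def dispatch(self, task: str, slot: int, interface: str) -> str:
        if task in LOCAL_TASKS and slot in self.cards:
            target = self.cards[slot]
        else:
            target = self.central  # e.g. routing protocol processing
        target.handle(task, interface)
        return target.name


cp = ControlPlane()
cp.add_card(1)
cp.add_card(2)

print(cp.dispatch("arp", 1, "ge-1/0"))          # handled on card 1
print(cp.dispatch("ppp-lcp", 2, "pos-2/3"))     # handled on card 2
print(cp.dispatch("bgp-update", 1, "ge-1/0"))   # handled centrally
```

In this model the central processor never sees ARP or PPP LCP traffic for an interface whose card has its own processor, which is the burden the distributed design is meant to remove.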
The implementation complexity implied by a distributed control plane can be significant. It requires the use of sophisticated programming practices such as the use of threads, objects, and object-aware remote procedure calls. A highly skilled engineering team is required to take advantage of these advanced tools to produce a system with carrier-class reliability. In addition to the engineering team, vendors also need methodical design practices and rigorous application of development processes.
On the data plane front, designers should also consider the use of a distributed architecture. A distributed approach allows capacity to be added later without large upfront cost.
Overall, the expensive portions of the data plane (memory, network processors, classifiers, etc.) should be fully distributed to ensure scalability and cost efficiency. A centralized switch fabric may be present. However, the cost of the fabric can be kept to a minimum if the intelligence is left on each of the processing cards.
Network Management
An often overlooked but critical component of designing service-edge platforms is tight integration with an element management system from design inception. While the lack of network management in core routers is common, at the edge, service provisioning is too complex to relegate solely to complex scripts.
In addition, tight coupling of element management with the architecture of the device is critical as service providers move to next-generation management structures where hundreds of thousands of customers may be accessing device data. This must be thought out from the beginning of product development, as integrating scalable network management into the device is extremely difficult. We'll address these issues and more in Part 2 of this series, which will run online next week.
About the Author
Robert Warden is the co-founder and vice president of engineering at Laurel Networks. Before emerging as one of Laurel Networks' co-founders, Robert managed the ATM switch hardware engineering team at FORE Systems. He received an MS in Computer Science Engineering from the University of Pennsylvania and a BS in Computer Science Engineering from Bucknell University. Robert can be reached at warden@laurelnetworks.com.