Reinventing the Switch Fabric Architecture

Traffic bottlenecks and QoS concerns plague today's edge access switching system designs. No need to fear, new switch fabrics are here.

By Marek Piekarski
CommsDesign
Jun 01, 2001
 
Caught between the promise of near-infinite bandwidth in the optical core and the ever-increasing speeds and port densities of the access infrastructure, today's edge switching systems are running out of steam. The traffic they have to aggregate and route onto the Internet backbone is doubling every 3 to 6 months, is transported across multiple protocols, and is growing in complexity as new services proliferate. With this level of functionality required, more bottlenecks are occurring in access networks and more headaches are arising in today's switching system architectures.

In addition to bottlenecks, today's edge access switch designers must also tackle quality of service (QoS) issues. With more data flowing through a system at faster rates, designers must develop switching architectures that distribute data properly through the system so that service is not lost.

To solve bottlenecks and minimize QoS concerns, designers of edge access switching systems must rethink how information travels through a system design. Reevaluating switch fabric architectures is a good place to start.

A traditional view

A typical edge switching system design consists of a number of network processors interconnected by a switch fabric (see Figure 1). Traffic typically enters the switch via an ingress processor, traverses the fabric, and exits from an egress processor.

Packets or cells arriving at the ingress ports are inspected by the processors to determine: (1) their intended destination; (2) the QoS they are to receive based on the traffic type or a service-level agreement (SLA); and (3) any local modification they may need, such as encapsulation, time to live (TTL) modification, or encryption/decryption.

The QoS required for a particular flow of packets determines when each packet needs to be transmitted from the egress network processor. Ideally, the packets would be transported from the ingress to egress processor without any incremental delay, and get queued at the egress processor until the traffic shaping algorithms determine the appropriate time to forward them into the next segment of the network. This is termed the output-queuing model since queuing occurs only at the outputs of the switch.

An implication of this model is that all incoming packets must be delivered to their intended egress network processor without any delay at the ingress network processor, even if all the incoming packets are intended for the same egress network processor. Each egress processor must accept packets from the fabric not just at the port line rate, but also at the full aggregate bandwidth of the switch.
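
The worst case described above can be put into numbers with a short sketch. The port count and line rate below are illustrative assumptions, not figures from any particular switch, and the function name is hypothetical.

```python
def egress_bandwidth_gbps(ports: int, line_rate_gbps: float) -> float:
    """Worst-case rate one egress processor must accept under pure output queuing.

    Assumes every ingress port sends at full line rate to the same egress port,
    so the egress side sees the full aggregate bandwidth of the switch.
    """
    return ports * line_rate_gbps


# Hypothetical example: 16 ports at 2.5 Gbps each.
worst_case = egress_bandwidth_gbps(ports=16, line_rate_gbps=2.5)
print(worst_case)  # 40.0
```

Even in this modest configuration, each egress processor would have to absorb 16 times its own port rate, which is why pure output queuing is so demanding on the fabric.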

Sharing the load

The current generation of edge access switch fabrics employs a switching technique based on a shared-memory architecture. Developed when the capabilities of bus- and ring-based switching architectures were exceeded, shared-memory switches contain a global memory into or out of which each line card can write or read. This implementation fits the output-queuing model because all packets queued in the switch are accessible to any egress network processor as if they were in the processor's own local memory.

The performance of such shared-memory devices is scaled by increasing the bandwidth of the global memory housed in the system. Consequently, the scalability depends largely on the ability of the semiconductor industry to continue to create faster memory while still maintaining bus widths with reasonable IC pin counts.

However, memory improvements, which double roughly every 18 months, are not keeping pace with the growing bandwidth demands on the edge network. Each generation of edge switch - there is a new one approximately every 18 months - must boost its capacity by a factor of four, while memory solutions typically improve performance by only a factor of two. Some of the difference can be made up by using memory devices with a wider array, but once the memory width reaches the size of the cells or packets being transmitted, increased width is no longer helpful. Also, pin counts go up as buses get wider, to the point where packaging and layout become impractical. As a result, shared-memory switch fabrics currently won't scale beyond 20 Gbps of total line-end bandwidth.
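
To see how quickly the gap opens up, consider a rough sketch that compounds the two growth rates quoted above: capacity that must quadruple each generation against memory performance that doubles over the same 18 months. The function name is hypothetical, and the model ignores the wider-bus tricks mentioned in the text.

```python
def scaling_gap(generations: int,
                demand_factor: float = 4.0,
                memory_factor: float = 2.0) -> list[float]:
    """Ratio of required switch capacity to available memory bandwidth.

    Entry g-1 is the shortfall after g generations, assuming both curves
    start level and compound once per generation.
    """
    return [(demand_factor / memory_factor) ** g
            for g in range(1, generations + 1)]


print(scaling_gap(3))  # [2.0, 4.0, 8.0]
```

Under these assumptions the shortfall doubles every generation, so after three generations (about four and a half years) memory alone would cover only one-eighth of the required capacity.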

Alternative lifestyles

To compensate for the memory headaches encountered in today's edge switching system architectures, designers are turning to new models for their switch fabric architectures. One of the more popular is the input-queuing model.

The input-queuing model eliminates the need for each egress network processor to gain access to all packets the moment they arrive. Instead, the fabric only provides a transport between the ingress and egress network processors, allowing the processors to deliver packets a little faster than the port line rate.

As with the output-queuing model, there are a number of different implementations of input queuing. The two most often encountered, however, are the multistage interconnect network (MIN) and the crossbar.

The MIN is essentially a structured network of smaller switches with a well-defined routing algorithm collapsed into a single fabric. However, MINs can create almost as many problems as they solve. Because there are multiple paths, multiple arbitration decisions, and multiple queuing stages to deal with (significantly more than the two implied by the input-queuing model), delivering guaranteed QoS levels cost-effectively becomes extremely difficult. Since data goes through multiple routes to get from an input port to an output port, there can be serious latency problems that make it difficult or impossible to handle time-sensitive traffic such as voice and streaming video efficiently.

The MIN architecture is also expensive to implement because so many of the switch interconnects get used up internally. In a MIN architecture, about 20% of the interconnect is available for connecting line ends, while 80% is used for moving data around internally. Since the cost of the silicon is closely related to the amount of I/O it provides, this results in a much higher cost, for a given bandwidth, over a single-stage switch.

MINs do have a place in the switch fabric hierarchy and will continue to play a role as long as global bandwidth demand outstrips technology improvements. MINs can scale up into tens or hundreds of terabits of bandwidth, so they are being used today to build multi-terabit switches for the carrier core.

It only makes sense to use MINs when the switch manufacturer is trying to support aggregate bandwidth that is higher than what can be delivered through single-stage switch fabrics - without resorting to exotic and costly non-CMOS technologies.

The crossbar approach

Crossbar fabrics have long been recognized as potentially providing the best architecture for single-stage, high-bandwidth switches. These fabrics use space-division multiplexing (SDM) to create a switching medium with a high degree of parallelism. Any data path need only sustain the bandwidth of a single switch port, so the aggregate bandwidth of the crossbar fabric can be orders of magnitude higher than shared-memory or other single-source switching fabrics that use time-division multiplexing (TDM).

Latency can be low for crossbar switching, and it actually goes down as bandwidth goes up. Crossbar switching fabrics can scale from a few gigabits per second into the terabit range.

To be successful, crossbar switch fabrics must transport data derived from a range of network types, including variable-length IP packets, ATM cells, and TDM byte streams. In order to best manage the QoS, all these data types should be transported in optimally sized fixed-length fabric cells. Although this implies a need for segmentation and reassembly (SAR), in practice it is a small cost to pay for the degree of QoS management it enables.
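
A minimal sketch of the segmentation and reassembly step might look like the following. The 64-byte cell payload is an assumed value, and a real fabric would carry the packet length and sequencing information in a cell header instead of passing the length as an argument; both function names are hypothetical.

```python
CELL_PAYLOAD_BYTES = 64  # assumed fabric cell payload size, for illustration only


def segment(packet: bytes, cell_size: int = CELL_PAYLOAD_BYTES) -> list[bytes]:
    """Split a variable-length packet into fixed-length cells.

    The final cell is zero-padded so that every cell has the same length.
    """
    cells = []
    for offset in range(0, len(packet), cell_size):
        chunk = packet[offset:offset + cell_size]
        cells.append(chunk.ljust(cell_size, b"\x00"))
    return cells


def reassemble(cells: list[bytes], original_length: int) -> bytes:
    """Concatenate cells in order and strip the padding from the last one."""
    return b"".join(cells)[:original_length]


packet = bytes(range(150))          # a 150-byte packet
cells = segment(packet)             # three 64-byte cells, the last one padded
assert reassemble(cells, len(packet)) == packet
```

The padding in the last cell is the "small cost" of SAR: a 150-byte packet occupies 192 bytes of fabric bandwidth in this sketch.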

In a typical crossbar fabric, cells are queued on the input side of the switch fabric. The state of all the input queues is visible to the crossbar arbiter. On the basis of these states, knowledge of the QoS required for each flow, and feedback from the egress network processors about the states of the output queues, the arbiter decides which connection to make in the memory-less crossbar and thus determines the order in which cells get forwarded to their respective egress network processors.

In order to give the arbitration algorithms the greatest freedom and flexibility to manage the QoS and to maximize the efficiency of the fabric, the cells in the input queues are presorted on the basis of destination address and class; cells requiring broadly similar QoS are placed in virtual output queues (VOQs). The QoS-aware arbitration algorithm can then ensure that the output queues in the egress network processor are never starved of cells which may already be waiting in the input queues.
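
The VOQ arrangement can be sketched as a simple data structure: each ingress port keeps one queue per (destination, class) pair and exposes its queue occupancy to the arbiter. The class and method names here are hypothetical, and the sketch leaves out buffer limits and flow control.

```python
from collections import deque


class IngressPort:
    """Input side of one fabric port, with virtual output queues (VOQs)."""

    def __init__(self, num_outputs: int, num_classes: int):
        # One FIFO per (destination port, QoS class) pair.
        self.voqs = {(dest, cls): deque()
                     for dest in range(num_outputs)
                     for cls in range(num_classes)}

    def enqueue(self, cell, dest: int, cls: int) -> None:
        """Presort an arriving cell by destination and class."""
        self.voqs[(dest, cls)].append(cell)

    def occupancy(self) -> dict:
        """Queue depths for non-empty VOQs: the state an arbiter would see."""
        return {key: len(q) for key, q in self.voqs.items() if q}

    def dequeue(self, dest: int, cls: int):
        """Release the head cell of one VOQ, or None if that queue is empty."""
        q = self.voqs[(dest, cls)]
        return q.popleft() if q else None


port = IngressPort(num_outputs=4, num_classes=2)
port.enqueue("cell-a", dest=3, cls=0)
port.enqueue("cell-b", dest=3, cls=0)
port.enqueue("cell-c", dest=1, cls=1)
print(port.occupancy())  # {(1, 1): 1, (3, 0): 2}
```

Because cells for different destinations never share a queue, a cell waiting for a busy output cannot block a cell behind it that is bound for an idle one.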

The intelligence test

Despite their advantages, crossbar switch fabrics still fall short of answering all the demands on today's system architects. Intelligence is one problem area. To deliver stronger edge switching solutions, designers need fabrics that not only distribute data effectively around a system but also move it intelligently through the system architecture. Traditional crossbar products do not deliver the higher levels of intelligence that the edge switching architecture needs to improve QoS.

Intelligence is key in the edge network, because it represents the last opportunity to shape and optimize the traffic before it disappears into the "dumb" core. Edge switches need to handle multiple protocols - including IP, ATM, frame relay, and TDM - and support cost-differentiated services.

Edge switches are protocol-agnostic, with individual line cards dedicated to specific types of traffic such as 10 Gigabit Ethernet or OC-48 packet over SONET (POS). The difficulty of the arbitration and scheduling task increases exponentially as more line cards are added.

One solution would be to use distributed arbitration on each line card, but the arbiters must have some way of communicating with one another and coordinating their switching decisions. This process will inevitably take more time than the required arbitration rate while introducing inefficiencies throughout the switch fabric. Consequently, QoS efforts will suffer from less-than-optimal switching decisions.

Theoretically, this problem can be mitigated by providing more overspeed in the switch fabric - bandwidth in excess of what the line cards require. In practice, however, the industry is already pushing the bandwidth envelope to the limit, so using a significant part of the fabric core bandwidth to compensate for fundamental inefficiencies in the architecture is not a good solution.

Enter the global arbiter

A global arbiter can eliminate a lot of communication overhead and thus reduce latency by maximizing the width of the pipes in the switch fabric. By doing this, the arbiter allows the crossbar switch fabric to turn into an intelligent switching device (See Figure 2).

In a crossbar switch fabric architecture, the global arbiter balances the QoS requirements of every individual cell in the fabric at wire speed. Because the arbiter has a global view of the traffic, there is no need to waste core bandwidth on arbitration guesswork. Such a global arbiter can use the crossbar resources at better than 97% efficiency.

QoS, which is traditionally based on output queuing, can be delivered through an input-queuing model with a global arbiter. The best arbitration chips can look at all the potential simultaneous flows - 1024 in a 32-port switch, with multiple traffic classifications to be dealt with on each of these I/O port combinations - and make switching decisions once every 20 to 30 ns. This guarantees that QoS can be delivered across the entire switch, with time-sensitive traffic such as voice and video receiving guaranteed bandwidth and bounded latency.
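
The article does not describe the arbitration algorithm itself, so the following is only a toy illustration of the problem the arbiter solves each cell period: choosing a conflict-free set of input-to-output connections from all pending requests. It uses a simple greedy, priority-ordered heuristic, which is an assumption for illustration and not the method used in any real arbiter chip; the function name and priority scheme are hypothetical.

```python
def arbitrate(requests: dict) -> list:
    """Pick crossbar connections for one cell period.

    requests maps (input_port, output_port) -> priority (higher wins).
    Each input may connect to at most one output and vice versa.
    Greedy heuristic: serve the highest-priority requests first.
    """
    used_inputs, used_outputs = set(), set()
    grants = []
    # Descending priority; ties broken by port numbers for determinism.
    ordered = sorted(requests.items(), key=lambda kv: (-kv[1], kv[0]))
    for (inp, out), _priority in ordered:
        if inp not in used_inputs and out not in used_outputs:
            grants.append((inp, out))
            used_inputs.add(inp)
            used_outputs.add(out)
    return grants


pending = {(0, 1): 5, (1, 1): 9, (0, 2): 3, (2, 0): 1}
print(arbitrate(pending))  # [(1, 1), (0, 2), (2, 0)]
```

In the example, inputs 0 and 1 both want output 1; the higher-priority request wins and input 0 falls back to its other pending destination, so no crossbar path sits idle. A 32-port switch presents up to 1024 such input/output pairs, each with several traffic classes, to be resolved in a single pass.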

Such QoS capabilities also enable service providers to deliver metered bandwidth, because global arbiters can segment traffic and charge for it at a granular level. For example, a building local exchange carrier (BLEC) can move a switch into a large office building and use it to provision various types and levels of service to the different tenants. The intelligent crossbar switch fabric can track who is using the bandwidth and for what purpose, and make switching decisions that fulfill the terms of sophisticated service-level agreements.

But to achieve this level of functionality, the global arbiter must make fast decisions. It has to produce one complete solution of the arbitration problem during every cell period, which is only 20 to 30 ns in a typical OC-192-port switch with a reasonable degree of overspeed. This presents a daunting technical challenge. Fortunately, switching IC manufacturers are stepping up to the plate and can now deliver this level of functionality.
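
The 20 to 30 ns figure can be sanity-checked with a back-of-the-envelope calculation. The OC-192 line rate of 9.95328 Gbps is standard, but the 64-byte cell size and the 2x overspeed used below are assumptions chosen for illustration; the article does not state either value.

```python
OC192_GBPS = 9.95328  # SONET OC-192 line rate


def cell_period_ns(cell_bytes: int, line_rate_gbps: float, overspeed: float) -> float:
    """Time to transfer one fabric cell across one port, in nanoseconds.

    Bits divided by Gbit/s gives nanoseconds directly.
    """
    return (cell_bytes * 8) / (line_rate_gbps * overspeed)


# Assumed values: 64-byte cells and 2x fabric overspeed.
period = cell_period_ns(cell_bytes=64, line_rate_gbps=OC192_GBPS, overspeed=2.0)
print(round(period, 1))  # about 25.7 ns
```

With these assumed values the arbiter has a little under 26 ns to resolve every pending request in the switch, which falls within the range quoted above.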

A tall order

In addition to bringing intelligence to switching fabrics, designers are also faced with achieving higher levels of integration in their switching architectures. Until recently, switch manufacturers have built boxes containing two separate switch fabrics - one for TDM/SONET cross connection and one for IP/ATM. This approach, however, adds both size and complexity to the switch. The switch fabrics have to be managed differently, and the manufacturer must often deal with multiple suppliers.

Given the space, size, and cooling constraints in edge facilities, switch manufacturers need a single, more versatile switch fabric that can handle both TDM/SONET and IP/ATM traffic. It's a tall order, but a new generation of crossbar switch fabric technology is rising to the challenge. These switch fabrics can aggregate and route TDM, IP, and ATM traffic simultaneously at hundreds of gigabits per second.

In addition to integrating IP, ATM, and TDM traffic, designers are also looking to integrate other functions on chip. For example, the serializer/deserializer (serdes) components in today's edge access equipment designs can be replaced with on-chip integrated transceivers. This gives the switch fabric a smaller footprint and dramatically reduces its power and cooling requirements.

Integrating the serdes, however, creates new challenges for designers building edge access switching systems. One challenge is coming up with greater drive capability for long backplane PCB traces, allowing for variable trace lengths, and eliminating the unwanted effects of placing multiple high-speed transceivers close together on the same silicon die. Attempts to use traditional methods to integrate existing symmetric and asynchronous link architectures are highly suspect, and so far have failed to materialize on the market.

An asymmetric approach

A new approach has been developed to solve the PCB trace problems being encountered by today's designers. This new approach employs an asymmetric and synchronous architecture to improve performance in designs employing switch fabric ICs that combine serdes functionality on board.

Traditional serdes have trouble accommodating the basic asymmetry of the switch environment. In a traditional switching system, there are only a few serial links on each individual line card at one end of the system. At the other end, hundreds of serial links come into a few chips in the fabric. It is at this end where considerations such as power consumption and die area become critical.

To handle signal flow through the system, designers have traditionally employed a single transceiver that averages the needs of the two sides of the link and provides both ends with the same capabilities. In a Gigabit Ethernet environment, this symmetrical design limits the number of transceivers that can be employed in the system architecture to between four and eight.

Unfortunately, today's edge switching architectures require many dozens of transceivers at the fabric side of the link to properly operate in a high port environment. Therefore, the symmetrical approach, which is typically employed in modern switch fabric architectures, falls far short in meeting the demands of today's switching system architectures.

A better solution is to use an asymmetrical transceiver design that puts most of the intelligence at one end of the link. This makes the other end far more compact and power efficient, resulting in dozens of transceivers that can fit on a single chip.

The basis of the asymmetric serial link is the master/slave nature of the phase-locked loops (PLLs) at each end of the link. Unlike traditional serial links that use power hungry high-speed PLLs at each end of the link, an asymmetric link requires a PLL on only one side of the link.

By employing a special phase measurement and alignment technique, the slave end is synchronized to the master end each time the link is established. Many slave-end serial links can now be put on the same die using only one PLL for all links at the slave side. This approach greatly reduces power consumption and eliminates the issues associated with traditional serial links, like injection locking.

Overall, the asymmetric approach looks like a traditional asynchronous link on the master end, since it uses a PLL, but acts like a synchronous link at the slave end. Because the two ends are not identical, it is called an asymmetric link.

The asymmetric architecture solves many headaches for today's designers. The greatest advancement is the efficient integration of the serdes on chip. By operating in both a synchronous and asynchronous manner, the serdes can be married on the same die as the switch fabric IC, reducing component count and increasing performance in today's switch fabric architectures.

Marek Piekarski is the manager of systems architecture at Power X Ltd. He received his BSc in electrical and electronic engineering from the University of Manchester and can be reached at marek.piekarski@powerxnetworks.com.




All materials on this site Copyright © 2010 TechInsights, a Division of United Business Media LLC All rights reserved.