Reconfigurability Poses Challenges to W-CDMA Base Stations

Splitting tasks between DSPs and FPGAs worked fine in past base station architectures, but causes problems in today's W-CDMA system. Thus, to meet performance demands, engineers should turn to massively parallel architectures that combine datapath and control path processing on the same silicon.

By Mario Bedoya, picoChip
CommsDesign
May 01, 2003
After a long delay, W-CDMA is starting to deploy in growing volumes around the world as services are defined and attractive handsets become available. Both operators in Japan are reporting rapidly growing subscriber numbers, and three operations have launched with great fanfare in Italy, Australia, and the UK, while most other operators are now doing "soft launches" for commercial service later this year. Even in the US, which had been viewed as the last place to expect service, AT&T has committed to launch next year. As such, the challenge has changed from "getting something out there" to optimization, cost reduction, and looking to the future of the technology.

With revisions to the core specifications still ongoing, the development of W-CDMA Node-B base stations remains an expensive challenge for design teams. The demands of the complex baseband protocol stack (Figure 1 below) push the limits of available silicon, which has a knock-on effect on development time. So, design teams are keen to find out about the capabilities of any new device that could ease this pain.

To demonstrate they have silicon available to ease the job of design, a number of vendors have prepared benchmark studies. Many of these Node-B evaluation platforms are focused on a narrow set of "ideal" conditions.

There is a benefit to this approach. It provides a reasonable target for evaluation and an indication of how much processing power will be needed for certain situations, such as 64 channels of voice traffic within a medium-sized cell. However, the indications from the operators who will buy the base stations are that they can't pick base stations for specific functions, and that the benchmark cases chosen are all too often unrealistic.


Figure 1: Structure of the baseband protocol stack.

Recently, the requests for information (RFIs) that have come from operators are increasingly focused on flexibility as a primary concern. This focus stems from several drivers. First, there is a clear awareness that this is new technology, with comparatively little "real world" experience. As such, there is a lot of scope for optimization of the algorithms.

Second, there is a concern about the evolution of standards. While the pace of change has slowed as equipment reaches deployment status, there are still new features being defined and standardized (for example, high-speed data enabling 14 Mbit/s downlink capacity). Operators are naturally concerned about their ability to use these protocols. They are also concerned that the equipment they deploy in their network will become obsolete because it cannot support these new protocols or the protocols still under development.

Finally, there is the more subtle but powerful desire to optimize resources to match traffic needs. Keen to derive revenue as quickly as possible from 3G services, the operators do not want to be forced to miss out on a lucrative revenue stream because their chosen base station does not support the mix of traffic that they need.

Embedding Flexibility
Faced with these business imperatives, and feedback from their customers, many equipment designers are building flexibility into the design goals for future products.

In principle, the conventional approach to base station design should yield flexible systems. Many designers have embraced the combination of field-programmable gate arrays (FPGAs), digital signal processors (DSPs), and general-purpose processors to implement the baseband functions for W-CDMA. The design offers apparent flexibility as individual FPGAs and DSPs are re-programmable. However, each uses a different set of tools and development techniques. That means designs need to be implemented separately on each, generally by different teams of engineers. It is only during the integration phase that the development team can see the real interaction between each of these components. Additionally, modern DSPs are very complex and non-deterministic, so performance cannot be simulated but only determined by statistical testing of a design.

The situation is further complicated by the fact that many processors, DSPs, and FPGAs are needed on a base station linecard to provide sufficient horsepower for the complex W-CDMA baseband protocols, and coordinating them requires complex interaction, schedulers, or load balancing.

Consequently, although individual elements are programmable, the ensemble, once it is placed on a board with strict performance requirements and intricate interdependencies, has lost much of its flexibility. A good analogy is a house of cards: although a playing card in isolation is flexible, the structure as a whole is so delicately balanced that nothing can be changed without it collapsing.

In principle, the layered protocol structure of W-CDMA as defined by 3GPP eases the job of building a baseband card using a variety of components. There is an apparently clear distinction between the different parts of the protocol, which can be broadly separated into chip-rate, symbol-rate, transport-layer and operations, administration and management (OAM) components.

But, while 3GPP lays out the separation, the standard does not provide the clarity the GSM spec provides in distinguishing between the physical, network-access, transport, and management layers. There are many complex interactions between the different layers in the OSI stack that the 3GPP spec does not cover and thus must be dealt with in the design process. Many benchmark designs often ignore these interactions for the sake of simplicity and will pick a particular set of traffic and air-interface classes to demonstrate an operational system.

With the right architecture, however, it is possible to align the operators' need for flexibility with the complex interactions that the 3GPP standards demand of a complete W-CDMA baseband design. Clearly, this is critical for manufacturers who wish to deliver such systems.

Degrees of Freedom
There are two primary degrees of freedom in Node-B base station requirements. The first is the size of the base station itself. Ideally, operators would like to be able to select from a menu of base station options, ranging from picocells that can be concentrated on a building or shopping mall to full macrocells that deliver 50-km range. The system requirements increase dramatically with range, as both delay spread and signal dynamic range increase, while the degree of multipath also rises sharply.

The second concerns the traffic types that the base station needs to be able to support. Traffic on a W-CDMA network can vary widely, from voice through a variety of data types to the latest "broadband Internet" addition to the protocol. Voice calls are handled very differently than data transfers, not only in terms of the forward error correction (FEC) used but in the treatment of the frame at the physical layer. Based on cell conditions and operator requirements, the data rate for data packets can vary widely due to changes in the spreading factor.

A high spreading factor, which increases the number of chips per symbol, will reduce the data rate but also reduce the error rate if fading conditions are bad. If conditions are bad, however, an intelligent base station design may also deploy more resources to augment the number of paths handled by the rake receiver, or switch to a different algorithm. This has a knock-on effect on the chip- and symbol-rate processing sections of the baseband as different levels of resources need to be deployed to process the incoming data.
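The relationship between spreading factor and rate can be sketched with simple arithmetic. W-CDMA uses a fixed chip rate of 3.84 Mchip/s, so the raw symbol rate of one code channel is the chip rate divided by the spreading factor. The function name below is illustrative, and the figures are raw symbol rates only: channel coding, rate matching, and control overhead are deliberately ignored.

```python
# Illustrative arithmetic only: raw symbol rate per code channel, before
# channel coding, rate matching, or control overhead are accounted for.
CHIP_RATE_HZ = 3_840_000  # W-CDMA chip rate, 3.84 Mchip/s


def symbol_rate_ksps(spreading_factor: int) -> float:
    """Return the raw symbol rate in ksymbol/s for a given spreading factor."""
    if spreading_factor <= 0:
        raise ValueError("spreading factor must be positive")
    return CHIP_RATE_HZ / spreading_factor / 1000.0


if __name__ == "__main__":
    # Each doubling of the spreading factor halves the raw symbol rate.
    for sf in (4, 8, 16, 32, 64, 128, 256):
        print(f"SF {sf:3d}: {symbol_rate_ksps(sf):7.1f} ksymbol/s")
```

Each doubling of the spreading factor halves the symbol rate while doubling the number of chips over which every symbol is integrated, which is where the extra robustness comes from.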

Under good conditions, the limiter may be the performance of the turbo-code section, which may be deployed on a DSP armed with specialized assistance engines or on an FPGA. Under poor conditions, the chip-rate section is likely to take on more of the burden.

Virtually all current designs partition the system between FPGA and DSP (although the specifics of this partitioning are notoriously complex) [Figure 2]. Conventional wisdom says that the chip-rate sections will go predominantly into the FPGA part of the baseband card while the symbol-rate portion generally fits better on a DSP. It is easy to see why when you compare the estimated processing loads for given parts of the W-CDMA protocol stack. In the receive section, the chip-rate part, which includes equalization, digital filtering, de-spreading, and the rake receiver, accounts for most of the MIPS needed, if you assume implementation on a non-augmented DSP. Out of a total of some 6000 MIPS for the receive path, 4500 are needed for chip-rate processing of a 384 kbit/s stream.


Figure 2: Architecture of a typical baseband featuring FPGAs and DSPs.

It makes sense then to put the chip-rate section into an FPGA, or so it seems, because that many MIPS demand a lot of DSPs (which in turn implies a lot of dollars and a lot of watts).
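A rough back-of-the-envelope sketch shows why the DSP count grows so quickly. The 6000 and 4500 MIPS figures are the receive-path estimates quoted above for a single 384 kbit/s stream; the per-device throughput of 1000 MIPS is purely an assumed, hypothetical number chosen for illustration and does not describe any particular DSP.

```python
import math

TOTAL_RX_MIPS = 6000     # receive-path estimate quoted in the article
CHIP_RATE_MIPS = 4500    # chip-rate portion quoted in the article
ASSUMED_DSP_MIPS = 1000  # hypothetical sustained throughput of one DSP


def chip_rate_share() -> float:
    """Fraction of the receive-path load spent on chip-rate processing."""
    return CHIP_RATE_MIPS / TOTAL_RX_MIPS


def dsps_needed(load_mips: float, dsp_mips: float = ASSUMED_DSP_MIPS) -> int:
    """Whole number of DSPs needed to carry a load, ignoring all overheads."""
    return math.ceil(load_mips / dsp_mips)


if __name__ == "__main__":
    print(f"chip-rate share of receive path: {chip_rate_share():.0%}")
    print(f"DSPs for chip-rate work alone:   {dsps_needed(CHIP_RATE_MIPS)}")
    print(f"DSPs for the whole receive path: {dsps_needed(TOTAL_RX_MIPS)}")
```

Under that assumption, chip-rate processing alone takes three quarters of the receive-path budget and several devices per stream, before any allowance for inter-device communication or scheduling overhead.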

However, many of the algorithms in W-CDMA are control intensive and rely on large numbers of integer multiplications. Although FPGAs can handle this kind of workload, managing the sessions and structuring algorithms puts a lot of strain on the designer, not to mention the implementation.

Control-intensive designs imply a large state space that may prove difficult to implement cost-effectively in terms of logic density and design time on an FPGA. Similarly, multipliers on FPGAs, unless they have specially designed hard cores embedded for the purpose, need specialized bit-serial architectures that are slow unless highly parallelized. This again slows down the design process because engineers have to spend time working on implementation hardware description language (HDL) code and verifying it, not on the algorithms that form the core intellectual property (IP) of the company.

Closer examination of the specification often reveals optimizations that can drastically reduce the computational load. For example, algorithms have been proposed for the path-selection part of the rake receiver that can cut the workload by almost two orders of magnitude compared with a "brute force" datapath-oriented hardware design. However, they are algorithms that fit best on a processor with a reasonable amount of memory attached to it. While such an algorithm may improve the cost-performance ratio of the base station, it complicates the design process because we now have key parts of the chip-rate subsystem sitting in both the FPGA and processor sections.

In the symbol-rate section, the turbo code portion was initially thought of as being a primarily hardware-implemented algorithm as it could demand as many as 1000 MIPS on a vanilla DSP architecture. However, specialized support for the functions needed by turbo codes has been added to processor architectures aimed at base stations.

There are other portions of the chip-rate section that apparently fit better on a processor. These are typically the real-time measurement functions that look at channel behavior. Both the rake receiver and the equalizer rely on channel impulse response (CIR) estimation.

The least-squares calculations that may be used for CIR estimation tend to be matrix intensive, apparently lending themselves to a DSP implementation. However, the size of the matrix can lead to a slow update rate, incurring long error bursts as channel conditions change. That can force a rethink of the algorithm to make it fit a hardware substrate. So, the design may move back from the software to the hardware partition and into the hands of a new design-implementation team. There are many other examples where the partitioning of a design may change as it is tested, or as new requirements are imposed.
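To make the matrix-intensive nature of least-squares CIR estimation concrete, here is a minimal sketch that recovers channel taps from a known pilot sequence by solving the normal equations. It is a simplified illustration rather than a production estimator: it uses real-valued samples (a real receiver works on complex baseband data), it is noiseless, and the pilot sequence, tap values, and function names are all assumptions made for the example.

```python
def convolve_channel(pilot, taps):
    """Pass the pilot through an FIR channel model (no noise added)."""
    out = []
    for n in range(len(pilot)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * pilot[n - k]
        out.append(acc)
    return out


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: pilot is not rich enough")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def estimate_cir(pilot, received, num_taps):
    """Least-squares tap estimate: solve (A^T A) h = A^T y."""
    # Each row of A holds the pilot samples that contribute to one output.
    rows = [[pilot[n - k] for k in range(num_taps)]
            for n in range(num_taps - 1, len(pilot))]
    obs = received[num_taps - 1:]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(num_taps)]
           for i in range(num_taps)]
    aty = [sum(r[i] * y for r, y in zip(rows, obs)) for i in range(num_taps)]
    return solve_linear(ata, aty)


if __name__ == "__main__":
    pilot = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1]  # assumed pilot
    true_taps = [0.9, 0.4, -0.2]                       # assumed channel
    received = convolve_channel(pilot, true_taps)
    print(estimate_cir(pilot, received, len(true_taps)))
```

The matrix being solved has one row and column per channel tap, and the elimination step grows with the cube of that size, which is why a long delay spread can slow the update rate in the way described above.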

Making the Situation Worse
As modern DSPs get faster, they get more complex, which can make the design of a W-CDMA base station even tougher. The desire to squeeze performance out of a single processor leads to a number of optimizations (deep pipelines, dynamic caches, out-of-order execution, etc.) which improve average performance at the cost of reducing visibility or predictability. This compromise is acceptable for general-purpose applications but can prove problematic for demanding real-time systems, which must be able to guarantee worst-case response.

Typically, the only solution is to rely on extensive testing to ensure that all permutations or corner cases have been checked and that there are no problems. While hardware designs (FPGA, ASIC) are simulated in a deterministic way, with predictable performance, complex DSPs are only statistically predictable and hence require exhaustive (time-consuming) testing.

There are further interactions between the various parts of the baseband module. The 3GPP technical specification 25.215 demands that a number of measurements are taken at various points within the transmit and receive chains. These can have a surprisingly large effect on the implementation of the baseband because of the need for units to report on their status at given times or at the request of the OAM code running in a general-purpose processor somewhere in the module. However, these measurements are often forgotten (or perhaps "forgotten") in the benchmarks, even though they are critically important for a real system implementation.

Staying in the Same Environment Helps
Because of the complexity of the interactions in the W-CDMA stack, it is best to try to keep as much of the design as possible within the same environment instead of being forced to split the development effort between teams with different implementation expertise. FPGA design is a specialized discipline that calls not only for knowledge of an HDL but also for experience of what works on an FPGA in terms of hardware implementation.

However, just about all communications algorithms are modeled in C first and simulated on a processor. As we have seen, additional hardware support has gradually been added to DSPs to support certain pieces of the W-CDMA protocol, such as Viterbi and turbo codes. However, even the very long instruction word (VLIW) architectures used by advanced DSPs, which allow multiple instructions to be run in parallel, can only go so far: they can handle no more than a small piece of the total protocol.

Fortunately, the advanced silicon processes available today for integrated circuits (ICs) make it possible to implement hundreds of processors on a single chip together with distributed memory blocks. If these can be structured to deliver the performance, a "best-best" structure can be implemented, able to deliver the required combination of raw performance for chip rate, complex functionality for symbol rate, and the rich structures and intimate access needed for OAM.

The key to leveraging the power of such an array of processors lies in an interconnect that helps system designers minimize the amount of local storage needed by optimizing communication between processors and a development environment that allows multiple algorithms running on different processors to be hooked together easily.

Making Parallelism Work
One of the biggest stumbling blocks to the use of massive parallelism is the difficulty of passing data between processing elements. The programming environments for most processors assume a small number of threads of control in action at any one time. This is in spite of the fact that W-CDMA is highly amenable to parallelization, especially in the chip-rate portions, as long as a flexible control structure, allowing for both coarse- and fine-grained control, can be put in place.

The dataflows between processes running on different cores can be predicted at compilation time, allowing the use of time-multiplexed interconnects, which helps reduce the amount of wiring needed on-chip. With a deterministic interprocessor fabric, it then becomes possible to borrow some useful features from the hardware world, which are designed to express parallelism, and bring them to a predominantly software-focused environment (analogous to the way block-based systems environments like Simulink or SPW describe algorithms). This determinism is extremely important in reducing verification and testing time.
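The idea of fixing a time-multiplexed interconnect schedule at compile time can be sketched as follows. This is a toy model of a single shared bus with a repeating frame of slots, not a description of any particular device's fabric; the process names, slot counts, and the `build_slot_table` function are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Flow = Tuple[str, str]  # (producer process, consumer process)


def build_slot_table(demands: Dict[Flow, int],
                     frame_slots: int) -> List[Optional[Flow]]:
    """Assign each slot of a repeating frame to one flow, ahead of run time."""
    if sum(demands.values()) > frame_slots:
        raise ValueError("flows need more slots than the frame provides")
    table: List[Optional[Flow]] = []
    remaining = dict(demands)
    # Round-robin over the flows so each flow's slots are spread out.
    while any(remaining.values()):
        for flow in demands:
            if remaining[flow] > 0:
                table.append(flow)
                remaining[flow] -= 1
    # Slots that no flow needs stay idle.
    table.extend([None] * (frame_slots - len(table)))
    return table


if __name__ == "__main__":
    demands = {
        ("despreader", "rake"): 3,   # hypothetical slot requirements
        ("rake", "decoder"): 2,
        ("decoder", "oam"): 1,
    }
    for slot, flow in enumerate(build_slot_table(demands, frame_slots=8)):
        print(slot, flow)
```

Because the table is computed once and then simply repeated, every transfer happens in a known slot, so there is no run-time arbitration and timing is identical on every frame.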

It is possible to use signals to describe the data flows that pass between processes, and the structures that define such processes. If these are strictly "typed", then debugging and development are greatly accelerated (again, this is very conventional in the hardware domain). Within each processor, designers can implement the specific process using standard C or an assembler. Designers can then have each process communicate in a defined way with the other processes. The "time-slicing" of a conventional RTOS or load-balancer is replaced by dividing tasks up across an array, all communicating and executing in a deterministic way.
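A minimal sketch of strictly typed signals might look like the following. It only illustrates the principle of rejecting a mismatched connection before anything runs; the `Port` class, the `connect` function, and the type names are invented for this example and are not taken from any real tool.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Port:
    process: str      # name of the process that owns the port
    name: str         # port name within that process
    signal_type: str  # hypothetical type label, e.g. "complex16"


def connect(source: Port, sink: Port) -> Tuple[Port, Port]:
    """Return a connection, rejecting mismatched signal types up front."""
    if source.signal_type != sink.signal_type:
        raise TypeError(
            f"cannot connect {source.process}.{source.name} "
            f"({source.signal_type}) to {sink.process}.{sink.name} "
            f"({sink.signal_type})"
        )
    return (source, sink)


if __name__ == "__main__":
    chips_out = Port("despreader", "symbols_out", "complex16")
    rake_in = Port("rake", "symbols_in", "complex16")
    bits_in = Port("decoder", "soft_bits_in", "int8")

    connect(chips_out, rake_in)      # types agree, connection accepted
    try:
        connect(chips_out, bits_in)  # types differ, caught at build time
    except TypeError as err:
        print("rejected:", err)
```

Catching the mismatch when the design is assembled, rather than when corrupted data shows up in the lab, is the benefit the hardware world already gets from typed signals.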

By describing the interconnectivity of processes using these mechanisms, it is possible to borrow other useful elements of hardware design, such as place-and-route algorithms. A placer can efficiently place processes that need close communication near to each other, relieving the design team from the burden of this job.
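The placement idea can be illustrated with a deliberately naive sketch: processes are laid out on a one-dimensional row of processors, and the placer picks the ordering that minimizes traffic-weighted distance. A real placer would use a two-dimensional array and a scalable heuristic; the exhaustive search here is swapped in only because it is obviously correct for a handful of processes. All names and traffic weights are hypothetical.

```python
from itertools import permutations
from typing import Dict, Sequence, Tuple


def wire_cost(order: Sequence[str],
              traffic: Dict[Tuple[str, str], int]) -> int:
    """Sum of traffic weight times distance between each communicating pair."""
    position = {proc: i for i, proc in enumerate(order)}
    return sum(weight * abs(position[a] - position[b])
               for (a, b), weight in traffic.items())


def place(processes: Sequence[str],
          traffic: Dict[Tuple[str, str], int]) -> Tuple[str, ...]:
    """Try every ordering and keep the cheapest (small designs only)."""
    return min(permutations(processes),
               key=lambda order: wire_cost(order, traffic))


if __name__ == "__main__":
    processes = ["filter", "despreader", "rake", "decoder"]
    traffic = {  # hypothetical words per frame between process pairs
        ("filter", "despreader"): 8,
        ("despreader", "rake"): 8,
        ("rake", "decoder"): 2,
        ("filter", "decoder"): 1,
    }
    best = place(processes, traffic)
    print(best, wire_cost(best, traffic))
```

The heavily communicating pairs end up adjacent, while the lightly used link between the filter and the decoder is allowed to span the row.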

Memory Matters
Distributed memory is another crucial element in massively parallel architecture. Although high-end DSPs have access to megabytes of on- and off-chip memory, many of the algorithms needed to implement core chip- and symbol-rate functions have kernels measured in bytes and do not need to store much in the way of data. It is the high-level OAM functions that typically need much more code and data memory. This can be provided using off-chip memories to a smaller number of on-chip processors.

Because it combines the deterministic features of hardware and SoC development with the familiarity and flexibility of conventional coding, this architectural style can be described as a software SoC (SSoC).

It takes more than one IC, even armed with hundreds of on-chip processors, to implement a W-CDMA Node-B baseband module. However, by simply extending the time-multiplexed communication system between ICs, it is possible to scale up the capacity of the system linearly with the number of devices with relative ease. The same C-based design environment can be used.

A massively parallel architecture also provides a good degree of flexibility when it comes to the interface between control and datapath code. It is tempting to think of OAM code as being best implemented on a single, high-speed processor armed with megabytes of local memory. Centralized control and management can be easier to manage since code only has to be developed for a single processor. If the interfaces to the signal processing chain are well defined and relatively few in number, developing centralized control can be very effective.

Solving Control Plane Issues
A fundamental problem with legacy architectures, such as high-end DSPs, is that it is not easy to combine coarse-grained and fine-grained control tasks. Although it is possible to handle most coarse-grained control activities, the tasking structure used by most kernels makes it difficult to get timely, fine-grained control. This can cause significant performance problems; for example, latency in the inner power-control loop directly affects performance and capacity (and hence carrier revenue). It is comparatively easy to model how improving latency can increase performance and profitability.

With a parallel-processor array, it is possible to make use of a hierarchical control structure that distributes the tasks to different CPUs and aligns the "level" of processor to the level of task. A high-level OAM processor may take care of global operations and administration but hand off fine-grained functions to semi-autonomous controllers distributed through the array, each one specialized to a small number of functions relevant to the datapath processors around them. The underlying architecture can be the same. The OAM code may simply not make use of specialized instructions used by the datapath processes, such as Viterbi decoding or de-spreading. The logic needed to implement these functions is typically small relative to the processor core, meaning that there is little in the way of wasted silicon.

Adding Related Air Interfaces
Once such a flexible, software-based architecture has been embraced, it becomes easier and quicker for design teams to work on related radio-communications systems. There are other 3G protocols, such as time division duplexing (TDD) for data-oriented applications within a W-CDMA context, or the Chinese-developed TD-SCDMA. Although not based on the W-CDMA protocol, mobile broadband systems such as the recently launched IEEE 802.20 (MBWA) project are embracing similar themes of flexible radio protocols that can respond quickly to changes in fading conditions and to bursty traffic such as Internet Protocol (IP) packets.

As wireless protocols become more IP-focused, and seek to optimize performance and efficiency, designers can expect to see even more interaction between the control and datapath elements of the protocol. That will drive the emphasis towards development environments based on a unified view of the system that can provide the design flexibility and responsiveness needed for efficient development, while also delivering the raw processing horsepower required.

About the Author
Mario Bedoya is a senior systems engineer at picoChip. In this role, Mario developed the company's systems library. Mario received a B.Eng Hons. in Electronic Engineering and Integrated Circuits from Birmingham University and can be reached at mariob@picochip.com.



