Jennifer Eyre, Senior DSP Analyst, Berkeley Design Technology Inc., Berkeley, Calif.
Implementing the digital signal processing (DSP) tasks in telecommunications applications typically requires chips with very strong number-crunching capabilities. At the same time, telecom applications place stringent constraints on cost and power consumption. DSP tasks in telecom products have historically been carried out with DSP processors or application-specific ICs. On the one hand, ASICs can achieve high levels of performance with hard-to-match cost and energy efficiency, but they require massive design efforts. DSPs, meanwhile, ease the development process, and provide adequate performance and reasonable efficiency for many applications.
Throughout most of their history, field-programmable gate arrays (FPGAs) have rarely been used to implement DSP tasks. Until fairly recently, FPGAs lacked the gate capacity to handle demanding DSP algorithms and didn't have good tools support for tackling DSP jobs. They have also been perceived as being expensive and power hungry. All this may be changing, however, with the introduction of new DSP-oriented products from FPGA heavyweights such as Altera Corp. and Xilinx Inc., both based in San Jose, Calif.
Altera's recently announced Stratix family and Xilinx's Virtex-II family both offer significant DSP-oriented architectural enhancements. For example, both products offer hardwired on-chip multipliers embedded throughout the reconfigurable logic array that are intended to accelerate the multiply-accumulate (MAC) and similar operations common in DSP algorithms. By including some hardwired processing elements, FPGAs can improve their energy efficiency and cost performance while offering outstanding DSP performance. In addition, both companies offer sophisticated DSP-oriented development aids, such as intellectual property library blocks for common DSP functions and interfaces to high-level DSP tools such as Simulink. And, perhaps most important, chip densities have increased to the point where FPGAs can implement even highly challenging DSP tasks.
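To make the multiply-accumulate operation concrete, here is a minimal Python sketch of a direct-form FIR filter written as explicit MAC steps. It is an illustration only: the function name fir_mac and the sample values are assumptions made for this example, not code from either vendor. Each pass through the inner loop corresponds to one operation that a hardwired multiplier and an adder could perform.

```python
def fir_mac(samples, coeffs):
    """Direct-form FIR filter built from explicit multiply-accumulate steps.

    Hypothetical helper for illustration; inputs before the first sample
    are treated as zero.
    """
    outputs = []
    for n in range(len(samples)):
        acc = 0  # the accumulator of the MAC unit
        for k, coeff in enumerate(coeffs):
            if n - k >= 0:
                acc += coeff * samples[n - k]  # one multiply-accumulate
        outputs.append(acc)
    return outputs


# Two-tap moving sum over a short ramp (illustrative values only).
print(fir_mac([1, 2, 3, 4], [1, 1]))  # prints [1, 3, 5, 7]
```

An N-tap filter needs N such MACs per output sample, which is why the number of available multipliers matters so much for DSP throughput.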
The computational requirements of today's telecommunications applications often exceed the performance available from even the fastest DSP processors. This makes the new breed of DSP-enhanced FPGAs a potentially attractive solution for certain applications. A key challenge for system designers, though, is understanding where it is appropriate to use these new devices.
Unfortunately, designers have been stymied by the lack of a reliable way to evaluate the DSP performance of FPGAs or to compare their performance to that of DSP processors. Clearly, there is a need for DSP-oriented benchmarks that will enable engineers to make these comparisons.
Benchmarking
Good benchmarking requires careful selection of benchmarks and a well-developed methodology. Berkeley Design Technology Inc.'s benchmarking of processors for DSP applications uses a suite of common DSP algorithms, such as finite impulse response (FIR) filters, optimized in assembly language on each processor. A processor's results on each benchmark can be thought of as "basis vectors" that can be combined to estimate performance in an application. When BDTI began considering how to benchmark FPGAs, it quickly became obvious that this approach wouldn't translate well. One key problem is that the small algorithms used to benchmark processors don't do a good job of evaluating the real-world processing capabilities of FPGAs.
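The "basis vector" idea for processors can be sketched as a weighted sum: per-kernel benchmark results are scaled by how often the application runs each kernel. The sketch below is one plausible reading of that idea, not BDTI's actual methodology; the function name, kernel names and cycle counts are all hypothetical placeholders.

```python
def estimate_app_cycles(kernel_cycles, calls_per_frame):
    """Estimate application cycles per frame from per-kernel benchmark results.

    kernel_cycles:   cycles one invocation of each kernel takes (hypothetical)
    calls_per_frame: how many times the application invokes each kernel
    """
    return sum(kernel_cycles[name] * count
               for name, count in calls_per_frame.items())


# Placeholder numbers, not measured results for any real processor.
kernel_cycles = {"fir": 1000, "fft": 5000}
calls_per_frame = {"fir": 4, "fft": 1}
print(estimate_app_cycles(kernel_cycles, calls_per_frame))  # prints 9000
```

The estimate treats each kernel's cost as independent of the others, which is reasonable on a fixed architecture where kernels run one after another on the same hardware.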
On a processor, the architecture and instruction set are fixed; the main degrees of freedom when implementing a function are the choice of instructions and the order in which they are executed. DSP developers working with processors tend to optimize the processing-intensive "kernels" of the application individually, then combine them to form an overall application implementation that is near-optimal. On an FPGA, however, it is unlikely that developing optimized implementations of algorithms one at a time will yield particularly good, or meaningful, overall results. This is because, unlike on processors, the degrees of freedom when implementing a DSP application on an FPGA are myriad.
For example, the developer can dedicate more or less hardware to a given algorithm, making application-level resource utilization trade-offs. This may result in a highly optimized implementation of one algorithm at the expense of another, as can happen if one algorithm uses all of the hardwired multipliers, leaving none for the other algorithms. For each algorithm, the developer can use a fully parallel implementation, a fully serial implementation or anything in between. It is unlikely that every constituent algorithm will be optimized for maximum performance; instead, developers work to optimize the application as a whole.
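The serial-versus-parallel trade-off can be illustrated with a deliberately simplified model of an N-tap FIR filter: with M multipliers assigned to it, each output sample takes roughly ceil(N / M) clock cycles. The sketch below assumes one multiply per multiplier per cycle and ignores pipelining, coefficient symmetry and routing limits; the function names, tap count and clock rate are hypothetical.

```python
def fir_cycles_per_sample(num_taps, multipliers):
    """Clock cycles per output sample when num_taps multiplies are shared
    across the given number of multipliers (simplified model)."""
    if num_taps < 1 or multipliers < 1:
        raise ValueError("num_taps and multipliers must be positive")
    return (num_taps + multipliers - 1) // multipliers  # integer ceiling


def max_sample_rate_hz(clock_hz, num_taps, multipliers):
    """Highest sample rate the filter can sustain under the same model."""
    return clock_hz / fir_cycles_per_sample(num_taps, multipliers)


# Hypothetical 64-tap filter on a 100 MHz clock.
for m in (1, 8, 64):  # fully serial, partly parallel, fully parallel
    print(m, fir_cycles_per_sample(64, m), max_sample_rate_hz(100_000_000, 64, m))
```

Under this model, the fully serial version uses one multiplier and needs 64 cycles per sample, while the fully parallel version produces a sample every cycle but consumes 64 multipliers that other blocks in the application can no longer use.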
As a first step toward providing meaningful DSP performance data for FPGAs, BDTI recently developed a new telecom-oriented FPGA benchmark for use in a forthcoming industry report, "FPGAs for DSP." Rather than using a single algorithm as a benchmark, this new benchmark specifies a full (though simplified) single-channel communications receiver. It is designed to be representative of the kinds of processing found in communications infrastructure equipment for applications like DSL, cable modems and fixed wireless systems. It includes blocks for demodulation, filtering, time-frequency domain transformation and channel decoding. Input and output data formats and sample rates, along with other implementation details, are specified as part of the benchmark definition. Benchmark results are reported in terms of the number of channels that can be supported on a single chip and the associated cost per channel based on the chip cost. These results can be used to compare an FPGA's performance to that of DSPs.
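The cost-per-channel metric is simple arithmetic: chip cost divided by the number of channels the chip supports. The sketch below shows the calculation for two made-up devices; the function name, prices and channel counts are placeholders chosen for illustration and are not benchmark results for any real part.

```python
def cost_per_channel(chip_cost, channels_per_chip):
    """Chip cost divided by the number of receiver channels it supports."""
    if channels_per_chip < 1:
        raise ValueError("the chip must support at least one channel")
    return chip_cost / channels_per_chip


# Placeholder figures only: (chip cost in dollars, channels per chip).
devices = {"device A": (100.0, 4), "device B": (100.0, 50)}
for name, (cost, channels) in devices.items():
    print(name, cost_per_channel(cost, channels))
```

Because both inputs are per-chip figures, the metric allows devices with very different architectures to be compared on the same footing, provided the channel definition is identical for each.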
In cooperation with BDTI, Altera has implemented the new benchmark on one of its forthcoming Stratix FPGAs and has reported preliminary results to BDTI. Similarly, Motorola Inc. has provided preliminary results for one of its high-end DSPs, the 300-MHz MSC8101 (based on the StarCore SC140 core). These results suggest that even a midrange DSP-enhanced FPGA from Altera's Stratix line will be able to handle more than an order of magnitude more channels than the MSC8101 for a similar projected per-chip price. Further evaluation is in progress, but these early results suggest that the new breed of FPGAs may be a very attractive solution for some DSP-oriented applications.
Although benchmark results are important, there are many "soft" considerations that are of equal importance when choosing between an FPGA and a DSP processor.
One of these is the availability of relevant staff expertise. For example, most DSP application developers are not familiar with the design flow of FPGAs. Implementing even a simple FIR filter on an FPGA requires a totally different design process (and mind-set) than implementing the same function on a DSP processor. Altera and Xilinx offer tools and libraries to help simplify the process, but there will be a formidable learning curve for engineers who are primarily accustomed to working with processors. In addition, the time required to develop an optimized implementation of even a relatively modest DSP function for an FPGA can be orders of magnitude longer than that required to write the equivalent code for a DSP.
Optimal FFT
For example, one source told BDTI that it can take six man-months to develop an optimal fast Fourier transform (FFT) implementation for an FPGA, compared with our own experience of needing about a week to write and optimize FFT code for a high-end DSP. Altera's and Xilinx's libraries of functions help address this issue, but often the function that's needed isn't exactly what's in the library or isn't in the library at all.
In the coming months, BDTI will continue to gather more data and refine its FPGA benchmarking methodology. With FPGAs fast becoming credible competitors to DSP processors, the question of which one is best for an application is becoming more important, and more interesting. As is always the case with benchmarking, the numbers alone are only a piece of the answer.
See related chart