DSP Buyer's Guide
While DSPs have become common tools in many comm developers' bags of tricks, selecting the right DSP for a given project is no simple matter. The increasing number of DSP product offerings has brought with it a wider variety of DSP architectures, complicating the product selection task.
By Henry Davis
Digital signal processors (DSPs) have become pervasive in modern electronic design. Where once only a select group of engineers had the necessary skills to apply this advanced technology, today tens of thousands of hardware and software professionals have the knowledge needed to make digital signal processing a part of their companies' product plans. These skills have been acquired through decades of seminars and workshops offered by the major DSP semiconductor suppliers and an ever-growing cadre of independent training companies. Factory-sponsored workshops originally offered generic DSP training with a modest amount of company-specific information. Now, many of these free or relatively low-cost workshops are heavily applications-focused with significant product-related content. This shift in workshop content reflects not only the dramatically increased financial importance of digital signal processing to the semiconductor industry, but also the tremendous increase in DSP product complexity. It is no longer practical to learn only the basics of digital signal processing algorithms and then pick up the architecture of high-performance DSPs on the job. The learning curve for engineers using an advanced DSP for the first time is significant and can require more effort than that demanded by the more complex general-purpose microprocessor.
Ruthless, experienced programmers with digital signal processing knowledge used to be able to learn first-generation DSP tricks in a few months. But the increased complexity of today's high-performance DSPs may demand more than six months of hands-on practice before those same programmers are proficient. This increased requirement for experience with the processor is due, in large part, to the fact that the underlying architectures are different from those taught in academic courses. With a few exceptions, programmers don't usually get the opportunity to experience these alternative architectures until they begin work on an advanced DSP project as an employee. Experience-for-hire now plays a bigger role in bootstrapping engineers than it has in the past. A small but important number of former factory applications engineers offer consultation services as independent professionals, bringing both engineering capacity and capability to their clients. For some companies, choosing a specific, advanced DSP depends on the availability of experts to assist during the first design.
As fast as digital signal processing has grown in use, there remains a vast untapped market of engineering professionals who have related skills but lack experience in developing digital signal processing-based products. One of the major successes of digital signal processing has been the strong identification of the DSP as a processor type. This has given engineers a focus and definition to work within as they gather the critical skills required for implementing new and more complex systems. While the identification of the DSP as a processor type has benefited many, it serves to obscure the underlying technology: digital signal processing is about algorithms and processes that can be applied to specific types of real-world data. Digital signal processing can be performed by any processor; it's just a matter of bandwidth capabilities.
Defining bandwidth requirements
Developers of DSP-based integrated circuits once pushed new product plans ahead by adhering to a simple philosophy: There's never enough digital signal processing power. Developers could be reasonably assured of a successful product by simply turning the crank to create ever-faster processors. As quickly as developers could create faster DSPs, engineers would clamor for more performance. As standards-based products like cell phones, modems, video codecs, and audio processors begin to mature and become mass produced, DSP capabilities can be grouped into applications-related performance bands. For example, the first-generation GSM voice codec could be implemented in less than 10 million DSP instructions per second (DSP MIPS). The half-rate GSM coder required less than 33 DSP MIPS. These two applications create natural partitions in the performance spectrum. Mass-produced DSPs for full-rate GSM must meet the requirements for performance, but should not exceed the requirements by a substantial margin. This first application-based partitioning of performance requirements has continued to be a factor in developing new DSPs.
Even though DSP performance can be defined by application requirements that limit the need for ultimate performance, some application and system designs still demand the maximum performance possible.
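The notion of application-defined performance bands can be sketched in a few lines of Python. The two workload figures are the upper bounds quoted above; the catalog of parts, their names, and the 25% headroom are purely illustrative assumptions, not vendor data.

```python
# Workload upper bounds quoted in the text, in DSP MIPS.
WORKLOAD_MIPS = {"gsm_full_rate": 10, "gsm_half_rate": 33}

# Hypothetical catalog of (part name, sustained DSP MIPS). Not real parts.
CATALOG = [("part_a", 12), ("part_b", 20), ("part_c", 40), ("part_d", 80)]


def pick_part(workload, headroom=0.25, catalog=CATALOG):
    """Return (name, mips, needed) for the slowest part that covers the
    workload plus a safety margin, i.e. enough performance but not too much."""
    needed = WORKLOAD_MIPS[workload] * (1.0 + headroom)
    candidates = [(mips, name) for name, mips in catalog if mips >= needed]
    if not candidates:
        raise ValueError(f"no part offers {needed:.1f} MIPS")
    mips, name = min(candidates)
    return name, mips, needed


if __name__ == "__main__":
    for job in WORKLOAD_MIPS:
        name, mips, needed = pick_part(job)
        print(f"{job}: needs {needed:.1f} MIPS -> {name} ({mips} MIPS)")
```

Choosing the minimum qualifying part, rather than the fastest one available, mirrors the point that a mass-produced design should meet its requirement without exceeding it by a wide margin.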
Competing architectures
Dichotomies between performance potential, application needs, speed, power, price, and capability all serve to open the field of DSP architecture to a wider group of suppliers. With this increase in the number of suppliers comes a greater diversity of architectures. Each company brings its past experience to bear on a DSP architecture. As the company gains experience in DSP-based solutions, its product offerings will migrate. Texas Instruments' (TI's) TMS320C5xx families illustrate this trend. The original TI DSP, designed in 1982, was a modified Harvard architecture that physically and logically split program and data memory into separate addressing spaces. The "modified" portion of the architecture refers to the addition of a bus interchange module that permits limited data exchanges between the program and data memories. Pure Harvard architectures maintain a complete separation of the two spaces. For the TMS32010 DSP, the ability to transfer data values from program to data memory is critical. Program memory for mass-produced parts is ROM, just as TI employed in its TMS1000 4-bit microcontroller. Since coefficients do not change, the obvious place to store them is in the ROM space. This and other considerations led to departures from the pure Harvard philosophy.
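A toy model may make the "modified" part of the architecture concrete. The class, its method names, and the memory sizes below are hypothetical and do not reproduce the TMS32010 instruction set; the sketch only shows filter coefficients living in program ROM and being copied into data memory before a sum of products runs.

```python
class ModifiedHarvard:
    """Toy machine with separate program and data address spaces."""

    def __init__(self, program_rom, data_words=16):
        self.program = tuple(program_rom)   # read-only program space (ROM)
        self.data = [0] * data_words        # read/write data space (RAM)

    def table_read(self, prog_addr, data_addr):
        """Stand-in for the bus interchange: copy one word from program
        memory into data memory."""
        self.data[data_addr] = self.program[prog_addr]

    def sum_of_products(self, coeff_addr, sample_addr, taps):
        """Multiply-accumulate using operands held only in data memory."""
        return sum(self.data[coeff_addr + i] * self.data[sample_addr + i]
                   for i in range(taps))


# Program ROM: two placeholder opcode words followed by three coefficients.
machine = ModifiedHarvard(program_rom=[0x7F00, 0x7F01, 1, 2, 3])
for i in range(3):
    machine.table_read(prog_addr=2 + i, data_addr=i)  # coefficients -> data[0..2]
machine.data[4:7] = [10, 20, 30]                       # input samples
result = machine.sum_of_products(coeff_addr=0, sample_addr=4, taps=3)
print(result)
```

In a pure Harvard machine the `table_read` path would not exist, so constant coefficients would need some other route into data memory.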
The TMS32010 led to the second-generation TI DSP, the TMS320C25, a part still in production. The C25 added some instructions and modified the peripherals based on experience with general-purpose DSP applications. The next change in the basic TI architecture was fueled by a short-lived relationship with Intel. As part of an ASIC agreement between TI and Intel, TI undertook a design program that would permit both companies to standardize on the C25's architecture. TI's experience in developing customized DSPs for large customers prompted it to restructure the physical layout of the C25 to include a specialized peripheral bus and full JTAG for testing. Intel's experience in the controller market led it to lobby successfully for the inclusion of bit-manipulation instructions in the DSP instruction set. These two different experiences from TI and Intel shaped the TMS320C50 into a more capable general-purpose part, while maintaining significant source-code compatibility with previous generations of the architecture.
Success breeds specialization
According to Will Strauss, president of Tempe, AZ-based Forward Concepts, TI has maintained the lead in DSP product sales ever since he began publishing his DSP market and strategy study over a decade ago. During the early 1990s, TI's largest applications involved motor control for hard disk drives and data modems. Both of these applications demanded variations of the standard product offering to improve performance and reduce the cost of the final product's electronics portion. Coupled with the incredible demand for voice codecs for European GSM, the need for specialization drove differentiation in the TI offering. The C5x spawned two additional part families: the TMS320C2xx and TMS320C54x. The C2xx is focused on motor control and similar applications, while the C54x is targeted at GSM and other digital cellular standards-based applications. Both of these families continue to evolve, with higher performance parts, new peripherals, and larger memory sizes announced with some regularity. The variations and alternatives within each family are extensive, and the programmer's control over such features as memory maps creates hundreds of in-system variants. The C54x family now includes twenty-six different standard part types, including the TMS320UC5402 fixed-point DSP, which is aimed at low-power, high-performance applications. The C5402 features low power consumption and the flexibility to support different system voltage configurations commonly found in battery-powered applications. The wide range of I/O voltage enables it to operate with a single 1.8-V power supply or with dual power supplies for mixed-voltage systems. This feature eliminates the need for external level-shifting and reduces power consumption in systems below 3 V. The part includes three separate 16-bit data memory buses and one program memory bus to optimize memory access during multifunction instruction execution. On-chip peripherals include: software-programmable wait-state generator; programmable bank switching; on-chip phase-locked loop (PLL) clock generator with internal oscillator or external clock source; two multichannel buffered serial ports (McBSPs); an enhanced 8-bit parallel host-port interface (HPI8); two 16-bit timers; a 6-channel direct memory access (DMA) controller; power consumption control with IDLE1, IDLE2, and IDLE3 instructions with power-down modes; CLKOUT off control to disable CLKOUT; on-chip scan-based emulation logic; and IEEE Std 1149.1 (JTAG) boundary scan logic. The DSP runs at nearly one hundred times the speed of the original TI TMS32010, achieving 80 MIPS, with each instruction consuming just one clock period. The ultimate application in portable battery-powered products dictates thin, space-saving packaging. The C5402 is available in a 144-pin TQFP and 144-pin BGA.
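A rough feel for what 80 MIPS at one instruction per clock buys can be had from a short calculation. The 8 kHz sampling rate used below is an assumption (a typical telephony voice rate), not a figure taken from the part's documentation, and the function names are illustrative.

```python
def instruction_period_ns(mips):
    """Time per instruction, in nanoseconds, at one instruction per clock."""
    return 1e3 / mips


def instructions_per_sample(mips, sample_rate_hz):
    """Whole instructions available between two consecutive input samples."""
    return int(mips * 1_000_000 // sample_rate_hz)


if __name__ == "__main__":
    print(instruction_period_ns(80))            # nanoseconds per instruction
    print(instructions_per_sample(80, 8_000))   # budget per 8 kHz sample
```

Under these assumptions each instruction takes 12.5 ns, and every incoming sample leaves a budget of 10,000 instructions for all processing, including interrupt and control overhead.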
The other offshoot in the C25 lineage is the C24x and C240x part families, including a series of DSP cores employing the same instruction set. The C24x family includes eleven part types. The TMS320LF240x and TMS320LC240x devices are based on the TMS320C2xx generation of fixed-point DSPs. The 240x devices offer the enhanced TMS320 architectural design of the C2xx core CPU. Several advanced peripherals, optimized for digital motor and motion control
applications, have been integrated to provide a single-chip DSP controller. While code-compatible with C24x DSP controller devices, the C240x offers increased processing performance (30 MIPS) and a higher level of peripheral integration.
The C240x family offers a selection of memory sizes and different peripherals tailored to meet specific price/performance points defined by mass-production applications. Flash-based devices of up to 32k words provide reprogrammable solutions useful for applications requiring field upgrades. In addition, Flash memory serves the needs of development and initial prototyping for applications that migrate to ROM-based devices in production. The Flash devices and corresponding ROM devices are pin-to-pin compatible.
The total signal chain
TI was not the only DSP company to follow the path towards applications specificity. Analog Devices, Inc. (ADI), once a secondary player in the DSP business, began in 1989 to shift its focus to the digital communications market, almost to the exclusion of all other applications, during the initial years of the company's microcomputer DSPs. Like TI's migration from its earliest DSPs, ADI took an application-focused approach to its developments. But ADI made two key strategic decisions: future products would retain upward compatibility with the original microprocessor DSP (the ADSP2100), and the company would look at the entire signal chain, not just the digital portion. Unlike other DSPs offered in the late 1980s, the ADI 2100 was a true microprocessor designed specifically for DSP applications. There were no peripherals or memory onboard. The decision to consider the entire signal chain gave rise to ADI's mixed-signal processors for digital cellular and other voice applications. ADI's most recent fixed-point products have retained upward source compatibility to some degree with the original ADSP2100, but use superscalar implementation to improve performance. To benefit from the improvements present in the TigerSHARC, legacy programs must be rewritten to exploit the added capabilities. The TigerSHARC is a static superscalar architecture, and incorporates many aspects of conventional superscalar processors, including a load/store architecture, branch prediction, and a large interlocked register file. The term "static" is applied because instruction-level parallelism is determined prior to run-time and encoded in the program. All the registers are interlocked, supporting a simple programming model that is independent of implementation latencies and is fully interruptible. Branch prediction is supported by a 128-entry branch target buffer (BTB) that reduces branch latency. Program code is stored in quad-word memory with no wasted space. The product can execute eight 16-bit multiply-accumulates (MACs) per cycle with 40-bit accumulation, two 32-bit MACs per cycle with 80-bit accumulation, or two 16-bit complex MACs per cycle.
Reflecting ADI's experience in serving the communication segment, TigerSHARC includes a single-cycle add, compare, and select (ACS) sequence for the Viterbi algorithm, an add-subtract instruction and bit reversal in hardware for FFTs, and a 64-bit generalized bit-manipulation unit.
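For readers unfamiliar with the operation, the add, compare, and select step at the heart of a Viterbi decoder can be sketched as follows. This is a generic software illustration, not TigerSHARC code; the function name and the smaller-metric-wins convention (as with Hamming distances) are assumptions.

```python
def acs(metric_a, branch_a, metric_b, branch_b):
    """One add-compare-select step for a single trellis state.

    Two candidate paths (A and B) arrive at the state. Each has an
    accumulated path metric and a branch metric for the final hop.
    Returns (survivor_metric, decision), where decision is 0 if path A
    survives and 1 if path B survives.
    """
    cand_a = metric_a + branch_a      # add
    cand_b = metric_b + branch_b
    if cand_a <= cand_b:              # compare: the smaller metric is better
        return cand_a, 0              # select path A
    return cand_b, 1                  # select path B


if __name__ == "__main__":
    print(acs(3, 1, 2, 3))   # path A survives
    print(acs(5, 2, 1, 1))   # path B survives
```

A decoder repeats this step for every state at every received symbol, which is why collapsing it into a single cycle matters for communication workloads.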
The TigerSHARC processes 8-, 16-, and 32-bit data as native types. This allows the processor to scale the number of operations that can be completed in a cycle, based on the length of the data type being processed. Each of the two computation blocks (CBX and CBY) contains a multiplier, an arithmetic and logic unit (ALU), and a 64-bit shifter. With these resources, a single cycle supports execution of eight 40-bit MACs on 16-bit data, two 40-bit MACs on 16-bit complex data, or two 80-bit MACs on 32-bit data. With 8-bit data types, the architecture can scale performance to issue sixteen operations in one cycle, executing 8 billion operations per second. The TigerSHARC features a short-vector memory architecture organized in three 128-bit wide banks. Quad (128-bit), long (64-bit), and normal (32-bit) word accesses move data from the memory banks to the register files for operations. Four 32-bit instruction words can be fetched, and 256 bits of data can be loaded to the register files or stored into memory in a single cycle. The memory architecture can store 8-, 16-, and 32-bit data in contiguous, packed memory. Internal and external memories are organized in a unified memory map, and the partition between program memory and data memory is user-determined.
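The way narrower data types multiply the work done per word can be illustrated with a small packed-arithmetic sketch. It models a generic 32-bit word split into equal lanes with wraparound addition; the word size, the lane ordering, the wraparound behavior, and all function names are assumptions chosen for illustration and do not describe the TigerSHARC's actual data paths.

```python
def pack(values, width):
    """Pack unsigned lane values into one integer, lane 0 in the low bits."""
    mask = (1 << width) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * width)
    return word


def unpack(word, width, lanes):
    """Split a packed integer back into its unsigned lane values."""
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(lanes)]


def lanewise_add(a, b, width, total_bits=32):
    """Add two packed words lane by lane; each lane wraps independently."""
    lanes = total_bits // width           # narrower data -> more lanes
    mask = (1 << width) - 1
    sums = [(x + y) & mask
            for x, y in zip(unpack(a, width, lanes), unpack(b, width, lanes))]
    return pack(sums, width)


if __name__ == "__main__":
    a = pack([1, 2, 3, 250], 8)
    b = pack([10, 20, 30, 10], 8)
    print(unpack(lanewise_add(a, b, 8), 8, 4))   # four 8-bit adds in one word
```

The same 32-bit word carries four results at 8 bits, two at 16 bits, and one at 32 bits, which is the scaling effect the paragraph above describes.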
The computational resources are controlled by a sequencer that can issue up to four 32-bit instructions in parallel. One or two of these instructions can control more than one computational unit, reducing code size and power consumption. Programmers can control how individual instructions to each of the computation units are issued.
Joining forces
Developing DSP architectures is a time-consuming, expensive process that few companies can afford. Even though they are large enough to fund separate development efforts, Motorola and Lucent have joined forces to develop new DSP architectures. The joint venture group, called StarCore, has already specified the first core: the StarCore SC140. This DSP core includes four MACs and executes more than one billion MACs per second. The StarCore alliance pools some of the industry's most experienced and expert DSP engineers, including Jim Boddie, one of the most experienced DSP architects in the industry. Boddie has been a key figure in nearly every Lucent DSP developed, including those developed twenty years ago for internal use.
The SC140 is a 16-bit DSP core, initially available at a clock speed of 300 MHz and an operating voltage range of 0.9 to 1.5 V. Faster clock speeds and lower-voltage versions are planned for the future.
The core includes twelve data execution units, consisting of four MACs, four general-purpose arithmetic ALUs, and four bit field units (BFUs). Each MAC unit can execute a 16 x 16-bit fractional or integer multiplication, and can then add the result to a 40-bit accumulator in a single clock cycle. The ALUs perform general calculations such as adds, subtracts, compares, and maximum value operations. The BFUs perform bit field functions. Each BFU incorporates a 40-bit barrel shifter to speed such operations as multibit shifts, bit rotations, and inserts useful in communication processing. The integration of four such barrel shifters on a single DSP core is unique and contributes to the SC140's execution of communication algorithms.
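A software model of a 16 x 16-bit fractional multiply feeding a 40-bit accumulator may help show why the accumulator is wider than the product. The sketch assumes the common Q15 convention (the 32-bit product is shifted left by one bit to form a Q31 value) and saturating accumulation; both are assumptions for illustration and are not taken from SC140 documentation.

```python
ACC_BITS = 40
ACC_MAX = (1 << (ACC_BITS - 1)) - 1
ACC_MIN = -(1 << (ACC_BITS - 1))


def to_q15(x):
    """Convert a float in [-1, 1) to a 16-bit Q15 integer, saturating."""
    return max(-32768, min(32767, int(round(x * 32768))))


def mac_q15(acc, a, b):
    """Return acc + a*b for Q15 operands, saturated to 40 bits.

    The product is shifted left by one so the result is in Q31 format;
    the 8 bits above bit 31 act as guard bits against overflow.
    """
    total = acc + ((a * b) << 1)
    return max(ACC_MIN, min(ACC_MAX, total))


def acc_to_float(acc):
    """Interpret a Q31 accumulator value as a real number."""
    return acc / float(1 << 31)


if __name__ == "__main__":
    acc = mac_q15(0, to_q15(0.5), to_q15(0.5))
    print(acc_to_float(acc))   # 0.5 * 0.5
```

Note that the single product (-1) x (-1) already reaches 2^31, which overflows a 32-bit signed register but fits comfortably in 40 bits.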
The core's program control unit includes a program sequencer, which fetches instructions and performs loop and branch control. The SC140 has a five-stage pipeline consisting of program pre-fetch, program fetch, dispatch/decode, address generation, and execute. This is a relatively short pipeline by current DSP standards. The shorter pipeline simplifies assembly language programming and improves hardware branch and interrupt handling by reducing the number of conditions that programmers and hardware must consider. Up to eight 16-bit data words may be fetched at once using two 64-bit data buses, for a total bandwidth of 4.8 Gbytes/s at 300 MHz. The program data bus is 128 bits wide, allowing up to two prefixes and six instructions to be fetched per cycle.
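The arithmetic behind the data-bus figure is easy to check. The sketch below assumes one transfer per bus per clock cycle at the 300 MHz clock quoted for the core; the function name is illustrative.

```python
def bus_bandwidth(bus_bits, buses, clock_hz):
    """Return (bits per second, bytes per second) for parallel buses that
    each complete one transfer per clock cycle."""
    bits_per_second = bus_bits * buses * clock_hz
    return bits_per_second, bits_per_second / 8


if __name__ == "__main__":
    bits, byts = bus_bandwidth(bus_bits=64, buses=2, clock_hz=300_000_000)
    words_per_cycle = (64 * 2) // 16
    print(words_per_cycle)   # 16-bit words fetched per cycle
    print(bits / 1e9)        # gigabits per second
    print(byts / 1e9)        # gigabytes per second
```

Two 64-bit buses deliver eight 16-bit words per cycle, which at 300 MHz is 38.4 gigabits, or 4.8 gigabytes, per second.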
While TI and ADI have demonstrated their ability to engineer a family of parts and architecture enhancements, the StarCore promise is still too new to evaluate completely.
Both partners are capable of creating an ongoing series of cores and parts to meet future applications needs, and both have stated that they will develop products based on StarCore technologies. With the first products planned for the year 2000, StarCore partners promise to continue developments to meet their own market needs.
The exotic architectures
Harvard architectures and superscalar implementation techniques are two of the better understood technologies used to create advanced DSPs. But as engineers have demanded greater levels of performance, DSP designers have resorted to a wide variety of more exotic architectures and implementation techniques.
TI was the first mainstream DSP company to embrace the very long instruction word (VLIW) architecture as a mechanism to improve performance. VLIW works by using an instruction word large enough to hold several basic instructions. TI adopted an 8-way VLIW architecture in which up to eight basic operations may be specified by programmers every single cycle. Tremendous flexibility and high performance are the advantages of this type of architecture. The drawback is that it is extremely difficult for all but the most expert programmers to develop memory- and performance-optimized code. Instead, special optimizing software determines an acceptable schedule for the individual instructions and rearranges them to achieve the best performance (with a few restrictions). The TI VLIW DSPs (TMS320C6x and TMS320C67x floating-point parts) may be the first DSPs to force mere mortal DSP programmers to use C during development.
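The job that such optimizing software performs can be suggested with a greedy packing sketch. It is a deliberately simplified model, not TI's tool chain: it assumes every operation completes in one cycle, ignores functional-unit and register constraints, and uses hypothetical operation names.

```python
def schedule(ops, deps, width=8):
    """Greedily pack operations into bundles of at most `width` slots.

    ops  : operation names in program order
    deps : dict mapping an operation to the set of operations it needs
    An operation may issue only after everything it depends on has
    issued in an earlier bundle (unit latency is assumed).
    """
    done = set()
    remaining = list(ops)
    bundles = []
    while remaining:
        bundle = []
        for op in remaining:
            if len(bundle) == width:
                break
            if deps.get(op, set()) <= done:
                bundle.append(op)
        if not bundle:
            raise ValueError("dependency cycle or unknown dependency")
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in done]
    return bundles


if __name__ == "__main__":
    ops = ["ld_a", "ld_b", "mul", "ld_c", "add", "st"]
    deps = {"mul": {"ld_a", "ld_b"}, "add": {"mul", "ld_c"}, "st": {"add"}}
    for cycle, bundle in enumerate(schedule(ops, deps)):
        print(cycle, bundle)
```

Even with eight slots available, the dependency chain in this example fills only three slots in the first cycle and one in each later cycle, which hints at why hand-packing wide instruction words is hard and is usually left to tools.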
Where TI's C6x family relies on VLIW as the basic instruction format, Infineon (formerly Siemens Semiconductor) observed that most DSP programs consist of a small amount of DSP-specific, time-critical code coupled with much larger control and general-purpose code. To save code size and simplify programming, Infineon's Carmel DSP combines a normal CISC with the added ability to trap to a VLIW program memory space. High-performance digital signal processing code or specialized instructions are contained in a 1k-word configurable long instruction word (CLIW) memory space, while compact code is stored in ordinary program memory. The Carmel core uses the CLIW reference in ordinary memory to select the CLIW code fragment to be executed without any branching overhead.
Whether the core uses VLIW or CLIW, the use of software branch interlocks places an extra burden on programmers and
development tools. The higher performance comes at the expense of needing to consider pipeline effects on individual instructions. Ruthless programmers can produce amazingly small and efficient code. Mere mortals may have their hands full in keeping track of the warning messages produced by the development tools.
Choosing a DSP
Many factors go into choosing a specific DSP. Company reputation, support, price, availability, and quality are just a few of the factors considered when choosing a complex product like a DSP, and there are more to choose from than those mentioned in this article. Please see the product table and vendor list for a complete listing of DSPs available on the open market (the product table can be viewed by visiting www.csdmag.com). Two of the DSP industry's engineering godfathers offer hard-won advice. Robert Owen, a long-time consultant in Saratoga, CA (and a member of the team that developed Intel's original 2920 DSP two decades ago), observed that it is necessary to make sure that the part is a real thing. DSPs are hard to develop, and you can get lost in the architecture subtleties. But the question is: Which ones will become living, breathing products? Auburn, CA-based consultant Richard Blasco (designer of the first commercial DSP, AMI's 2811 SPP) offers this sage piece of advice: Choose a DSP that has enough performance, but not too much. Remember that digital signal processing can be done on any processor; it's all a matter of bandwidth. With the hundreds of digital signal processing-based processors available, nearly any need can be met with a near-perfect technical match. Now that more than half of the semiconductor companies offer some type of DSP technology, finding a company that fits your business and technology plans is only a matter of searching.
Henry Davis is president of Henry Davis Consulting, a new products consultancy based in Soquel, CA. Davis is a contributing editor for Communication Systems Design. He holds a BS in computer science and business administration from Columbia Pacific University, and has done graduate work at the New Mexico Institute of Mining and Technology. He can be reached at hdavis@ix.netcom.com.