Commsdesign Home Register About Commsdesign Feedback Online Opportunities SpecSearch GlobalSpec



















11 October 2008

DSP System Integration

As expectations for wireless and portable systems continue to grow, the processors that drive these applications must become more powerful while consuming less power. One way to increase overall system performance is to integrate as much functionality as possible on a single DSP.

By Ethan Bordeaux

Lately, there has been a lot of talk about novel digital signal processor (DSP) architectures for portable applications. While a processor's architecture certainly helps determine whether it is well suited to a specific application, several other factors must also be weighed before selecting a processor. One such factor is the level of on-chip system integration provided.

System integration is the placement, on a single chip, of circuitry or subsystems that would otherwise be located external to the DSP. Examples of system integration include:

  • Memory (SRAM, ROM, and Flash)

  • Analog/mixed-signal components (analog-to-digital [A/D] or digital-to-analog [D/A] converters, comparators, and current switches)

  • Coprocessor circuitry (hardware accelerators).

While it is not always possible to find a DSP that offers all of the necessary functionality on a chip, some reasons to look for a highly integrated DSP include:

  • Lower system power consumption
  • Lower system cost
  • Faster, more efficient system operation
  • Shorter design time.

This article examines each of these four points to show why on-chip integration matters. Specifically, the points are applied to the integration of on-chip SRAM and its relevance to portable and power-sensitive applications. This example was chosen for several reasons:

  • The need for memory in wireless and portable applications is growing at a dramatic rate. A 32% increase in the memory market for wireless applications is projected over the next five years, outpacing the figure projected for the rest of the integrated circuit (IC) industry in the wireless market (17%) [1].

  • Integrating SRAM on chip directly affects every one of the reasons listed above for choosing a highly integrated DSP. By contrast, integrating items such as A/D and D/A converters improves system performance in some areas (power consumption and board space) but not in others (operational speed).

  • SRAM integration involves fewer trade-offs than integrating other DSP peripherals. For example, on-chip Flash memory often creates an operational-speed bottleneck, forcing the entire processor to be clocked more slowly than it would be without the memory on chip. On-chip SRAM, by contrast, has address-decoding logic that is typically much faster than the core operating speed, so it does not create a speed bottleneck.

Lower power consumption

Obviously, ensuring the lowest possible power consumption is very important in portable and power-sensitive applications. What is really gained, however, by using a DSP with all the necessary SRAM moved on-chip? One way to examine this question is to look at the added power consumption resulting from the external memory accesses. The formula for determining the power consumption of a switching line is:

$$P_{\text{switch}} = C \times V_{\text{switch}}^{2} \times f \qquad (1)$$

where:

$C$ = load capacitance

$V_{\text{switch}}$ = low/high switching voltage

$f$ = switching frequency.

The load capacitance is determined by the packages used for the external memory and the DSP. Typical pin capacitance values are around 5 to 10 pF. The total load capacitance is the sum of the output capacitance of the driving IC and the input capacitance of the receiving IC.

The following is an example of the added power consumption for a specific scenario (see Table 1 below):



TABLE 1: Power Calculation for External Memory Accesses.

| Pins | No. of pins | C | $V_{\text{switch}}$ | f | $P_{\text{tot}}$ |
|---|---|---|---|---|---|
| Address/data, memory select, and read select | 18 | 16 pF | 3.3 V | 50 MHz | 157 mW |



  • The DSP has sixteen address lines and sixteen data lines, along with read/write select pins and a chip select.

  • Memory reads are made every cycle, with, on average, half of the 32 address and data lines (16 lines) switching each cycle, plus the memory-select and read-select pins. This example can therefore be modeled as a DSP with eighteen lines (16 + 2) that switch every cycle.

  • Load capacitance is 8 pF for the DSP and 8 pF for the external SRAM (total load capacitance is 16 pF).

  • $V_{\text{switch}}$ is 3.3 V.

  • DSP operating speed is 50 MHz.
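The Table 1 figure can be reproduced from Equation (1). This Python sketch applies the equation as given, treating each of the 18 effective lines as switching once per cycle; the function and variable names are illustrative, not from the original.

```python
def switching_power_w(n_lines: int, c_load_f: float, v_swing: float, f_hz: float) -> float:
    """Equation (1) summed over n_lines lines, each switching once per cycle."""
    return n_lines * c_load_f * v_swing ** 2 * f_hz


# Table 1 scenario: 16 address + 16 data lines, half of them switching per
# cycle on average, plus the memory-select and read-select pins.
address_lines, data_lines = 16, 16
effective_lines = (address_lines + data_lines) // 2 + 2   # 18 lines

c_total = 8e-12 + 8e-12      # 8 pF at the DSP pin + 8 pF at the SRAM pin
v_switch = 3.3               # volts
f_clk = 50e6                 # one memory read per 50 MHz cycle

p_total = switching_power_w(effective_lines, c_total, v_switch, f_clk)
print(f"{effective_lines} lines: {p_total * 1e3:.0f} mW")
```

Changing any one input (for example, a lower supply voltage) shows immediately how the external-bus overhead scales, with the voltage term entering squared.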

The power required to perform these memory accesses (157 mW) is greater than the total power consumption of many modern DSPs. If all of the required memory were integrated on chip, this external-bus switching power would disappear entirely, because no external lines would be switching. Clearly, integrating SRAM on chip can have a strong positive impact on the amount of power consumed in a DSP system.

Lower system cost

Two key factors work to lower system cost: reduced board space requirements and lower bill-of-material (BOM) costs due to increased integration.

Even with the advent of high-density package types such as the miniBGA, the additional space taken up by external memory can be quite significant. For example, the package for a 32-kbyte external memory IC can be as large as 100 mm², the same size as some of the smallest DSPs available today. The silicon die inside the package, however, is much smaller. In deep-submicron processes, hundreds of kbits of SRAM can be placed in a single mm², so memory transistor cells occupy only a small percentage of the total area of a typical memory IC. Because memory cells are small compared with the total package size of a DSP, a manufacturer can build a series of DSPs with different memory sizes that all fit into the same package. In terms of board real estate, the memory is free: it is often possible to add memory to an existing DSP without increasing the total size of the device.
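A quick calculation shows why the memory array is such a small part of a memory package. The density of 200 kbit/mm² used below is a hypothetical figure picked from the "hundreds of kbits per mm²" range in the text, not a measured value for any particular process.

```python
# ASSUMPTION: 200 kbit/mm^2 is a hypothetical density chosen from the
# "hundreds of kbits per mm^2" range quoted in the text.
memory_kbits = 32 * 8                 # 32-kbyte memory = 256 kbit
density_kbit_per_mm2 = 200            # hypothetical deep-submicron SRAM density
package_area_mm2 = 100                # external memory package size from the text

array_area_mm2 = memory_kbits / density_kbit_per_mm2
array_fraction = array_area_mm2 / package_area_mm2
print(f"memory array: {array_area_mm2:.2f} mm^2 ({array_fraction:.1%} of the package)")
```

At that assumed density, a 32-kbyte array needs only about 1.3 mm², a little over 1% of a 100 mm² package.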

The bottom line is that expanding a system's total layout dimensions increases system cost. As an example, typical printed circuit board (PCB) prices are $0.10/in² per layer. Considering only the additional area devoted to holding a typical memory IC, the added cost is around $0.10 on a six-layer board. That figure understates the cost, however: several byte-wide memory ICs are often needed to match the native word width of a particular processor family. The estimate also ignores the added space required for routing all of the address and data pins, so the total board area required can easily be many times greater than that needed for the IC itself.
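The board-cost estimate is simple area arithmetic. This sketch converts the 100 mm² package footprint to square inches and applies the $0.10/in² per-layer price on six layers. The assumption that a 16-bit data bus needs two byte-wide memory packages is illustrative, and routing area is ignored, as in the text.

```python
MM2_PER_IN2 = 25.4 ** 2          # 645.16 mm^2 per square inch


def pcb_area_cost(area_mm2: float, price_per_in2_per_layer: float, layers: int) -> float:
    """Board cost in dollars of area_mm2 of PCB with the given layer count."""
    return area_mm2 / MM2_PER_IN2 * price_per_in2_per_layer * layers


one_ic = pcb_area_cost(100, 0.10, 6)     # one 100 mm^2 memory package, six layers
# ASSUMPTION: a 16-bit data bus built from byte-wide memories needs two packages.
ics_needed = 16 // 8
total = ics_needed * one_ic
print(f"one IC: ${one_ic:.3f}, {ics_needed} ICs: ${total:.3f} (routing area excluded)")
```

The single-package result of roughly $0.09 matches the article's "around $0.10"; routing area would multiply both figures.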

Memory integration cost savings

A single-chip solution usually costs less than a DSP plus external memory (see Figure 1).

Essentially, choosing a DSP with all of the memory on chip proves more cost-effective. One major reason is that a large share of a manufacturer's cost goes into the package that holds the silicon die. Because only a single package (the one holding the combined DSP and memory) must be purchased, the cost of the integrated DSP-plus-memory device can often be lower.

Faster system operation

Forcing a DSP to use external memory can slow overall system performance for several reasons, which stem from external DSP bus architectures, the structure of digital signal processing algorithms, and processor operating speeds.

The internal bus architecture of a DSP has at least two address buses and two data buses (or a minimum of two address/data buses per multiply-and-accumulate [MAC] computational unit). This architecture is chosen because DSPs are most commonly used in applications where the algorithms fetch two pieces of data while simultaneously performing a mathematical calculation. An example of this is the classic sum-of-products operation:

$$y(n) = \sum_{k=0}^{N-1} h_k(n)\, x(n-k) \qquad (2)$$

where:

$k = 0, 1, 2, \ldots, N-1$, and $N$ is the number of filter coefficients.

This equation requires a processor to fetch two pieces of data ($h_k(n)$ and $x(n-k)$) while simultaneously performing a MAC instruction. Most DSPs easily support this when operating from on-chip memory. Difficulties arise, however, when internal memory runs short and external accesses become necessary. In many DSPs, the program and data address and data buses are multiplexed onto a single external bus to reduce pin count and lower package costs, so only one piece of information can be accessed externally per cycle. In the worst case (one access for the DSP opcode and two accesses for separate pieces of data), each instruction can incur a two-cycle stall.
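The fetch pressure behind Equation (2) is easier to see in code. The Python sketch below computes one output of the sum of products and then counts cycles per MAC under two simplified bus models: an on-chip case that is assumed to serve the opcode and both operands in the same cycle, and a single multiplexed external bus that serves one access per cycle. The 64-tap length and the function names are hypothetical.

```python
import math


def fir_output(h: list[float], x_hist: list[float]) -> float:
    """One output of Equation (2): y(n) = sum_k h_k(n) * x(n - k).
    x_hist[k] holds x(n - k); len(h) is the number of taps N."""
    acc = 0.0
    for k in range(len(h)):
        # One MAC per tap, needing two operand fetches: h_k(n) and x(n - k).
        acc += h[k] * x_hist[k]
    return acc


def cycles_per_mac(accesses_per_mac: int, accesses_per_cycle: int) -> int:
    """Cycles per MAC when accesses_per_mac memory accesses (opcode + operands)
    must share buses that serve accesses_per_cycle accesses each cycle."""
    return math.ceil(accesses_per_mac / accesses_per_cycle)


ACCESSES = 3                              # opcode fetch + two data fetches (worst case)
on_chip = cycles_per_mac(ACCESSES, 3)     # ASSUMPTION: program bus + two data buses in parallel
external = cycles_per_mac(ACCESSES, 1)    # one multiplexed external bus
taps = 64                                 # hypothetical filter length
print(f"{taps}-tap filter: {taps * on_chip} cycles on chip, {taps * external} external")
```

The external case takes three cycles per MAC, which is the two-cycle stall per instruction described above; under these assumptions a 64-tap filter grows from 64 to 192 cycles per output sample.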

External memory timing requirements

DSP execution speeds have increased dramatically due to more deeply pipelined architectures and smaller-geometry processes. This is good news if all code and data are on chip, but if information must reside outside the DSP, additional stalls can occur because of minimum memory access time requirements. Processors running faster than 50 MHz are particularly prone to such stalls. The crucial timing parameter in an external memory access is the time the DSP allows between asserting the address/memory strobes and requiring valid data on the external bus. On many DSPs operating above 50 MHz, this window becomes prohibitively small or even negative, making zero-wait-state accesses impossible. If a single wait state is needed, the effective operating speed is cut in half. Combining this with the earlier points about digital signal processing algorithms and multiplexed external buses, it is easy to see how, in some circumstances, execution could be six times slower when working out of external memory (three memory accesses per instruction, each with a single wait state).
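The wait-state arithmetic can be sketched as a simple timing budget. In this Python model, the zero-wait-state window is one clock period minus the DSP's address-output delay and data-setup time, lumped into a single hypothetical 8 ns figure; the 10 ns SRAM access time is likewise an assumption. Real values come from the DSP and memory data sheets.

```python
import math


def wait_states(f_mhz: float, t_mem_access_ns: float, t_dsp_overhead_ns: float) -> int:
    """Wait states needed for an external memory read.
    With zero wait states the memory gets one clock period minus the DSP's
    address-output delay and data-setup time (lumped into t_dsp_overhead_ns).
    Each wait state stretches the access by one more clock period."""
    t_clk_ns = 1000.0 / f_mhz
    window_ns = t_clk_ns - t_dsp_overhead_ns      # can be negative at high clock rates
    shortfall_ns = t_mem_access_ns - window_ns
    return max(0, math.ceil(shortfall_ns / t_clk_ns))


def external_slowdown(accesses_per_instr: int, ws: int) -> int:
    """Cycles per instruction relative to single-cycle on-chip execution,
    assuming accesses are serialized on one external bus."""
    return accesses_per_instr * (1 + ws)


# ASSUMPTIONS: 10 ns SRAM access time, 8 ns lumped DSP output delay + setup.
for f in (50, 100, 150):
    print(f"{f} MHz: {wait_states(f, 10, 8)} wait state(s)")
print("worst-case slowdown:", external_slowdown(3, 1))
```

With these assumed numbers the 50 MHz part needs no wait states, the zero-wait window shrinks to 2 ns at 100 MHz and goes negative at 150 MHz, and three accesses per instruction with one wait state each reproduce the sixfold slowdown.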

Note that DSPs have a much harder time operating efficiently out of external memory than microcontrollers do. In the microcontroller world, placing a great deal of code in external memory is common practice for several reasons:


  • Microcontrollers operate at much lower clock speeds than DSPs, allowing single-cycle execution when operating out of external memory.
  • Microcontroller code often consists of test and branch instructions, which typically do not require the retrieval of multiple pieces of data every cycle.
  • Microcontroller applications tend to involve much more code than DSP applications (Mbyte versus kbyte). Because of the large amounts of code, manufacturing a microcontroller with all of the memory internal to the chip would be unreasonable.
  • The external memory interface of a microcontroller is often exactly the same as its internal memory structure — a single unified (Von Neumann) memory space. DSPs have an external memory bottleneck because they are generally forced to go from a Harvard-inspired memory architecture to a Von Neumann memory architecture.

For these reasons, many engineering rules that work in microcontroller-based system designs can no longer be applied to DSP system designs.

Shorter design time

Integrating all of the necessary SRAM on chip can shorten design time. Integration sidesteps two difficulties: noise added by fast-switching lines, and the challenge of communicating efficiently with external memory.

Noise introduced by switching lines can be a major problem, especially in portable wireless systems. Keeping processor functions (such as memory accesses) on chip minimizes disturbances that might affect system operation. One source of disturbance in digital systems is crosstalk due to inductive coupling. In electrical circuits, crosstalk is primarily caused by rapidly changing loop currents that induce magnetic fields, which in turn interact with other circuit loops. This effect is known as mutual inductance (see Figure 2).

A portion of the magnetic field caused by the changing current in path A passes through path B, inducing a voltage in path B. But how much of a problem is mutual inductance in typical circuits? Although its severity can be difficult to estimate, even the most careful system design will exhibit some noticeable level of crosstalk. Adding address and data buses for external memory accesses adds dozens of current loops to a board layout, potentially worsening any noise problems already present in the design. One way to see how this noise affects system performance is to examine the noise margin.

The noise margin

Crosstalk directly alters the voltage levels transmitted by an IC. Whether this added noise will affect how those levels are interpreted can be determined from the noise margin of the specific logic family. Each logic family has two noise margins, low and high, defined as:

$$NM_L = V_{IL(\max)} - V_{OL(\max)} \qquad (3)$$

$$NM_H = V_{OH(\min)} - V_{IH(\min)} \qquad (4)$$

The noise margin explicitly states how much noise a signal can have on it before it is in danger of being misinterpreted by a particular logic family. For example, the noise margins for the low-voltage TTL (LVTTL) family are:

$$NM_L = 0.8\,\text{V} - 0.4\,\text{V} = 0.4\,\text{V} \qquad (5)$$

$$NM_H = 2.4\,\text{V} - 2.0\,\text{V} = 0.4\,\text{V} \qquad (6)$$

If more than 0.4V of noise becomes coupled into an LVTTL signal, a transmitted logic-low or logic-high signal may not meet the minimum specifications for an LVTTL logic level.
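Equations (3) through (6) can be encoded directly. This Python sketch stores the LVTTL levels from the text in millivolts, since integers avoid floating-point rounding at the 0.4 V boundary; the `LogicLevels` class and its `tolerates` check are illustrative names, not part of any standard or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicLevels:
    """Interface levels of a logic family, in millivolts."""
    v_ol_max: int   # highest output voltage a driver may produce for logic low
    v_oh_min: int   # lowest output voltage a driver may produce for logic high
    v_il_max: int   # highest input voltage guaranteed to be read as low
    v_ih_min: int   # lowest input voltage guaranteed to be read as high

    def nm_low(self) -> int:
        return self.v_il_max - self.v_ol_max      # Equation (3)

    def nm_high(self) -> int:
        return self.v_oh_min - self.v_ih_min      # Equation (4)

    def tolerates(self, coupled_noise_mv: int) -> bool:
        """True if this much coupled noise stays within both margins."""
        return coupled_noise_mv <= min(self.nm_low(), self.nm_high())


LVTTL = LogicLevels(v_ol_max=400, v_oh_min=2400, v_il_max=800, v_ih_min=2000)
print(LVTTL.nm_low(), LVTTL.nm_high())            # Equations (5) and (6), in mV
print(LVTTL.tolerates(300), LVTTL.tolerates(450))
```

Taking the smaller of the two margins gives the worst-case noise budget for a line that carries both logic levels.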

This situation will continue to get worse. As process geometries shrink, IC operating voltages drop as well, leading to smaller low/high voltage swings and potentially smaller noise margins. This degradation can be seen in the current EIA/JEDEC standard for 2.5-V systems [2], where $NM_L$ is 0.3 V (at -1 mA) and $NM_H$ is 0.3 V (at 1 mA). The faster low/high switching times of smaller-geometry processes are an additional concern. As previously stated, the voltage induced between adjacent lines through mutual inductance depends on the rate of change of current: the faster the current changes, the stronger the coupling. As luck would have it, the maximum rate of change of current through a capacitive load is inversely proportional to the square of the rise/fall time. Therefore, if the rise time is cut in half, the voltage induced through mutual inductance quadruples.
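The rise-time scaling can be checked numerically. The sketch below assumes a linear voltage ramp, so the load draws $I = C V / t_r$, and treats that current as building up over roughly one rise time, giving $di/dt \approx C V / t_r^2$; the induced voltage is then $V = M \, di/dt$. The 16 pF load, 3.3 V swing, 4 ns and 2 ns edges, and 5 nH mutual inductance are hypothetical values chosen only to show the ratio.

```python
def edge_di_dt(c_load_f: float, v_swing: float, t_rise_s: float) -> float:
    """Rough peak di/dt (A/s) when charging a capacitive load.
    A linear ramp of v_swing over t_rise_s draws I = C * V / t_r; assuming
    that current builds up over about one rise time gives di/dt ~ C * V / t_r**2."""
    return c_load_f * v_swing / t_rise_s ** 2


def induced_voltage(mutual_h: float, di_dt: float) -> float:
    """Voltage induced in a neighboring loop through mutual inductance: V = M * di/dt."""
    return mutual_h * di_dt


# Hypothetical values chosen only to illustrate the scaling.
C_LOAD, V_SWING, M = 16e-12, 3.3, 5e-9
slow = induced_voltage(M, edge_di_dt(C_LOAD, V_SWING, 4e-9))   # 4 ns edge
fast = induced_voltage(M, edge_di_dt(C_LOAD, V_SWING, 2e-9))   # 2 ns edge
print(f"4 ns: {slow * 1e3:.1f} mV, 2 ns: {fast * 1e3:.1f} mV, ratio {fast / slow:.1f}")
```

Whatever values are plugged in, halving the rise time multiplies the estimated crosstalk by four; the absolute millivolt figures depend entirely on the assumed mutual inductance.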

The moral of this story is that it is often best to keep as much functionality on chip as possible, avoiding the problems associated with modern, fast-switching systems.

Software design issues

When a DSP accesses external memory, it loses much of its potential memory bandwidth and is limited to (at best) a single memory access per cycle. As a result, software engineers often try to place microcontroller-like instructions (instructions that do not require multiple data accesses in parallel with opcode accesses) in external memory. In addition, some DSP tool suites let the user create software overlays, in which the code needed at a particular point during run time is loaded into internal memory by a background direct memory access (DMA) transfer. While both methods can be effective in certain circumstances, they require extra design and debug effort from the programmer, which translates directly into longer time-to-market.
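The bookkeeping that overlays push onto the programmer can be seen in a toy model. The Python sketch below is a hypothetical illustration, not any vendor's overlay API: `OverlayManager`, the overlay names and sizes, and the one-cycle-per-word transfer cost are all assumptions, and a list copy stands in for the background DMA transfer.

```python
class OverlayManager:
    """Toy model of run-time code overlays (hypothetical names and costs).
    Overlays live in external memory; one at a time is copied into an
    on-chip region before it runs. A real DSP would start a background
    DMA transfer; the list copy below only stands in for that transfer."""

    def __init__(self, external_image, internal_words, cycles_per_word=1):
        self.external_image = external_image    # overlay name -> list of code words
        self.internal_words = internal_words    # capacity of the on-chip overlay region
        self.cycles_per_word = cycles_per_word  # hypothetical transfer cost per word
        self.resident = None                    # name of the overlay now on chip
        self.region = []                        # contents of the on-chip region
        self.transfer_cycles = 0                # total cycles spent loading overlays

    def ensure_loaded(self, name):
        """Make overlay `name` resident; return True if a transfer was needed."""
        if self.resident == name:
            return False
        words = self.external_image[name]
        if len(words) > self.internal_words:
            raise ValueError(f"overlay {name!r} does not fit on chip")
        self.region = list(words)               # stand-in for the DMA copy
        self.resident = name
        self.transfer_cycles += len(words) * self.cycles_per_word
        return True


mgr = OverlayManager({"encode": [0] * 512, "decode": [0] * 768}, internal_words=1024)
loads = [mgr.ensure_loaded(n) for n in ("encode", "encode", "decode", "encode")]
print(loads, mgr.transfer_cycles)
```

Even in this simplified form, the programmer must decide which routines share the on-chip region and pay a transfer each time execution switches between them, which is the extra design and debug effort the text refers to.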

There are a number of additional system-level concerns for both hardware and software designers when external memory is added to a system. Why bother fighting through these issues if a processor exists that contains all of the SRAM your design will ever need?

Keep it on chip

The points above show how important the right amount of on-chip memory is to a smooth design process. It is crucial not to underestimate the impact of choosing a DSP that must operate out of external memory. With the amount of SRAM available on many of today’s DSPs, there’s no reason not to choose such a highly integrated processor.

Ethan Bordeaux is a DSP applications engineer at Analog Devices, Inc. He graduated from Tufts University with a BSEE and can be contacted at Ethan.Bordeaux@analog.com.

References

  1. Cahners In-Stat Group, “Cellular/PCS Handsets: The Scramble for Stacked Memory, SiGe, Multi-Mode & Internet Access,” Newton, MA, 1999.

  2. EIA/JEDEC, “2.5V ±0.2V (Normal Range), and 1.8V to 2.7V (Wide Range) Power Supply Voltage and Interface Standard for Nonterminated Digital Integrated Circuits,” EIA/JESD8-5, 1995.

Illustrations

Figure 1
Figure 2
