11 March 2010

Feature

Hardware Implementations of Multirate Digital Filters


It’s important to map interpolation and decimation functions into hardware efficiently. The challenge is choosing the right type of hardware. Here’s a look at DSP, PLD, and ASIC implementations of multirate filters.

By Tony San

Many communication systems require multirate filters. A multirate filter is a filter whose output data rate differs from its input data rate. This often occurs near a physical interface such as a digital-to-analog converter (DAC) or an analog-to-digital converter (ADC). When the filter feeds a DAC, the end user usually wants an interpolation filter, which generates additional data points to create a smoother waveform. When the filter receives data from an ADC, the end user generally wants a decimation filter. A decimation filter allows the ADC to oversample the signal, which facilitates a higher signal-to-noise ratio (SNR), while the rest of the system only needs to operate at the information rate.


Interpolation and decimation

Interpolation is used to increase the output sample rate (see Figure 1). It’s necessary to generate new sample points located between the original sample values. Because the values of the new sample points are unknown, they are set to zero. This is called upsampling, inserting zeros, or zero stuffing. Since there are more data points at the output, the sample rate increases, pushing the Nyquist frequency higher. Inserting zeros into the data in the time domain has an interesting effect in the frequency domain: it creates images (reflections) of the original spectrum. The upsampled signal cannot contain more information than it did at the original sample rate, so all of these images, which have been artificially introduced into the system, are noise. Fortunately, the noise can be removed with a low-pass filter that passes the original band and rejects the images; ideally, its cutoff sits at the original Nyquist frequency.
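The two steps described above, zero stuffing followed by low-pass filtering, can be sketched in Python as follows. This is a minimal illustration, not a production design: the helper names (`upsample`, `fir_filter`, `interpolate`) are hypothetical, and the three-tap kernel in the usage lines is a toy linear-interpolation filter rather than a properly designed low-pass filter.

```python
def upsample(x, factor):
    """Zero stuffing: follow every input sample with (factor - 1) zeros."""
    y = []
    for sample in x:
        y.append(sample)
        y.extend([0] * (factor - 1))
    return y


def fir_filter(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], treating x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y


def interpolate(x, h, factor):
    """Interpolation: upsample first, then low-pass filter at the higher rate."""
    return fir_filter(upsample(x, factor), h)


if __name__ == "__main__":
    print(upsample([1, 2, 3], 2))                      # [1, 0, 2, 0, 3, 0]
    print(interpolate([1, 2, 3], [0.5, 1.0, 0.5], 2))  # [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
```

With the toy kernel, each inserted zero is replaced by the average of its neighbors; the whole output is delayed by one sample, which is the group delay of the symmetric three-tap filter.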

A decimation filter works in a similar manner. In this case, sample points are removed, which decreases the sample rate and lowers the Nyquist frequency. Any frequency content above this reduced Nyquist frequency is aliased back into the band and appears as noise. It is therefore necessary to apply a low-pass filter before removing the data (downsampling) to ensure that noise is not introduced into the system (see Figure 2).
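The filter-then-downsample order can be sketched as below. The function names are hypothetical, and the two-tap averaging filter `[0.5, 0.5]` is a toy low-pass filter chosen only because it has a null at the input Nyquist frequency. The example feeds in a tone at exactly that frequency: downsampling it directly aliases it to DC, while filtering first removes it, leaving only the filter’s start-up transient in the first output.

```python
def fir_filter(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], treating x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y


def downsample(x, factor):
    """Keep every factor-th sample and discard the rest."""
    return x[::factor]


def decimate(x, h, factor):
    """Decimation: low-pass filter at the input rate first, then downsample."""
    return downsample(fir_filter(x, h), factor)


if __name__ == "__main__":
    tone = [1, -1] * 4                    # tone at the input Nyquist frequency
    print(downsample(tone, 2))            # [1, 1, 1, 1]: aliased to DC
    print(decimate(tone, [0.5, 0.5], 2))  # [0.5, 0.0, 0.0, 0.0]: tone removed
```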


Implementation strategies

A desirable characteristic for the low-pass filters employed in both interpolation and decimation is linear phase. In practice, linear-phase filters are implemented as finite impulse response (FIR) filters. FIR filters are computationally more expensive to implement than infinite impulse response (IIR) filters, and better performance (a sharper transition band or more stopband attenuation) requires a higher filter order. FIR filters range from tenth order to 200th order and beyond, so each output requires anywhere from ten to 200 multiply-accumulate operations. Because FIR filters are so computationally expensive, designers often use dedicated hardware to perform this function. The dedicated hardware may come in the form of a dedicated filtering chip, a programmable logic solution, or a semicustom (standard cell) integrated circuit.

A standard cell implementation of an FIR function will generate the highest possible throughput. Programmable logic devices (PLDs) and dedicated filter chips come next in terms of speed, followed by general-purpose DSPs.

When the end user requires a high data throughput, nothing beats custom hardware like ASICs or PLDs. While the ASIC design flow is well understood, it can be a long and challenging process. In the case of PLDs, automated tools that generate FIR filters are available, speeding up your development flow. For interpolation and decimation filters, it is possible to employ certain techniques in order to decrease area and increase performance.

When designing a multirate filter, there is no single best implementation, and there are many ways to evaluate a solution. Here, the cost of a solution is measured by the computation rate it requires: the total number of multiplications per second. (Because additions can usually be combined with multiplications, they are not included in the computational expense.) Assuming a single multiplication requires a single clock cycle, this rate directly gives the MIPS required to implement a solution.

The straightforward approach to an interpolation filter is to upsample the data first and then filter it, which makes its computation rate easy to examine. The example shown in Figure 3 requires a 388-tap filter that must operate at a data rate of 12 megasamples per second (MSPS). Because every tap is evaluated for every output sample, the required computation rate for this implementation is approximately 4,650 MIPS (388 × 12 MSPS = 4,656 million multiplications per second).
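The arithmetic behind the direct-form estimate is simply taps × output rate. A one-line sketch, with a hypothetical function name and the one-multiplication-per-clock assumption stated above:

```python
def direct_interpolator_mips(taps, output_rate_msps):
    """Millions of multiplications per second when every tap is evaluated once
    per output sample (upsample-then-filter). With one multiplication per clock
    cycle, the result reads directly as MIPS."""
    return taps * output_rate_msps


print(direct_interpolator_mips(388, 12))  # 4656
```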


Multistage filtering

Fortunately, there are ways to decrease the required computation rate. It is possible to interpolate by 12 in three stages. In the first stage, the designer can interpolate by a factor of two. The output of the first stage is then interpolated by a further factor of two, and the output of the second stage is interpolated by a factor of three in the third stage (2 × 2 × 3 = 12). There are now three filters. Figure 3 illustrates the specifications for the individual filters.

By interpolating in stages, the individual filter requirements have been relaxed, which reduces the order of the filter required. Furthermore, the first two filters are operating at two and four MSPS. Only the last filter is operating at 12 MSPS. In the original approach, the entire filter was operating at 12 MSPS. Similarly, when you decimate in stages, the required computation rate often decreases.

The multistage filtering approach has reduced the computation rate down to 1,035 MIPS. By redistributing the computation across multiple stages, the required result is obtained with much smaller filters. This method of optimization is a relatively high-level approach.
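The bookkeeping behind a multistage estimate is a sum, over stages, of each stage’s taps × that stage’s output rate. The sketch below assumes a 1-MSPS input, which is implied by the 2-, 4-, and 12-MSPS stage rates above. The per-stage tap counts from Figure 3 are not given in the text, so the counts in the example are hypothetical placeholders and do not reproduce the 1,035-MIPS figure.

```python
def stage_output_rates(input_rate_msps, factors):
    """Output sample rate of each stage in a cascade of interpolators."""
    rates = []
    rate = input_rate_msps
    for factor in factors:
        rate *= factor
        rates.append(rate)
    return rates


def multistage_mips(input_rate_msps, factors, taps_per_stage):
    """Direct-form cost: each stage's filter runs at that stage's output rate."""
    rates = stage_output_rates(input_rate_msps, factors)
    return sum(taps * rate for taps, rate in zip(taps_per_stage, rates))


# 1 MSPS in, interpolate by 2, then 2, then 3 (12 overall).
print(stage_output_rates(1, [2, 2, 3]))             # [2, 4, 12]
# HYPOTHETICAL tap counts; the real per-stage counts come from Figure 3.
print(multistage_mips(1, [2, 2, 3], [30, 20, 60]))  # 860
```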


Polyphase decomposition

Another strategy for decreasing the computation rate involves looking at the details of implementing an interpolation filter. Because it is known that zeros are inserted and then filtering is performed, it is possible to break the problem up into several shorter filters. Each of the filters would operate at a different point in time. This is known as polyphase decomposition, as shown in Figure 4.

A simple case can explain how the polyphase interpolator works. In this example, a filter with 24 coefficients interpolates by 4. Since the filter interpolates by 4, three out of every four samples entering the filter are inserted zeros. Products involving those zero-valued samples contribute nothing to an output, so the corresponding coefficients can be skipped. For instance, the first output is determined solely by coefficients C0, C4, C8, … C20. The next output is determined by coefficients C1, C5, C9, … C21. In this case, only six multiplications are required per output instead of 24, so the computation rate has been reduced by the interpolation factor. In the case of a 388-tap filter that interpolates by 12, each output could be determined from at most 33 multiplications (388/12, rounded up). A polyphase interpolator would therefore require 388 MIPS to perform the same computation.
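A Python sketch of this polyphase interpolator, checked against the upsample-then-filter reference. The function names are hypothetical and the coefficients `1..24` are placeholders standing in for C0 through C23. Phase p takes every fourth coefficient starting at Cp, exactly as in the example above, and no multiplication ever touches a stuffed zero.

```python
def upsample(x, factor):
    """Zero stuffing: follow every input sample with (factor - 1) zeros."""
    y = []
    for sample in x:
        y.append(sample)
        y.extend([0] * (factor - 1))
    return y


def fir_filter(x, h):
    """Reference direct-form FIR, treating x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y


def polyphase_interpolate(x, h, factor):
    """Polyphase interpolator that never multiplies by a stuffed zero.

    Phase p uses coefficients h[p], h[p + factor], h[p + 2*factor], ...
    and produces output n*factor + p from input samples x[n], x[n-1], ...
    """
    phases = [h[p::factor] for p in range(factor)]
    y = []
    for n in range(len(x)):        # one pass per *input* sample
        for sub in phases:         # `factor` outputs per input sample
            acc = 0
            for j, coeff in enumerate(sub):
                if n - j >= 0:
                    acc += coeff * x[n - j]
            y.append(acc)
    return y


if __name__ == "__main__":
    h = list(range(1, 25))         # 24 placeholder coefficients C0..C23
    x = [3, -1, 2, 5, 0, 7]
    print(polyphase_interpolate(x, h, 4) == fir_filter(upsample(x, 4), h))  # True
    print([len(h[p::4]) for p in range(4)])  # [6, 6, 6, 6] multiplications per output
```

For the 388-tap, interpolate-by-12 case, the same slicing gives four phases of 33 coefficients and eight of 32.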

These same techniques can be applied to decimation structures as well. When decimating by a factor of 4, three out of every four filtered data points are thrown away, so it’s unnecessary to calculate them. For a decimation factor of 4, the polyphase decimator distributes the input data across four shorter polyphase filters, and the outputs of the four filters are added together to obtain the final result. Each of the four polyphase filters only needs to produce an output at the decimated data rate, reducing the performance requirements for the decimator.
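A Python sketch of the polyphase decimator, checked against filter-then-downsample. Names and the 13 placeholder coefficients are hypothetical (13 is deliberately not a multiple of 4, so the branches have unequal lengths). Branch p uses every fourth coefficient starting at h[p] and sees every fourth input sample offset by p; the branch results are summed, and only the outputs that are kept are ever computed.

```python
def fir_filter(x, h):
    """Reference direct-form FIR, treating x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y


def polyphase_decimate(x, h, factor):
    """Polyphase decimator: branch p uses h[p], h[p + factor], ... and sees
    the input samples x[i*factor - p]; the branch outputs are summed."""
    phases = [h[p::factor] for p in range(factor)]

    def sample(i):
        # Samples before the start of the record are zero, as in fir_filter.
        return x[i] if 0 <= i < len(x) else 0

    num_out = (len(x) + factor - 1) // factor   # same length as x[::factor]
    y = []
    for m in range(num_out):                    # runs at the decimated rate
        total = 0
        for p, sub in enumerate(phases):
            for j, coeff in enumerate(sub):
                total += coeff * sample((m - j) * factor - p)
        y.append(total)
    return y


if __name__ == "__main__":
    h = list(range(1, 14))                      # 13 placeholder coefficients
    x = [3, -1, 2, 5, 0, 7, 1, -4, 6, 2]
    print(polyphase_decimate(x, h, 4) == fir_filter(x, h)[::4])  # True
```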

Naturally, it is possible to combine the approaches to further reduce the computation rate. For instance, multistage filtering could be performed with each of the individual stages implemented in a polyphase structure.
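The cost of combining the two ideas can be estimated with a small variation on the direct-form sum. A polyphase stage needs taps/factor multiplications per output and produces factor outputs per input, so each tap is used once per *input* sample of that stage. The sketch below (hypothetical function name) therefore charges each stage taps × its input rate. A single stage reproduces the 388-MIPS polyphase figure; the three-stage tap counts are hypothetical placeholders, not the Figure 3 values.

```python
def polyphase_multistage_mips(input_rate_msps, factors, taps_per_stage):
    """Cost of a cascade in which every interpolation stage is polyphase:
    each stage costs taps x the stage's *input* rate."""
    total = 0
    rate = input_rate_msps
    for factor, taps in zip(factors, taps_per_stage):
        total += taps * rate   # every tap used once per input sample
        rate *= factor         # becomes the next stage's input rate
    return total


print(polyphase_multistage_mips(1, [12], [388]))              # 388
# HYPOTHETICAL tap counts for the 2 x 2 x 3 cascade.
print(polyphase_multistage_mips(1, [2, 2, 3], [30, 20, 60]))  # 310
```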


Implementation with DSPs and cores

At the implementation level, it is necessary to choose an architecture to perform the calculations, ideally one that uses a minimum amount of device resources, operates at the lowest power level, and so on. The solution will depend on the computation rate. For situations that require a few hundred MIPS, DSPs are ideal. While some DSPs can deliver up to 1 GOPS, a typical design uses a DSP for more than just filtering, so it is necessary to carefully allocate MIPS among all the functions the processor performs. In many cases, a MIPS budget is developed and a DSP is selected based on the required performance.

There are alternatives if the required performance exceeds the capabilities of a single DSP. These alternatives involve splitting the tasks across several DSPs or using hardware coprocessors to speed up even the most computationally intensive tasks. At this point, ASICs and PLDs enter the picture.


Implementation with dedicated logic

Specialized chips that perform an interpolation function are available from various semiconductor vendors. These chips contain several multipliers that perform the filtering function, obtaining a decisive performance advantage over a DSP. The chips can support a fixed number of coefficients and a particular interpolation or decimation factor.

The ASIC and PLD approaches can be lumped into the “build your own dedicated hardware” category. With this approach, it’s possible to calculate an entire 127-tap FIR filter in a single clock cycle, nearly two orders of magnitude faster than a DSP performing one multiplication per cycle. The design challenge is to stay aware of everything that is going on: DSP design, HDL simulation, synthesis, verification, testability, and fault coverage.

For a fully parallel interpolation filter, polyphase decomposition breaks the filter down into a set of shorter filters. To obtain one filter calculation per clock cycle, each polyphase filter needs one multiplier for each of its coefficients. At every input clock, two things happen: 1. the new data sample is stored in every polyphase structure, and 2. each of the N polyphase filters generates one output, so N outputs are produced in total (where N is the interpolation factor). Finally, an output multiplexer driven by the output clock sweeps through all of the individual filter outputs in the same time span as a single input clock.
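The per-input-clock behavior can be modeled in Python as below. This is a behavioral sketch rather than HDL, and the function name is hypothetical. A shared shift register holds the most recent input samples, every polyphase filter computes from that register with one multiplication per coefficient (len(h) multipliers in total), and the phase outputs are then read out in order, as the output multiplexer would do at N times the input rate.

```python
def parallel_polyphase_interpolator(x, h, factor):
    """Behavioral model: each loop iteration stands for one input clock."""
    phases = [h[p::factor] for p in range(factor)]  # phase p: C_p, C_(p+factor), ...
    depth = max(len(sub) for sub in phases)
    delay_line = [0] * depth                         # shared input shift register
    out = []
    for sample in x:
        # 1. Store the new sample; every polyphase filter taps the same register.
        delay_line = [sample] + delay_line[:-1]
        # 2. All `factor` polyphase filters compute in parallel.
        phase_outputs = [sum(c * d for c, d in zip(sub, delay_line)) for sub in phases]
        # 3. The output multiplexer sweeps phases 0 .. factor-1 within this input clock.
        out.extend(phase_outputs)
    return out


print(parallel_polyphase_interpolator([1, 10], [1, 2, 3, 4, 5, 6], 2))  # [1, 2, 13, 24]
```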


Clock domains and static timing analysis

Breaking down the design, there are two clock domains: the input clock and the output clock. The output clock rate is an integer multiple of the input clock rate. The output structure (a simple multiplexer) needs to operate at a higher data rate than the input polyphase filters. When designing dedicated hardware (be it ASIC or programmable logic), reducing the number of clock domains is often desirable. With ASICs, extra clock domains need to be tied together when generating scan vectors, and there may be false paths that must be removed from the static timing analysis. With programmable logic, the number of available clock signals is fixed, which makes each clock domain a precious resource.

It’s possible to clock the entire structure with the output clock if clock enables are placed on the flip-flops used in the polyphase structures. With clock enables, the polyphase structures only update once per input sample period, which relaxes the timing on these complex structures. The paths through the polyphase structures become multicycle paths, so static timing analysis must be performed with the corresponding multicycle specification. Timing analyzers used in ASIC- and PLD-oriented design flows support multicycle specifications.
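A behavioral Python model of the single-clock scheme, as a sketch with a hypothetical function name and with pipeline latency ignored. A counter on the output clock asserts the clock enable once every `factor` ticks, the polyphase section loads new values only on enabled ticks, and the output multiplexer select advances on every tick. Because the polyphase logic only needs a new result every `factor` output clocks, its paths can be constrained as `factor`-cycle multicycle paths.

```python
def single_clock_interpolator(x, h, factor):
    """Polyphase interpolator modeled on one (output) clock with a clock enable."""
    phases = [h[p::factor] for p in range(factor)]
    depth = max(len(sub) for sub in phases)
    delay_line = [0] * depth
    phase_regs = [0] * factor                    # registered polyphase outputs
    out = []
    for tick in range(len(x) * factor):          # one iteration per output clock
        clock_enable = tick % factor == 0        # asserted once per input period
        if clock_enable:
            # Only enabled ticks load the polyphase flip-flops, so the logic
            # feeding them has `factor` output clocks to settle (multicycle).
            delay_line = [x[tick // factor]] + delay_line[:-1]
            phase_regs = [sum(c * d for c, d in zip(sub, delay_line)) for sub in phases]
        mux_select = tick % factor               # advances on every output clock
        out.append(phase_regs[mux_select])
    return out


print(single_clock_interpolator([1, 10], [1, 2, 3, 4, 5, 6], 2))  # [1, 2, 13, 24]
```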

When designing an ASIC, the required number of multiplication units can be placed into silicon and the desired speed can be obtained using minimal space. ASIC implementations tend to be less flexible than DSPs and PLDs. When using a dedicated piece of silicon, changes require a complete respin (which costs time and money).


PLD structures for filtering

The PLD implementation takes a different approach. There are two structures used to perform the filtering operation in a PLD: serial and parallel. Both structures take the coefficients and efficiently map them into look-up tables to perform the multiplication. The fully parallel structure performs the entire filtering operation in a single clock cycle. The serial structure distributes the calculation across several clock cycles (as determined by the input bit width). This results in lower throughput, but serial structures are efficient in terms of silicon utilization (requiring minimum storage and logic).
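The text does not name the look-up-table technique; a common one is distributed arithmetic (DA), which this Python sketch assumes. Each LUT entry holds the sum of the coefficients selected by its address bits. The serial form takes one bit from every input sample per clock, looks up the partial sum, and shifts and accumulates, so an 8-bit input takes eight iterations; the sign bit of a two’s-complement input carries negative weight and is subtracted. Integer (fixed-point) coefficients and the function names are assumptions for illustration.

```python
def build_da_lut(coeffs):
    """Entry `addr` holds the sum of the coefficients whose bit is set in `addr`."""
    lut = []
    for addr in range(1 << len(coeffs)):
        lut.append(sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1))
    return lut


def da_dot_product(coeffs, samples, width):
    """Bit-serial DA dot product of integer coefficients with `width`-bit
    two's-complement samples: one LUT lookup per bit (one clock per bit)."""
    lut = build_da_lut(coeffs)
    acc = 0
    for bit in range(width):                 # LSB first, one iteration per clock
        addr = 0
        for k, s in enumerate(samples):
            addr |= ((s >> bit) & 1) << k    # gather bit `bit` of every sample
        partial = lut[addr] << bit           # weight the partial sum by 2**bit
        if bit == width - 1:
            acc -= partial                   # two's-complement sign bit is negative
        else:
            acc += partial
    return acc


print(da_dot_product([3, -5, 7, 2], [10, -8, 127, -128], 8))  # 703
```

A LUT addressed by every tap grows as 2 to the power of the tap count, so longer filters are usually split into small groups of taps whose results are added.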

Today, there are tools that automatically generate FIR filters for programmable logic. At a minimum, these tools generate single filters when given a set of coefficients. The more advanced tools generate fixed-point coefficients for the user and can produce polyphase filters based on them, along with area and speed estimates.
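One simple way to turn floating-point coefficients into fixed-point ones is sketched below. It is an assumption, not how any particular tool works: scale so the largest-magnitude coefficient maps to the largest positive B-bit two’s-complement integer, then round each coefficient to the nearest integer. Note that Python’s `round` breaks exact ties toward the even integer.

```python
def quantize_coefficients(coeffs, bits):
    """Map floating-point coefficients to `bits`-bit signed integers.

    Returns the integer coefficients and the scale factor used, so that
    coeffs[i] is approximately quantized[i] / scale.
    """
    max_int = (1 << (bits - 1)) - 1          # e.g. 127 for 8 bits
    peak = max(abs(c) for c in coeffs)
    scale = max_int / peak
    return [round(c * scale) for c in coeffs], scale


print(quantize_coefficients([0.1, 0.5, 0.1], 8))  # ([25, 127, 25], 254.0)
```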


Evaluating solutions

There are many ways to implement logic that will perform the interpolation and decimation. The engineer has to evaluate the required throughput, come up with an efficient implementation, and balance time spent optimizing the design against completing the project quickly.


Tony San is an engineer at Altera in San Jose, CA, and has over 10 years of design experience. He received his BE and MSEE from Manhattan College. Tony can be reached at tony_san@altera.com.



Illustrations
Figure 1
Figure 2
Figure 3
Figure 4



    All materials on this site Copyright © 2010 EE Times Group, a Division of United Business Media LLC All rights reserved.