Interpolation, Resampling, and Structures for Digital Receivers
With the preponderance of programmable DSP chips, FPGAs, and custom silicon, it is common to use a combination of analog and digital signal processing or to perform timing recovery completely in the digital domain. This article discusses timing recovery and the implementation of the DSP techniques of interpolation and resampling in today's high-rate and variable-rate receivers.
By Tony Kirke
In the not-so-distant past, it was common for modems to perform timing recovery with only analog components. However, with the preponderance of programmable DSP chips, field programmable gate arrays (FPGAs), and custom silicon, it is more common to use either a combination of analog and digital signal processing or to perform timing recovery completely in the digital domain. We will briefly review timing recovery in previous-generation receivers and discuss in detail how the DSP techniques of interpolation and resampling are being implemented in today's high-rate and variable-rate receivers. We will then consider some of the trade-offs in implementing different receiver structures based on these techniques.
First-generation digital receivers
For high-rate digital receivers designed in the 1980s and early 1990s, it was often desirable to move all the signal processing to the digital domain for manufacturing and performance reasons. However, it usually was not feasible to do all the processing digitally, due to high sampling rates or silicon limitations. These modems generally used a simple combination of analog and digital timing recovery. The timing (and carrier) discriminator was typically digital, with the A/D sampling being controlled by a voltage-controlled oscillator (VCO). An example of this method, with a digital phase-locked loop (PLL) driving the sampling clock of the A/D, is shown in Figure 1. Early-generation digital-video broadcast (DVB) quadrature phase-shift keying (QPSK) chips, such as Philips' TDA8041, SGS-Thomson's STV0196, and LSI's first-generation DVB QPSK, used this technique. The VCO adjusts the sampling phase of the A/D to ensure that the digital samples are taken at the optimum timing instants. Generally the A/D samples at 2, 3, or 4 samples/symbol, with 4 samples/symbol usually being the best choice since it eases the transition band requirements of both the analog anti-aliasing filter and the digital filtering. This method is awkward, since the digital portion needs to stay synchronized with the A/D converters, and it places strict requirements on the VCO and PLL.
A variation of this method was used in Broadcom's BCM3100 QAMLINK digital quadrature amplitude modulation (QAM) receiver. In this case, the timing-recovery loop control is purely digital, with the equivalent of digital interpolation performed by adjusting the phase of the A/D sampling. This is accomplished by using a numerically controlled oscillator (NCO) clocked at a minimum of 4 times the sampling rate. While the most significant bit (MSB) output of the NCO is generally changing at the sampling rate (20 MHz for this case), the phase of the MSB and, hence, the A/D sampling clock can be adjusted in
10-ns steps (100-MHz clock intervals). Therefore, for a 5-MHz symbol rate, we get 4 samples/symbol (20-MHz sample rate) and can adjust the timing in steps of 5% (1/20th) of a symbol. The residual timing error (assuming that the sampling resolution is the limiting factor) is then uniformly distributed over one step, so its variance is $(1/20)^2/12 \approx 2.1 \times 10^{-4}$ of a squared symbol period, which corresponds to an RMS error of about 1.4% of a symbol. In this case, there is a restriction that the system (and A/D) clock must be very close to an integer multiple of the baud rate (in order to achieve the timing error variance
given). However, since these modems were often designed for a very specific purpose (i.e. one baud rate), an appropriate crystal source could be specified, which was not necessarily seen as a restriction.
Note that in this example, there are two A/Ds, with downconversion occurring prior to the A/Ds. This is another restriction of the technique, since changing the instantaneous phase of the A/Ds works well only as long as the residual frequency offset is small. This is because changing the A/D sampling phase changes the phase of the residual carrier, impacting the ability of the receiver to maintain carrier lock. This restriction also generally requires the carrier loop to be a combination of analog and digital processing. While this is quite acceptable for binary phase-shift keying (BPSK) and QPSK, it can be very restrictive for 16-QAM or greater.
At lower bit rates, where it is feasible for the digital portion to operate at greater than 8 samples/symbol, another sampling technique was used. For QPSK and BPSK modems operating in the satellite channel, the degradation due to timing jitter is not very great (assuming forward error correction), so interpolating between samples to reduce the timing error to an acceptable level is not always necessary. Assuming that the A/Ds are clocked at a rate of exactly 8 samples/symbol, the bit-error rate (BER) degradation due to finite sampling resolution (for ideal, unfiltered QPSK) is only about 0.25 dB at an $E_s/N_0$ of 7 dB. The input samples are decimated to the symbol rate under control of the timing recovery loop, which effectively selects the correct samples for symbol lock. For satellite receivers that use digital integrate-and-dump (I&D) filtering for the matched filter, usually in combination with an analog surface acoustic-wave (SAW) filter, a wide range of baud rates that are submultiples of the sampling rate can be supported by programming the I&D filter. Also, since the A/D sampling clock is fixed, this structure is suitable for intermediate frequency (IF) or bandpass sampling. Thus, one A/D can be used instead of two, reducing cost and improving performance through the elimination of A/D mismatches.
Since the BER degradation due to timing jitter for greater than 8 samples/symbol is not very significant, this structure could be used to handle any baud rate (with greater than 8 samples/symbol) by driving an I&D filter with a digital PLL (Figure 2). The inverse of the ratio of the sampling rate to the symbol rate (that is, the number of symbols per sample) is programmed into the NCO as its phase increment. Whenever the NCO overflows, the I&D filter is dumped, and the result is passed to the next processing block. Thus, the I&D filter could integrate a different number of samples for each symbol, depending on the contents of the NCO. For example, if the sampling rate was 8.5 times the symbol rate, the I&D filter would sum 8, 9, 8, 9, etc. samples (assuming that further processing was at the symbol rate). This is roughly equivalent to the timing jitter that would occur for the case of exactly 8 samples/symbol. For lower baud rates (and hence more samples/symbol), the amount of timing jitter is even less significant.
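The NCO-driven dump can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `integrate_and_dump`, the use of exact fractions in place of a fixed-width hardware accumulator, and a zero initial NCO phase are all choices made here, not details from any of the chips mentioned.

```python
from fractions import Fraction

def integrate_and_dump(samples, samples_per_symbol):
    """Sum input samples and dump the sum each time the NCO phase overflows.

    The NCO increment is the inverse of the samples/symbol ratio. Exact
    fractions stand in for a fixed-point accumulator (an assumption of
    this sketch), so the overflow instants are free of rounding error.
    """
    step = 1 / Fraction(samples_per_symbol)  # symbols per input sample
    phase = Fraction(0)                      # NCO phase, in symbols
    acc, count = 0.0, 0
    outputs, counts = [], []
    for x in samples:
        acc += x
        count += 1
        phase += step
        if phase >= 1:                       # NCO overflow: one symbol elapsed
            phase -= 1
            outputs.append(acc)              # dump the integrator
            counts.append(count)
            acc, count = 0.0, 0
    return outputs, counts

# 8.5 samples/symbol: the integrator alternates between 9 and 8 samples.
outputs, counts = integrate_and_dump([1.0] * 170, Fraction(17, 2))
print(counts[:4])  # [9, 8, 9, 8]
```

With a zero initial phase the pattern starts with 9 rather than 8; a different starting phase only shifts the alternation.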
Interpolation for high baud-rate receivers
For the modems discussed, digital interpolation was not usually performed because of the high silicon cost, process limitations, or perhaps even time-to-market considerations (leveraging previous designs). As these barriers began to fall and customer demands increased, not only did digital interpolation become feasible, but research in the area became more evident and more practical. Even so, most of the classical DSP and communication theory books do not have significant sections on interpolation in modems. The following section will discuss some of the interpolation techniques being used in today's digital receivers.
Most of the discussion on interpolation in DSP books is related to increasing the sampling rate either by integer factors or by rational fractions (such as 7/5, 5/3, etc.). These interpolators were typically fixed-function finite impulse response (FIR) or infinite impulse response (IIR) filters designed for a single change of rate. For example, for digital receivers sampling at 4 samples/symbol, these structures would increase the sampling rate so that the timing resolution was sufficiently fine and then decrease the sampling rate back down to a level suitable for filtering with a matched filter. A typical example would be for the interpolation block to increase the sampling rate from 4 samples/symbol to 16 samples/symbol and then decrease it back down to 4 samples/symbol for further processing. This interpolation was accomplished by means of a two-stage process: the insertion of $N-1$ zeros between each pair of adjacent samples, followed by low-pass filtering of the zero-inserted signal with a cutoff frequency of $f_s/(2N)$, where $f_s$ is the up-sampled rate. The theoretical time-domain impulse response of the ideal low-pass filter is the sinc ($\sin(x)/x$) function.
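The two-stage process can be illustrated with a short, unoptimized Python sketch. The Hamming window, the filter length (`half_len`), and the function names are assumptions made for this example; a real design would choose the window and length to meet its own requirements.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x)/(pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def interpolation_filter(N, half_len):
    """Hamming-windowed sinc with cutoff at 1/(2N) of the up-sampled rate."""
    taps = []
    for n in range(-half_len, half_len + 1):
        window = 0.54 + 0.46 * math.cos(math.pi * n / half_len)
        taps.append(sinc(n / N) * window)   # gain of N restores the amplitude
    return taps

def upsample(x, N, half_len=32):
    # Stage 1: insert N-1 zeros after every input sample.
    stuffed = []
    for v in x:
        stuffed.append(v)
        stuffed.extend([0.0] * (N - 1))
    # Stage 2: low-pass filter the zero-inserted signal (zero-delay alignment).
    h = interpolation_filter(N, half_len)
    y = []
    for i in range(len(stuffed)):
        acc = 0.0
        for k, hk in enumerate(h):
            j = i + half_len - k
            if 0 <= j < len(stuffed):
                acc += hk * stuffed[j]
        y.append(acc)
    return y

x = [math.sin(2 * math.pi * 0.05 * m) for m in range(40)]
y = upsample(x, 4)   # 4x as many samples; every 4th output equals an input
```

Because the windowed sinc is zero at every nonzero multiple of N, the original samples pass through unchanged and only the in-between values are computed by the filter.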
Polyphase filtering and low-pass filtering requirements
For these cases of interpolation, the filtering and processing requirement can be reduced significantly by using a polyphase filtering approach (Figure 3). Because, in the previous case, the interpolation (or up-sampling) is immediately followed by decimation, it is possible to reduce the computational complexity by only performing the interpolation for the samples that will be used after decimation. Thus, the interpolation FIR can run at the original sampling rate rather than at the up-sampled rate.
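A small Python sketch can make the saving concrete. It compares the brute-force route (zero-insert, filter at the high rate, keep every Nth output) with a single polyphase branch running at the input rate. The prototype filter taps and the function names are placeholders chosen for this illustration only.

```python
def zero_stuff(x, N):
    """Insert N-1 zeros after every sample."""
    out = []
    for v in x:
        out.append(v)
        out.extend([0.0] * (N - 1))
    return out

def fir(x, h):
    """Causal FIR filter: y[n] = sum_k h[k] * x[n-k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def upsample_filter_decimate(x, h, N, phase):
    """Brute force: filter at the up-sampled rate, then keep one phase."""
    return fir(zero_stuff(x, N), h)[phase::N]

def polyphase(x, h, N, phase):
    """Same result using only every Nth tap, at the original rate."""
    return fir(x, h[phase::N])

# Placeholder prototype filter (a 4x linear-interpolation kernel).
h = [0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25]
x = [1.0, -2.0, 3.0, 0.5, -1.5, 2.5]
N = 4
slow = [upsample_filter_decimate(x, h, N, p) for p in range(N)]
fast = [polyphase(x, h, N, p) for p in range(N)]
```

Each polyphase branch uses roughly 1/N of the prototype taps and produces one output per input sample, so changing the timing phase amounts to switching to a different branch.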
The low-pass FIR filter can be designed with any of several methods that attempt to approximate the ideal low-pass filter response. However, the normal metrics used for evaluating low-pass filters, such as passband ripple, stopband response, and transition bandwidth, are not necessarily the best for the interpolation filter. The ultimate measure of the
quality of any particular design is in its impact on the BER in the receiver. This can be evaluated by a complete system simulation. Generally, the interpolator will result in additional intersymbol interference (ISI), additional noise, or an amplitude variation that is dependent on the interpolation phase. In cases where a complete simulation is not feasible or some preliminary decision on the interpolation strategy is necessary, other methods must be used to determine the performance of a particular
implementation. The next section will briefly discuss some of the different methods used for the design of these interpolation filters.
Fractional delay filtering and interpolation methods
Another way of looking at interpolation, within the context of what we are often trying to achieve, is to reformulate the problem as a fractional delay (i.e. delaying the signal by a fraction of the sampling period). From a purely conceptual point of view, the work on fractional delays has centered on the idea of real-time calculation of coefficients and, hence, has not been as restrictive as the interpolation methods described in classic DSP books. The ideal impulse response for the fractional delay is also given by the infinite-length sinc function. The least-squares (minimum integrated squared error) design is the truncated sinc impulse response. However, researchers have also used other measures for the optimal design of these filters.
The truncated sinc filter does not yield an optimal solution due to the Gibbs phenomenon, which results in non-ideal amplitude variation versus frequency (Figure 4). Two useful window functions that can reduce this effect are the Kaiser window and the Dolph-Chebyshev window. While real-time calculation of the coefficients is possible, it is often more efficient to calculate all the possible coefficients and store them in a ROM look-up table. This method is often the best when a certain sampling resolution is required without needing to change the sampling rate, in which case the polyphase approach can be used.
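As a rough illustration of the look-up-table idea, the sketch below precomputes Kaiser-windowed sinc coefficients for a set of fractional delays. The tap count (8), the Kaiser parameter (`beta = 6.0`), the 16-entry table, and the unit-DC-gain normalization are all assumptions of this example rather than recommended values.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I0 via its power series."""
    term, total, k = 1.0, 1.0, 1
    while term > 1e-12 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def frac_delay_taps(mu, num_taps=8, beta=6.0):
    """Kaiser-windowed sinc taps for a delay of (num_taps/2 - 1 + mu) samples."""
    half = num_taps / 2.0
    delay = half - 1.0 + mu
    taps = []
    for n in range(num_taps):
        t = n - delay                                   # time relative to the peak
        arg = max(0.0, 1.0 - (t / half) ** 2)
        window = bessel_i0(beta * math.sqrt(arg)) / bessel_i0(beta)
        taps.append(sinc(t) * window)
    total = sum(taps)
    return [c / total for c in taps]                    # normalize to unit DC gain

# "ROM" table: one coefficient set per 1/16-sample step of fractional delay.
table = [frac_delay_taps(p / 16.0) for p in range(16)]
```

Entry 0 of the table reduces to a pure three-sample delay, and the half-sample entry is symmetric, as expected of a linear-phase filter.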
Linear interpolation can be performed using the following formula:
$$y_k = (1-\mu)\,x_k + \mu\,x_{k+1}$$
where $\mu$ is the desired fractional delay.
The frequency response of this interpolation is a squared sinc function. Generally, linear interpolation works well in cases where the signal of interest is a baseband signal with a narrow bandwidth relative to the sampling rate. In the frequency domain, the regions near the nulls of the $\mathrm{sinc}^2$ function are aliased back into the passband of the signal, with the amount of aliasing depending on the ratio of the output and input sampling rates. With a narrower signal spectrum (relative to the original sampling rate), there is less aliasing and, hence, less distortion of the signal. It has been reported that for BPSK, a linear interpolator can cause about 0.05-dB signal-to-noise ratio (SNR) degradation at a BER of $10^{-2}$ using 4 samples/symbol (50% excess bandwidth case) versus 0.2 dB using 2 samples/symbol.
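The dependence on relative bandwidth can be seen with a simple experiment. The sketch below applies the linear formula at the midpoint between samples of a sine wave and reports the worst-case error. A single tone is used here only as a stand-in for a band-limited signal; it does not reproduce the BER figures quoted above, and the function names are illustrative.

```python
import math

def linear_interp(x, k, mu):
    """y_k = (1 - mu) * x[k] + mu * x[k + 1]."""
    return (1.0 - mu) * x[k] + mu * x[k + 1]

def worst_midpoint_error(samples_per_cycle, num_samples=64):
    """Largest error when interpolating a unit sine halfway between samples."""
    x = [math.sin(2 * math.pi * n / samples_per_cycle) for n in range(num_samples)]
    worst = 0.0
    for k in range(num_samples - 1):
        true = math.sin(2 * math.pi * (k + 0.5) / samples_per_cycle)
        worst = max(worst, abs(linear_interp(x, k, 0.5) - true))
    return worst

err_wide = worst_midpoint_error(4)     # tone at 1/4 of the sampling rate
err_narrow = worst_midpoint_error(16)  # tone at 1/16 of the sampling rate
```

The error for the tone at 1/16 of the sampling rate is roughly ten times smaller than for the tone at 1/4 of the sampling rate, which is consistent with the observation that linear interpolation suits signals that are narrow relative to the sampling rate.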
When linear interpolation does not provide sufficient performance, higher-order polynomial interpolation formulas can be used. Any classical polynomial interpolation formula can be described in terms of its Lagrange coefficients. The Lagrange interpolation formula results in coefficients that provide a frequency response that is maximally flat at a certain frequency (typically zero). The Lagrange coefficients are conveniently given by the formula below:
$$C_n = \prod_{\substack{k=N_1 \\ k \neq n}}^{N_2} \frac{\mu + k}{k - n}, \qquad n = N_1, \ldots, N_2$$
where $\mu$ is the fractional delay and $N_1, \ldots, N_2$ are the offsets of the input samples used (from $-1$ to $1$ for a second-order interpolator).
An example of a second-order polynomial is given below:
$$y_k = C_{-1}\,x_{k-1} + C_0\,x_k + C_1\,x_{k+1}$$
where
$$C_{-1} = \tfrac{1}{2}\,\mu(\mu+1), \qquad C_0 = -(\mu-1)(\mu+1), \qquad C_1 = \tfrac{1}{2}\,\mu(\mu-1)$$
This filter is simple enough that the coefficients can be calculated in real time rather than read from a ROM look-up table.
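The second-order coefficients can be checked numerically with a short Python sketch. The closed forms are the three expressions given above; the general product form in `lagrange_coeffs` is the standard Lagrange construction, written here under the assumption that coefficient n weights the sample at offset n and that the output corresponds to a delay of mu samples. The function names are illustrative.

```python
def second_order_coeffs(mu):
    """Closed-form second-order Lagrange coefficients, keyed by sample offset."""
    return {
        -1: 0.5 * mu * (mu + 1.0),
        0: -(mu - 1.0) * (mu + 1.0),
        1: 0.5 * mu * (mu - 1.0),
    }

def lagrange_coeffs(mu, n1, n2):
    """General product form over sample offsets n1..n2 (assumed convention)."""
    coeffs = {}
    for n in range(n1, n2 + 1):
        c = 1.0
        for k in range(n1, n2 + 1):
            if k != n:
                c *= (mu + k) / (k - n)
        coeffs[n] = c
    return coeffs

def interpolate(x, k, mu):
    """Evaluate the second-order interpolator around sample k."""
    c = second_order_coeffs(mu)
    return c[-1] * x[k - 1] + c[0] * x[k] + c[1] * x[k + 1]

# A quadratic input is reproduced exactly: x(t) = t^2 delayed by 0.25 samples.
x = [float(t * t) for t in range(10)]
y = interpolate(x, 5, 0.25)   # equals 4.75 ** 2
```

Since a second-order interpolator fits a parabola through three samples, any quadratic input is recovered without error, which makes a convenient sanity check for the coefficient signs.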
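A very efficient implementation of these polynomial filters is known as the Farrow structure. We can write the equation just given as the Z-domain transfer function of the interpolator. By rearranging the coefficients, and measuring $\mu$ from the most recent of the three input samples (so that it is one sample larger than in the equation just given), we can get a polynomial in $\mu$:
$$H(z) = C_0(z) + C_1(z)\,\mu + C_2(z)\,\mu^2$$
where
$$C_0(z) = 1, \qquad C_1(z) = -\tfrac{3}{2} + 2z^{-1} - \tfrac{1}{2}z^{-2}, \qquad C_2(z) = \tfrac{1}{2} - z^{-1} + \tfrac{1}{2}z^{-2}$$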
Since the coefficients are fixed and only $\mu$ needs to be updated, this structure is much more efficient when the desired fractional delay changes rapidly. The Farrow structure for this second-order Lagrange interpolator is shown in Figure 5. It can be seen that both the unit delays and the multipliers can be shared between coefficients.
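A minimal Python sketch of this structure is shown below. It uses the three fixed coefficient sets from the transfer function above and evaluates the polynomial in mu with Horner's rule. The function name and the per-sample calling convention are assumptions of this example, and mu is taken as the delay in samples relative to the newest input.

```python
# Fixed FIR branches, taps ordered as (z^0, z^-1, z^-2).
C0 = (1.0, 0.0, 0.0)
C1 = (-1.5, 2.0, -0.5)
C2 = (0.5, -1.0, 0.5)

def farrow_output(x_now, x_prev1, x_prev2, mu):
    """One output of the second-order Farrow interpolator."""
    taps = (x_now, x_prev1, x_prev2)
    v0 = sum(c * t for c, t in zip(C0, taps))
    v1 = sum(c * t for c, t in zip(C1, taps))
    v2 = sum(c * t for c, t in zip(C2, taps))
    return v0 + mu * (v1 + mu * v2)   # Horner evaluation: only mu changes

# Samples of x(t) = t^2 at t = 5, 4, 3; the output should equal (5 - mu)^2.
for mu in (0.0, 0.5, 1.0):
    print(mu, farrow_output(25.0, 16.0, 9.0, mu))
# 0.0 25.0
# 0.5 20.25
# 1.0 16.0
```

The branch filters never change, so updating the timing estimate costs only the two multiplications by mu in the final line of the function.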
Receiver structures for fixed-rate interpolation
There are several possible receiver structures for both fixed-rate and variable-rate applications (Figure 6). An important factor in determining which structure to use is whether the receiver needs to support several different matched filters (i.e. several different roll-off or excess bandwidth factors). This requirement determines whether the matched filter coefficients need to be programmable or whether the design can be optimized based on fixed coefficients. We will now look at some of the commonly used structures.
The first structure (Figure 6a) is suitable for cases where the sampling rate is close to an integer multiple of the symbol rate, in which case the interpolator is equivalent to a fractional delay filter. The output of the matched filter will usually be at either 1 or 2 samples/symbol (depending on the timing and carrier lock algorithms used). The entire symbol timing-recovery loop can be placed after the matched filter. This structure can be advantageous for receivers needing rapid acquisition (such as burst modems). Since the timing recovery is after the matched filter, the delay within the timing loop can be minimized when timing error feedback is used. For rapid frequency acquisition, this structure may also work well, since the interpolation block can be outside of the frequency recovery loop, thus allowing higher loop bandwidths.
Combined interpolation and matched filtering
Since both the interpolation and matched filter blocks are usually FIR filters, they can be combined into a single FIR (see Figure 6b). The combined filter would then simply decimate the incoming samples to either 1 or 2 samples/symbol using a polyphase filtering approach. In this case, one can simply design the matched filter for the timing resolution required. So, for example, if the maximum timing error was required to be less than 1/16th of a symbol, the matched filter could be designed for a sampling rate of 16 samples/symbol. Assuming that the sampling rate is an integer number of samples/symbol, only the phase of the polyphase filter needs to be changed for timing recovery.
Combined interpolation and matched filter approach
When the receiver needs to demodulate a continuous range of symbol rates, the combined filter method needs to be slightly modified, since we are no longer guaranteed that the sampling rate will be an integer multiple of the symbol rate. For example, suppose that the above filter was designed for 4 samples/symbol with a resolution of 16 samples/symbol. If our real sampling rate is 3.1 samples/symbol, then we could simply pick the matched filter coefficients that are nearest in time to our desired offset. Even in the case of no interpolation (i.e. the zero-phase filter), this would result in coefficients that differ from the ideal, thus degrading performance. This degradation can be significantly reduced by designing the filter so that the timing resolution is a very small fraction of a symbol. This both reduces the timing jitter at higher symbol rates and allows variable-rate decimation. The drawback is that the coefficient look-up table grows significantly, and the indexing of the coefficients becomes more complex (perhaps requiring multipliers in some cases). This approach is used in Odeum Microsystems' QPSK DVB receiver.
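The nearest-coefficient selection for the 3.1 samples/symbol case can be sketched as follows. This is only an illustration of the indexing arithmetic: the function name, the use of exact fractions, and the choice to express tap times in symbols are assumptions made here, not a description of any particular chip.

```python
from fractions import Fraction

def nearest_table_indices(samples_per_symbol, table_res, num_taps,
                          timing_offset=Fraction(0)):
    """Map each input-rate tap to the nearest stored prototype coefficient.

    table_res is the prototype resolution in coefficients per symbol.
    Returns (table index, residual time error in symbols) for every tap.
    """
    result = []
    for k in range(num_taps):
        # Time of tap k in symbols; this division is where a multiplier
        # (by the reciprocal rate) would typically be needed in hardware.
        t = Fraction(k) / samples_per_symbol + timing_offset
        idx = round(t * table_res)               # nearest stored coefficient
        err = t - Fraction(idx, table_res)       # leftover timing error
        result.append((idx, err))
    return result

# 3.1 samples/symbol against a table stored at 16 coefficients/symbol.
picks = nearest_table_indices(Fraction(31, 10), 16, 8)
print([idx for idx, _ in picks[:5]])  # [0, 5, 10, 15, 21]
```

The index step is not constant (5, 5, 5, 6, ...), and every tap carries a residual error of up to half a table step (1/32 of a symbol here), which is why a finer table reduces the degradation.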
Although such an approach has its merits, it may be advantageous to consider separate blocks for interpolation and matched filtering. For example, if the matched filter is a root raised cosine with a fixed roll-off, it is possible that a very efficient FIR can be designed with only canonic signed digit (CSD) coefficients (so that no multipliers are required). Given that the matched filter is fixed, the interpolation block can then be optimized so that the receiver meets its performance requirements.
Figure 6c shows the structure for a variable-rate receiver. This structure is also suitable when the matched filter needs to be programmable, since the amount of RAM for the combined structure would become excessively large. This structure was a natural follow-up to the first-generation QPSK DVB receivers from LSI and SGS-Thomson that had matched filter blocks operating at 2, 3, and 4 samples/symbol. By placing digital interpolation prior to the matched filter, it was possible to do all the timing recovery digitally without having to re-architect the chips. In order to cover a wide range of symbol rates, such as the 1-Msps to 30-Msps range supported by most current DVB chips, the interpolation block can be preceded by a variable multistage decimation block. (Note: The decimation could also follow the interpolation block.) Since decimation works best for fixed decimation rates, this is typically done by having several stages of decimation filters with the ability to bypass each stage. Perhaps the most efficient structure is for each stage to be a half-band filter. The multistage decimation, combined with the incommensurate rate resampling, allows any symbol rate (over the whole range) to be demodulated.
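The stage-selection logic for such a bypassable half-band chain might look like the sketch below. The 60-MHz A/D clock, the limit of five stages, and the target of keeping at least 2 samples/symbol are hypothetical values chosen for illustration; the function name is likewise an assumption.

```python
def halfband_stages(sample_rate_hz, symbol_rate_hz, max_stages=5, min_sps=2.0):
    """Choose how many decimate-by-2 stages to enable (the rest are bypassed).

    Stages are enabled while the rate after decimation still provides at
    least min_sps samples/symbol. Returns (stages enabled, samples/symbol
    left for the interpolator to resample).
    """
    sps = sample_rate_hz / symbol_rate_hz
    stages = 0
    while stages < max_stages and sps / 2 >= min_sps:
        sps /= 2
        stages += 1
    return stages, sps

# Hypothetical 60-MHz A/D clock across the 1-Msps to 30-Msps range.
print(halfband_stages(60e6, 30e6))  # (0, 2.0)
print(halfband_stages(60e6, 1e6))   # (4, 3.75)
```

After the half-band stages, the remaining ratio always lies between 2 and 4 samples/symbol in this sketch, so the interpolator only has to handle a resampling ratio of less than two regardless of the symbol rate.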
Receivers with adaptive equalization
Although there are several different options available for receiver structures when we add an adaptive equalizer, we will just briefly mention the most common considerations as related to interpolation. For baud-spaced adaptive equalizers, interpolation will often take place prior to the matched filter and almost always prior to the equalizer, since timing recovery is required for proper operation of the equalizer. However, for fractionally spaced equalizers, the equalizer itself can do the appropriate interpolation while performing the equalization, provided that the sampling rate is very close to an integer multiple of the equalizer sampling rate (i.e. the timing error changes very slowly). For high-speed Ethernet (greater than 100 Mbps), some designers have proposed adaptive equalizers that perform matched filtering, equalization, and timing interpolation within a single structure.
Tony Kirke is currently a senior DSP engineer at Proxim in
Mountain View, CA, where he is involved in the design of broadband wireless modems. He has over 13 years of design experience in wireless, satellite, and cable communication systems and VLSI. He has a BSEE degree from Trinity College in Ireland. He can be reached at tony_kirke@yahoo.com.