11 March 2010
Special Section: DSP Design
VoIP Gateway DSP Algorithms
One way to deal with the tough requirements of packetized voice gateway designs is to use a DSP core-based implementation. Using a DSP core allows the designer to tightly integrate peripherals and accelerators for processing the plethora of algorithms in a VoIP application.
By Gil Naveh
In recent years, the combined influence of technology and business forces in the communication industry has had an unprecedented impact on network infrastructures, from the core to the customer premise equipment. The availability of voice over IP (VoIP), which transports packetized voice over the Internet, and the explosion of the Internet have created inexpensive opportunities for long-distance voice calls.
In addition to e-mail and Web browsing, the Internet is being used more and more for voice communications (VoIP), video (streaming), and music (MP3). The popularity of packet-based voice technologies is reflected by the avalanche of VoIP products like VoIP gateways, voice-capable cable modems, IP phones, and PBXs. All these new applications rely on digital signal processor (DSP) technology.
One way to enable this multitude of services is to provide a communication system-on-chip (SOC) that combines a powerful DSP core with tightly integrated peripherals and accelerators for specialized functions.
This article describes the implementation of a single-chip VoIP gateway DSP engine capable of handling a full T1 trunk (24 channels) of compressed voice or fax communications over an IP network.
New problems to solve
To reap the advantages of reduced cost and the bandwidth savings of carrying voice communications over packet networks, designers must deal with problems that did not exist in traditional telephone networks, such as much longer network delays, echoes, and lost voice packets.
Delay in VoIP applications is a combination of network and speech-processing delays. Network delay is a function of the capacity of the network and the speed of the packet processing, and can be as high as 70 ms in some IP networks. Speech-processing delay combines the time spent on the actual compression and on collecting a voice frame. These delays cause echoes and prevent fluent conversation (one speaker's speech overlaps the other speaker's when delays exceed 200 ms). Echo becomes a significant problem when the round-trip delay exceeds 50 ms, and VoIP systems address it by implementing echo-cancellation algorithms. IP networks may also drop packets. Packets containing voice are time-sensitive, so, unlike pure data packets, dropped voice packets cannot be recovered through retransmission. VoIP software handles this by replaying the last packet received during the interval when the lost packet should have been played.
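The replay strategy for lost packets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the 80-sample frame size (10 ms of 8 kHz PCM), and the use of `None` to mark a missing packet are all hypothetical and are not taken from the design described in this article.

```python
from typing import List, Optional

FRAME_SAMPLES = 80  # assumption: 10 ms of 8 kHz PCM per packet


class RepeatLastFrameConcealer:
    """Replays the previous good frame whenever a packet is missing."""

    def __init__(self, frame_samples: int = FRAME_SAMPLES) -> None:
        # Until the first packet arrives there is nothing to replay,
        # so the concealer starts out with a frame of silence.
        self.last_good: List[int] = [0] * frame_samples

    def next_frame(self, received: Optional[List[int]]) -> List[int]:
        """Return the frame to play; `received` is None for a lost packet."""
        if received is not None:
            self.last_good = list(received)
        return list(self.last_good)
```

A playout loop would call `next_frame` once per frame interval, passing `None` whenever the expected packet has not arrived in time.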
VoIP system architecture
VoIP systems carry voice and signaling information that is required to interface telephony equipment to a packet network. The DSP processes the voice data and passes voice packets to the microprocessor. The voice processing includes echo cancellation, voice compression, voice activity detection, and voice packetization. The microprocessor is responsible for moving the voice packets, and also processes signaling information (such as on-hook and off-hook), converting it from telephony signaling protocols to the packet signaling protocol.
To maintain cost effectiveness without compromising performance, the VoIP DSP engine illustrated in Figure 1 includes a DSP core, on-chip SRAM, and an interface to external memory. To avoid unnecessary load on the DSP, the design should include a direct memory access (DMA) controller that automatically transfers information from the pulse code modulation (PCM) interface to buffers in the external memory and later from the external memory to the on-chip SRAM for digital signal processing. When signal processing is completed, the DMA moves the data back to the external memory and finally forwards the packetized data to the host interface at the microprocessor's request. The DMA also performs data transfers in the opposite direction for packets moving from the host interface to the PCM interface.
The VoIP DSP engine requirement set includes various voice-compression algorithms (with varying voice/audio quality and bandwidth efficiency), at least one standard fax, and one standard data modem. Fax and modem support is essential if the gateway is to serve any call type, not only voice.
The classical DSP algorithm requirements for a VoIP gateway DSP engine are:
Vocoders: G.723.1, G.729a, G.726/727, G.711.
Fax: V.17 group 3 fax relay.
Modem: V.32bis (14.4 kbps).
Line echo canceller: G.165/G.168 compliance.
Tone signaling: Dual-tone multifrequency (DTMF) relay.
In addition to programmable transmit (Tx) and receive (Rx) gain, silence-compression and packet-loss compensation algorithms are required. For increased system flexibility, independent dynamic speech coder selection per channel is necessary.
Considering these requirements and the increased drive for higher channel density (maximum number of channels/calls per chip), designers must focus on minimizing the voice algorithm processing load. To do so, the system designer must choose the DSP that best fits the application's (and the customer's) requirements.
The DSP system described in Figure 2 and Figure 3 acts as a bidirectional gateway between a telephony interface, such as PCM, and a digital network, which is typically managed by a host processor. After the signal from the PCM interface is processed by the echo canceller, a voice/fax classifier forwards it to the appropriate software module for further compression. Fax channels are demodulated to extract the payload, which is then forwarded to the packet network as a bit stream. Voice channels are compressed by one of the speech-coder modules, and the intervals of silence are highly compressed for optimal bandwidth utilization. A DTMF relay preserves any tone signaling superimposed on the voice. Simultaneously, data from the host interface is processed to reconstruct the original signal. Fax channels are modulated before being relayed to the PCM interface, voice channels are decoded, and the silence intervals are interpolated by the comfort-noise generator. A bad-frame handler compensates for the lost voice packets to minimize the disturbance at the receiving end.
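The interplay of silence compression on the transmit side and comfort-noise generation on the receive side can be illustrated with a minimal Python sketch. It assumes a simple energy-threshold detector, which is one common approach and not necessarily the one used in this design; the threshold value, the function names, and the `("silence", energy)` descriptor format are hypothetical.

```python
import random
from typing import List, Tuple, Union

SILENCE_THRESHOLD = 1000.0  # hypothetical mean-square energy threshold

Encoded = Union[Tuple[str, float], Tuple[str, List[int]]]


def mean_energy(frame: List[int]) -> float:
    """Mean-square energy of one PCM frame."""
    return sum(s * s for s in frame) / len(frame)


def compress_frame(frame: List[int]) -> Encoded:
    """Send only an energy level for silent frames, full samples otherwise."""
    energy = mean_energy(frame)
    if energy < SILENCE_THRESHOLD:
        return ("silence", energy)
    # A real gateway would hand the frame to the selected speech coder here.
    return ("voice", list(frame))


def comfort_noise(energy: float, n: int, rng: random.Random) -> List[int]:
    """Fill a silent interval with low-level noise bounded by sqrt(energy)."""
    amplitude = energy ** 0.5
    return [int(rng.uniform(-amplitude, amplitude)) for _ in range(n)]
```

On the receive side, a `("silence", energy)` descriptor would be expanded with `comfort_noise` so that the listener does not hear the line go completely dead between talk spurts.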
DSP algorithms
Sophisticated vocoders like G.723.1 and G.729 are complex algorithms to implement. They are irregular DSP algorithms (in contrast to modems, which are based on filtering and correlations) and contain many vocoder-specific routines, including stochastic codebook search, pitch prediction, parameter estimation, and vector quantization. These routines mix control code with heavy mathematical computation.
A configurable long instruction word (CLIW) architecture can provide the right balance between scalar/superscalar DSPs, which produce good code size but only moderate computational power, and VLIW architectures, which provide good computational power but inefficient code size.
With CLIW, software can be designed so that control code and register initialization are done with regular instructions (good code size), whereas the inner loops are written as long instructions, as in VLIW, to minimize the MIPS count. The CLIW, together with the conditional execution property and the memory destination orientation, yields the most powerful architecture for vocoder implementation, as detailed in Code Listing 1.
The example compares the inner loop of the G.729a stochastic codebook search as implemented on the Carmel CLIW architecture (3 cycles) and on the OAK DSP core (14 to 16 cycles). Note that this loop consumes 10 to 20% of the overall G.729a computational load while it comprises less than 1% of the vocoder's code.
Three pieces of code are given: the ITU-T standard reference fixed-point C description, the 14- to 16-cycle OAK implementation, and the 3-cycle Carmel implementation using the high-performance CLIW instructions. (Note the use of conditional execution, ifexe, to avoid the branch jumps seen in the OAK implementation.)
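For orientation, the kind of compare-and-update loop at the heart of a codebook search can be sketched in Python. This is not the ITU-T reference code, nor the OAK or Carmel implementation. It assumes the usual search criterion of maximizing squared correlation divided by energy, evaluated by cross-multiplication so that no division is needed, and all names are hypothetical.

```python
from typing import Sequence


def best_candidate(correlations: Sequence[int], energies: Sequence[int]) -> int:
    """Return the index maximizing correlation**2 / energy.

    Assumes every energy is strictly positive, so the ratio comparison
    sq / e > best_sq / best_energy can be rewritten without division as
    sq * best_energy > best_sq * e.
    """
    best_index = 0
    best_sq = -1      # guarantees the first candidate is always accepted
    best_energy = 1
    for i, (c, e) in enumerate(zip(correlations, energies)):
        sq = c * c
        # This data-dependent update is the branch that conditional
        # execution on a DSP would turn into predicated instructions.
        if sq * best_energy > best_sq * e:
            best_sq, best_energy, best_index = sq, e, i
    return best_index
```

The body of the loop is tiny, but it runs once per candidate, which is why a loop of this kind can dominate the cycle count while occupying a negligible share of the code.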
The algorithm processing load on the DSP
Algorithm                        Load
G.711 packetized PCM             0.2 MIPS
G.723.1 vocoder                  7.5 MIPS
G.729A vocoder                   5.0 MIPS
G.726/727 vocoder                5.0 MIPS
V.17 G3 fax relay                6.5 MIPS
V.32bis modem relay              9.0 MIPS
Line echo canceller              1.5 MIPS
DTMF relay                       0.3 MIPS
Voice/fax/data classifier        0.4 MIPS
Real-time scheduler overhead     0.5 MIPS
Maximum load per channel        10.4 MIPS
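The per-channel maximum translates directly into a worst-case budget for a full T1 trunk. The Python sketch below works through that arithmetic, assuming every one of the 24 channels hits the 10.4 MIPS maximum at the same time; the function names and the 250 MIPS core budget in the usage note are hypothetical.

```python
MAX_MIPS_PER_CHANNEL = 10.4  # "Maximum load per channel" from the table
T1_CHANNELS = 24             # full T1 trunk


def worst_case_load(channels: int,
                    per_channel: float = MAX_MIPS_PER_CHANNEL) -> float:
    """Aggregate MIPS if every channel runs at the per-channel maximum."""
    return channels * per_channel


def channels_supported(core_mips: float,
                       per_channel: float = MAX_MIPS_PER_CHANNEL) -> int:
    """Number of worst-case channels that fit in a given MIPS budget."""
    return int(core_mips // per_channel)


if __name__ == "__main__":
    print(f"Full T1 worst case: {worst_case_load(T1_CHANNELS):.1f} MIPS")
```

Under these assumptions a full T1 needs roughly 249.6 MIPS in the worst case, so a hypothetical 250 MIPS budget passed to `channels_supported` would just cover 24 channels.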
Lowering MIPS requirements
We have described a design based on a DSP core targeting a multichannel fax and voice-over-packet application. It is optimized for maximum channel density and can easily be scaled to a wide variety of gateway solutions that serve hundreds of channels per device by adding more DSP cores.
Gil Naveh is the vice president of algorithms and software at Infineon Technologies Israel. Prior to joining Infineon, he was with Motorola Communications Israel, working on speech processing and wireless TDMA modems. He received his BSc and MSc degrees in electrical engineering from Ben-Gurion University, Israel in 1990 and 1992, respectively. He can be contacted at gil@ic.co.il.