


















18 March 2010
Feature
Advances in Speech Enhancement on Voice-over-IP Applications
A number of speech quality impairments within VoIP networks threaten the ability to provide quality comparable to that of a circuit-switched network (CSN). The challenge is to recognize the scale of the problem from the outset and to invest in a suitable solution within the core infrastructure.
PLEASE NOTE: This article is complemented by audio examples. To hear them, you will need to download the Real Audio Player.
By Jerry Skene
Packet-switched networks are proving to be a viable alternative to the traditional CSN, and their popularity for voice applications is growing rapidly. The downside concerns voice transport, which continues to be the number one revenue stream for most operators worldwide [1]. These voice-over-IP (VoIP) packet networks introduce a number of potential speech quality problems that can degrade performance significantly compared to CSNs. This article outlines the nature of these problems and how they can be addressed through new developments in speech enhancement technology.
Quality is Number 1
Voice quality is important, as operators and customers alike recognize. Any factors that affect speech quality in packet voice networks need to be understood and tackled effectively. These factors include:
- Echo from the switched telephone network
- Annoying background noise artifacts
- Improper speech levels in the network
- Tandem speech coding.
There is, of course, a direct correlation between speech quality and call holding times. This means that better speech quality can directly result in greater revenue generation, due to longer call duration.
Call quality determines customer satisfaction, as shown by surveys conducted by acknowledged industry experts. One such report, from J.D. Power and Associates, polled 10,000 wireless subscribers and found that call quality is the most important of eight overall satisfaction factors (Wireless Satisfaction Report, 1997, updated 1999). A more recent report, conducted in the UK, indicated that call quality/coverage accounted for 34% of the customer satisfaction index weights [2]. From this, it is clear why voice quality is increasingly regarded as an important weapon in the battle to reduce churn, particularly in the wireless sector.
Factors affecting voice quality
Voice quality can vary tremendously in VoIP networks; the gateway equipment, the phone systems being utilized, the client software, and the carrier infrastructure all have an effect on quality. The greatest culprits are echo-related, causing VoIP networks to suffer from a complex combination of problems. Examining the impairments individually makes it easier to grasp the extent of the problem.
First, there are the packet-specific impairments, which include:
- The inherent issue of latency, which accumulates from transmission delay, packetization, the coder, and the jitter buffer
- Delay variation (jitter)
- Packet loss
- Coding distortion (direct and/or tandem).
All of these impairments affect voice quality either directly (by interfering with the coding of the speech) or indirectly (by adding delay and noise). For example, speech compression and packet routing alone introduce delays ranging from 20 to 300 ms one way.
Figure 1 (from ITU-T Recommendation G.131) demonstrates the delay issue on a logarithmic scale. As delay increases, the requirement for echo control becomes more stringent. In a packet network, delays accumulate in the packet transport system independently of, and in addition to, transmission delays. The total additional round-trip delay can easily be 190 ms in excess of the delay experienced with time division multiplexing (TDM) transmission. The net result is that a VoIP application requires a much greater degree of echo control sophistication if toll-grade voice quality is to be maintained. This puts the focus on the role of echo cancellation and the most effective placement of this function in the network.
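The way these delay contributors add up can be sketched in a few lines of Python. Only the four contributors (transmission, packetization, coder, and jitter buffer) come from the text; the class, its method names, and the millisecond values are hypothetical and were chosen simply so that the round-trip excess matches the 190 ms figure quoted above.

```python
from dataclasses import dataclass


@dataclass
class DelayBudget:
    """One-way delay contributors, in milliseconds."""
    transmission_ms: float      # also present in a TDM network
    packetization_ms: float     # time spent filling a packet with speech
    coder_ms: float             # speech coder processing and look-ahead
    jitter_buffer_ms: float     # buffering that absorbs delay variation

    def packet_added_ms(self) -> float:
        # Delay that exists only because the transport is packet-based.
        return self.packetization_ms + self.coder_ms + self.jitter_buffer_ms

    def one_way_ms(self) -> float:
        return self.transmission_ms + self.packet_added_ms()

    def round_trip_excess_ms(self) -> float:
        # An echo crosses the packet network twice: out and back.
        return 2 * self.packet_added_ms()


# Hypothetical figures, used only to illustrate the arithmetic.
budget = DelayBudget(transmission_ms=10, packetization_ms=20,
                     coder_ms=25, jitter_buffer_ms=50)
print(budget.one_way_ms())            # 105
print(budget.round_trip_excess_ms())  # 190
```

Keeping the transmission term separate makes it easy to see how much of the total is attributable to the packet transport alone, which is the part that raises the echo control requirement.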
The second group of impairments affects all networks, not merely those with packet backbones, and includes:
- Echo. Two distinct types of echo are present in modern communications: hybrid echo (caused at the 4-wire to 2-wire interface or hybrid), and acoustic, or multipath, echo.
- Noise levels. Background noise in a call can have a powerful degrading effect on call quality.
- Signal levels and the impact of variations as the levels fluctuate from too hot to too low. Against a growing background of deregulation, this continues to be a major quality detractor.
The traditional problem: echo
Hybrid echo originates in the CSN at the point where the 4-wire network is converted to the 2-wire local loop. Speech is transmitted over the VoIP network and passes through the CSN hybrid, which reflects some of the speech back to the VoIP network. This echo passes once again through the VoIP network and is delayed again by up to 600 ms. At this point, it becomes extremely noticeable to VoIP users.
Solutions to hybrid echo, directed towards the PSTN, involve the deployment of digital echo cancellation. Placing an echo canceller at both ends of a VoIP connection eliminates hybrid echo by using the canceller's ability to memorize and remove echo. Incoming speech from the VoIP network is sent to the CSN and is also stored in the canceller's memory. The echo of this signal, combined with local speech from the CSN, is received by a digital filter, which compares the signal from the CSN to the reference (memorized) signal and subtracts the majority of the similarities, leaving a small amount of residual echo. A nonlinear processor (a smart attenuator) then removes any traces of the residual echo, producing an echo-free result.
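The memorize, compare, and subtract steps can be illustrated with a small adaptive filter. The article does not name the adaptation algorithm; the sketch below assumes a normalized least-mean-squares (NLMS) update, which is one common choice, and its nonlinear processor is a deliberately crude threshold gate. All function names, parameters, and the test signal are hypothetical.

```python
import random


def cancel_echo(reference, received, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the echo of `reference` from `received`.

    Returns the residual signal (received minus estimated echo).
    """
    weights = [0.0] * taps     # adaptive model of the echo path
    history = [0.0] * taps     # "memorized" recent reference samples
    residual = []
    for x, d in zip(reference, received):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        e = d - estimate       # what is left after subtraction
        power = sum(h * h for h in history) + eps
        # NLMS update: nudge the model in the direction that shrinks e.
        weights = [w + mu * e * h / power for w, h in zip(weights, history)]
        residual.append(e)
    return residual


def nonlinear_processor(residual, threshold=0.01):
    """Crude NLP: mute samples whose magnitude is below the threshold."""
    return [0.0 if abs(e) < threshold else e for e in residual]


# Hypothetical test signal: white noise reflected by a 3-tap echo path,
# with no near-end speech mixed in.
random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.0, 0.5, -0.3]
echo = [sum(echo_path[k] * far_end[n - k]
            for k in range(len(echo_path)) if n - k >= 0)
        for n in range(len(far_end))]

residual = cancel_echo(far_end, echo)
cleaned = nonlinear_processor(residual)
```

A production canceller would also need double-talk detection, so that the filter stops adapting and the nonlinear processor opens when the local party speaks; that logic is omitted here.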
Tackling Background Noise
Background noise is another potential problem. This noise is picked up by the telephone handset and becomes distorted after compression by the speech coder in the VoIP gateway. The result is highly annoying background noise, which regularly causes VoIP users to complain to their operators. Once again, the echo-cancellation platform can provide an answer to this problem, using automatic noise reduction (ANR) to deliver higher speech quality. Recent advances have resulted in a breakthrough capable of reducing background noise by up to 75%. State-of-the-art ANR works on stationary noise (noise that does not vary significantly over a period of time). The technology works by learning the spectral content of the background noise and filtering accordingly. This enables a considerable improvement in speech intelligibility, without losing the ambience of the environment from which the call is being made.
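The idea of learning a noise spectrum and then filtering against it can be sketched with a basic spectral-subtraction gain. This is an assumed technique, not necessarily the one used in the products the article describes. The functions work on per-frame magnitude spectra (the transform to and from the frequency domain is left out), and the `floor` value of 0.25 is an illustrative choice that limits attenuation to 75% of a bin's amplitude so that some ambience survives; it is not a definition of the 75% figure in the text.

```python
def learn_noise_profile(noise_frames):
    """Average the per-bin magnitudes of frames known to contain only noise."""
    bins = len(noise_frames[0])
    return [sum(frame[b] for frame in noise_frames) / len(noise_frames)
            for b in range(bins)]


def reduce_noise(frame, noise_profile, floor=0.25):
    """Scale each bin by how much of its magnitude the noise does not explain.

    The gain never drops below `floor`, so background ambience is
    attenuated rather than removed outright.
    """
    out = []
    for mag, noise in zip(frame, noise_profile):
        gain = 1.0 - noise / mag if mag > 0 else floor  # spectral subtraction
        out.append(mag * max(gain, floor))
    return out


# Hypothetical magnitude spectra with four frequency bins each.
noise_profile = learn_noise_profile([[1.0, 1.0, 0.4, 0.6],
                                     [3.0, 1.0, 0.6, 0.4]])
cleaned_frame = reduce_noise([2.0, 9.0, 0.5, 4.5], noise_profile)
```

Bins dominated by speech (the second and fourth in this example) pass through almost unchanged, while bins that contain only noise are pulled down to the floor.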
Varying Signal Levels
The deregulation of telecom markets has produced a situation where many new carriers are coming into existence, accompanied by a wide range of telephone sets and brands. Call routing is also becoming increasingly complex, involving interconnections across a cross-section of network types and technologies. The net effect is that incoming levels are often too high or too low. This has a subsequent impact on network quality, an issue that concerns many international telcos and operators.
There are five different equipment areas affected by varying levels: analog-to-digital (A/D) converters, low bit-rate speech coders, voice activity detectors, echo cancellers, and fax modems. The performance of this equipment will vary depending upon the signal levels, which directly impacts voice quality and facsimile transmission. Each active device, such as an amplifier or speech codec, will have a certain dynamic range over which it will function to specification. Outside this range the performance may degrade rapidly, leading to noise and distortion of various types. For this reason, it is important to maintain speech levels within the dynamic range specified for the equipment.
In VoIP applications, the speech coder in the VoIP network distorts speech signals that are too high or too low. To solve this problem, automatic level control (ALC) provides an effective, unobtrusive means of improving the perceived speech quality of a call by automatically optimizing active speech levels. The software reacts intelligently to varying speech levels, adjusting the appropriate parameters in real time to an optimal operating level. With ALC, the software only operates on active speech, ensuring that voiceband data transmission remains totally unaffected.
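A minimal sketch of such a level controller follows. It measures the level of each frame, adapts its gain only when the frame looks like active speech, and limits how fast and how far the gain can move. The target level, activity threshold, step size, and gain limit are all assumed values, and the function names are hypothetical.

```python
import math


def frame_level_db(frame):
    """RMS level of a frame in dB relative to full scale (samples in [-1, 1])."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else -120.0


def automatic_level_control(frames, target_db=-20.0, activity_db=-45.0,
                            max_step_db=1.0, max_gain_db=12.0):
    """Apply a slowly varying gain that steers active speech toward target_db."""
    gain_db = 0.0
    out = []
    for frame in frames:
        level = frame_level_db(frame)
        if level > activity_db:          # adapt on active speech only
            error = target_db - (level + gain_db)
            step = max(-max_step_db, min(max_step_db, error))
            gain_db = max(-max_gain_db, min(max_gain_db, gain_db + step))
        scale = 10 ** (gain_db / 20)
        out.append([s * scale for s in frame])
    return out


# Hypothetical input: twenty 160-sample frames of quiet speech (about -26 dB).
frames = [[0.05, -0.05] * 80 for _ in range(20)]
levelled = automatic_level_control(frames)
```

The sketch holds its gain during silence rather than chasing the noise floor. It does not detect voiceband data; a real device would need to recognize fax and modem signals and bypass level control for them.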
On the standards front, ITU Recommendation G.169 provides the basis for ensuring that this problem is controlled more effectively in the future, setting out the requirements for the control of signal-level variation through a range of equipment. The automatic level controller is an important tool in this respect, monitoring levels as they come into the network and enabling a range of level control options to compensate for variations.
Tandem Speech Coding
The increased market penetration of digital cellular and IP is resulting in a higher probability of mixed-network calls. In such calls, multiple speech coders convert speech from one coding format to another, and then back again. Speech quality is degraded by this multiple speech coding conversion. To address this situation, ETSI is currently developing an end-to-end protocol, called Tandem Free Operation (TFO), to bypass intermediate GSM speech coders, thereby improving speech quality in mixed-format networks. The TFO protocol allows the removal of intermediate coding and decoding (transcoding) stages to leave only the codec processes on the terminal equipment. The TFO scheme allows the bit-integral GSM 16-kbps stream (actually 13-kbps speech frames with additional data padding) to pass transparently between mobile terminals. This means that in the standard 8-bit PCM signal frame (which operates at 64 kbps), the TFO signal would normally occupy just two of the eight PCM bits. Clearly, this protocol will have an impact on all in-path equipment, such as echo cancellers, and is well-suited to VoIP mixed networks.

Taking all these issues into account, VoIP networks present a challenging environment for voice quality.
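The two-bit figure for TFO follows from simple arithmetic: a 64-kbps PCM channel carries 8,000 eight-bit samples per second, so a 16-kbps stream needs 16,000 / 8,000 = 2 bits in each sample. The sketch below works through that calculation and shows one way the bits could be carried. The article says only that two of the PCM bits are used; placing them in the two least significant bits is an assumption of this sketch, and the function names are hypothetical.

```python
PCM_SAMPLE_RATE_HZ = 8000   # 64 kbps PCM = 8000 samples/s x 8 bits
PCM_BITS_PER_SAMPLE = 8
TFO_RATE_BPS = 16_000       # GSM speech frames plus padding, per the text

TFO_BITS_PER_SAMPLE = TFO_RATE_BPS // PCM_SAMPLE_RATE_HZ   # 2


def embed_tfo_bits(pcm_octets, tfo_bits):
    """Overwrite the two low-order bits of each PCM octet with payload bits.

    Expects exactly two payload bits per octet. Low-order placement is an
    assumption made for illustration.
    """
    mask = (1 << TFO_BITS_PER_SAMPLE) - 1
    out = []
    for i, octet in enumerate(pcm_octets):
        high, low = tfo_bits[2 * i], tfo_bits[2 * i + 1]
        out.append((octet & ~mask & 0xFF) | (high << 1) | low)
    return out


def extract_tfo_bits(pcm_octets):
    """Recover the payload bits from the two low-order bits of each octet."""
    bits = []
    for octet in pcm_octets:
        bits.extend([(octet >> 1) & 1, octet & 1])
    return bits


carried = embed_tfo_bits([0xFF, 0x80, 0x55], [1, 0, 0, 1, 1, 1])
recovered = extract_tfo_bits(carried)
```

The upper six bits of every octet are left untouched, which is why in-path equipment that modifies sample values, such as an echo canceller, has to be aware of the embedded stream.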
Developing International VoIP Standards
Standards will be critical to the success of this new packet-based service. VoIP gateways must be able to talk to existing PSTN networks and to each other. Signaling interfaces must function so that a VoIP call is as easy as a PSTN call. Many standards groups are actively involved in ensuring that this happens, including the ITU, ETSI, IETF, and the International Multimedia Teleconferencing Consortium (IMTC). The quest to develop international standards becomes a paramount consideration, given the range of issues previously outlined. The ITU is developing a new question on an international standard for VoIP gateways: Question 21 in Study Group 15 of the ITU-T will specify certain functions and characteristics of these gateways in a new Recommendation called G.799.1. This will help ensure a consistent level of speech performance across such gateways, preserving the high quality of international voice services. It will also make it easier for VoIP carriers to determine whether new gateways fully meet the new requirement specifications. Key areas where performance requirements are being defined include switched circuit-bearer interfaces, IP-bearer interfaces, signaling protocols, echo cancellation, end-to-end delay, the handling of voiceband data such as fax and data modems, the effects of cell loss, methods of avoiding tandem speech coding, and control and configuration interfaces.
VoIP gateways may be composed of multiple pieces of equipment, each with specialized functions, such as signaling interfaces, speech compression/decompression, and packetization. Figure 2 illustrates some of the functions performed in such a gateway, while Figure 3 shows where the gateway fits into the overall network model. Recommendation G.799.1, which is still being developed, does not specify how these functions are to be performed or the specific interconnections that may be implemented between them. It will, however, define the functions themselves and the interface to other components in the overall network. Recommendation G.799.1 is expected to be completed by April 2000.
Investing in Quality
The task for the new IP network experts is clear-cut: provide voice quality equivalent to that of existing networks and ensure efficient interworking with already installed equipment. Achieving this objective is difficult, however. As illustrated in this article, a number of speech quality impairments occur within VoIP networks and threaten the ability to provide quality comparable to that of a CSN. There are also lessons to be drawn from the CSN example, with much of the technology that has been successfully deployed in the CSN to solve similar problems now migrating to the VoIP network. The challenge is to recognize the scale of the problem from the outset and invest in a suitable solution within the core infrastructure. Failure to do so could impact heavily on the goal of delivering toll-quality voice services. On the international scene, new standards are being developed to help ensure that the highest-quality voice is preserved over packet-based networks. These initiatives will go a long way towards guaranteeing that VoIP networks make the grade.
Jerry Skene is the standards director at Tellabs, Inc. He leads Tellabs' participation in international standards organizations, including the ITU, ETSI, and IETF. Skene holds four patents and an MS in applied physics from McMaster University in Hamilton, Ontario. He can be reached at jerry.skene@tellabs.com