06 July 2009

Feature

Advances in Speech Enhancement on Voice-over-IP Applications


A number of speech quality impairments within VoIP networks threaten the ability to provide comparable quality to a circuit-switched network (CSN). The challenge is to recognize the scale of the problem from the outset and to invest in a suitable solution within the core infrastructure.

PLEASE NOTE: This article is complemented by audio examples. In order to hear them, you will need to download the Real Audio Player.

By Jerry Skene

As packet-switched networks prove to be a viable alternative to the traditional CSN, their popularity for voice applications is growing rapidly. The downside of this equation concerns voice transport, which continues to be the number one revenue stream for most operators worldwide [1]. These voice-over-IP (VoIP) packet networks introduce a number of potential speech quality problems that can degrade performance significantly compared to CSNs. This article outlines the nature of these problems and how they can be addressed through new developments in speech enhancement technology.


Quality is Number 1

Voice quality is important. This is recognized by operators and customers alike. Any factors which affect speech quality in packet voice networks need to be understood and tackled effectively. These factors include:

  • Echo from the switched telephone network

  • Annoying background noise artifacts

  • Improper speech levels in the network

  • Tandem speech coding.

There is, of course, a direct correlation between speech quality and call holding times. This means that better speech quality can directly result in greater revenue generation, due to longer call duration.

Call quality determines customer satisfaction, as shown by surveys conducted by acknowledged industry experts. One such report, from J.D. Power and Associates, polled 10,000 wireless subscribers and recorded that call quality is the most important of eight overall satisfaction factors (“Wireless Satisfaction Report,” 1997, updated 1999). In a more recent report, conducted in the UK, the results indicated that call quality/coverage accounted for 34% of the customer satisfaction index weights [2]. From this, it is clear why voice quality is increasingly regarded as an important weapon in the battle to reduce churn, particularly in the wireless sector.


Factors affecting voice quality

Voice quality can vary tremendously in VoIP networks; the gateway equipment, the phone systems being utilized, the client software, and the carrier infrastructure all have an effect on quality. The greatest culprits are echo-related, causing VoIP networks to suffer from a complex combination of problems. If the impairments are individually examined, it is easier to grasp the extent of the problem.

First, there are the packet-specific impairments, which include:

  • The inherent issue of latency, involving an accumulation of transmission delay, the packetization itself, the coder, and the jitter buffer

  • Delay variation (jitter)

  • Packet loss

  • Coding distortion (direct and/or tandem).

All of these impairments affect voice quality either directly (by interfering with the coding of the speech) or indirectly (by adding delay and noise). For example, speech compression and packet routing alone introduce delays ranging from 20 to 300 ms one way.

Figure 1 (from ITU-T Recommendation G.131) plots the delay issue on a logarithmic scale: as delay increases, so does the requirement for echo control. In a packet network, the delays listed above accumulate independently of, and in addition to, ordinary transmission delay. The total additional round-trip delay can easily be 190 ms in excess of the delay experienced with time division multiplexing (TDM) transmission. The net result is that the VoIP application requires a much greater degree of echo control sophistication if toll-grade voice quality is to be maintained. This puts the focus on the role of echo cancellation and the most effective placement of this function in the network.
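The way these delay components add up can be sketched as a simple budget. The following Python fragment is only an illustration: the class name and every millisecond value are hypothetical, with the packet-specific values chosen so that they sum to the 190 ms of additional round-trip delay mentioned above.

```python
from dataclasses import dataclass


@dataclass
class DelayBudget:
    """One-way delay contributions in milliseconds (all values hypothetical)."""
    transmission_ms: float
    packetization_ms: float
    coder_ms: float
    jitter_buffer_ms: float

    def one_way_ms(self) -> float:
        # Total one-way delay is the plain sum of every contribution.
        return (self.transmission_ms + self.packetization_ms
                + self.coder_ms + self.jitter_buffer_ms)

    def packet_overhead_ms(self) -> float:
        # Delay added by the packet network on top of transmission delay.
        return self.one_way_ms() - self.transmission_ms


# Assumed example figures, not measurements from any real network.
budget = DelayBudget(transmission_ms=25.0, packetization_ms=20.0,
                     coder_ms=15.0, jitter_buffer_ms=60.0)

# An echo crosses the packet network twice, so the overhead doubles.
extra_round_trip_ms = 2 * budget.packet_overhead_ms()
print(f"one-way delay: {budget.one_way_ms()} ms")
print(f"additional round-trip delay vs. TDM: {extra_round_trip_ms} ms")
```

Because the echo path includes both directions, even modest per-direction overheads quickly push the round-trip figure into the range where echo control becomes essential.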

The second group of impairments affects all networks, not merely those with packet backbones, and includes:

  • Echo. Two distinct types of echo are present in modern communications: hybrid echo (caused at the 4-wire to 2-wire interface or hybrid), and acoustic, or multipath, echo.

  • Noise levels. Background noise in a call can have a powerful degrading effect on call quality.

  • Signal levels and the impact of variations as the levels fluctuate from too hot to too low. Against a growing background of deregulation, this continues to be a major quality detractor.


The traditional problem: echo

Hybrid echo is located in the CSN at the point where the 4-wire network is converted to the 2-wire local loop. Speech is transmitted over the VoIP network and passes through the CSN hybrid, resulting in some speech being reflected back by the hybrid to the VoIP network. This echo passes once again through the VoIP network and is delayed again for up to 600 ms. At this point, it becomes extremely noticeable to VoIP users.

Hybrid echo can be addressed by deploying digital echo cancellation facing the PSTN. Placing an echo canceller at both ends of a VoIP connection eliminates hybrid echo by using the canceller’s ability to memorize and remove echo. Incoming speech from the VoIP network is sent to the CSN and is also stored in the canceller’s memory. The echo of this signal, combined with local speech from the CSN, is received by a digital filter, which compares the signal from the CSN with the memorized reference signal and subtracts the majority of the similarities, leaving a small amount of residual echo. A nonlinear processor (a smart attenuator) then removes any traces of the residual echo, producing an echo-free result.
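The canceller structure described above (reference memory, digital filter, subtraction, nonlinear processor) can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: it assumes a normalized LMS (NLMS) adaptive filter, a hypothetical four-tap echo path, and a crude nonlinear processor that mutes any residual below a fixed threshold. It does not handle double-talk, which a real canceller must detect.

```python
import random


def cancel_echo(far_end, near_end, taps=8, mu=0.5, nlp_threshold=0.01):
    """NLMS adaptive filter followed by a simple nonlinear processor (NLP)."""
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # the canceller's memory of far-end speech
    output = []
    for x, d in zip(far_end, near_end):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        residual = d - echo_estimate          # subtract the predicted echo
        power = sum(h * h for h in history) + 1e-9
        step = mu * residual / power          # normalized step size
        weights = [w + step * h for w, h in zip(weights, history)]
        # NLP: mute whatever small residual the filter leaves behind.
        output.append(0.0 if abs(residual) < nlp_threshold else residual)
    return output


# Simulated far-end signal and a hypothetical hybrid echo path.
rng = random.Random(1)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
path = [0.0, 0.5, 0.3, 0.1]
echo = [sum(path[k] * far[n - k] for k in range(len(path)) if n - k >= 0)
        for n in range(len(far))]

cleaned = cancel_echo(far, echo)
print("peak echo before:", max(abs(v) for v in echo[-100:]))
print("peak echo after: ", max(abs(v) for v in cleaned[-100:]))
```

In this sketch the near-end signal contains only echo, so once the filter has converged the output is silent. With local speech present, the NLP threshold would need to be gated so that genuine near-end talk is not muted.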


Tackling Background Noise

Background noise is another potential problem. This noise is picked up from the telephone handset and becomes distorted after compression by the speech coder in the VoIP gateway. The result is highly annoying background noise, which regularly causes VoIP users to complain to their operators. Once again, the echo-cancellation platform can provide an answer to this problem, using automatic noise reduction (ANR) to deliver higher speech quality. Recent advances have resulted in a breakthrough capable of reducing background noise by up to 75%. State-of-the-art ANR works on stationary noise (noise that does not vary significantly over a period of time). The technology works by “learning” the spectral frequencies of the background noise and filtering accordingly. This enables a considerable improvement in speech intelligibility, without losing the ambience of the environment from which the call is being made.
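The idea of learning the noise spectrum and filtering accordingly can be illustrated with basic spectral subtraction. The sketch below is an assumption about how such a scheme might look, not a description of any commercial ANR product: the function names are hypothetical, it uses a naive DFT on short frames for self-containment, and the 0.25 spectral floor is one possible reading of "up to 75%" reduction, chosen so that some ambience always remains.

```python
import cmath
import math
import random


def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


def learn_noise_profile(noise_frames):
    """Average magnitude per frequency bin over frames holding only noise."""
    spectra = [[abs(c) for c in dft(f)] for f in noise_frames]
    return [sum(s[k] for s in spectra) / len(spectra)
            for k in range(len(spectra[0]))]


def reduce_noise(frame, noise_profile, floor=0.25):
    """Subtract the learned noise magnitude, never going below the floor."""
    cleaned = []
    for c, noise_mag in zip(dft(frame), noise_profile):
        mag = abs(c)
        gain = max(mag - noise_mag, floor * mag) / mag if mag > 0 else 0.0
        cleaned.append(c * gain)   # scale magnitude, keep phase
    return idft(cleaned)


def energy(frame):
    return sum(s * s for s in frame)


# Stationary background noise, simulated here as Gaussian samples.
rng = random.Random(7)
noise_frames = [[rng.gauss(0.0, 0.1) for _ in range(32)] for _ in range(20)]
profile = learn_noise_profile(noise_frames)

noisy = [rng.gauss(0.0, 0.1) for _ in range(32)]
quieter = reduce_noise(noisy, profile)
print("noise energy before:", energy(noisy))
print("noise energy after: ", energy(quieter))
```

A production system would add overlapping windows and continuous profile updates during speech pauses. The sketch processes isolated frames only.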


Varying Signal Levels

The deregulation of telecom markets has produced a situation where many new carriers are coming into existence, accompanied by a wide range of telephone sets and brands. Call routing is also becoming increasingly complex, involving interconnections across a cross-section of network types and technologies. The net effect is that incoming levels are often too high or too low. This has a subsequent impact on network quality, an issue that is concerning many international telcos and operators.

There are five different equipment areas affected by varying levels: analog-to-digital (A/D) converters, low bit-rate speech coders, voice activity detectors, echo cancellers, and fax modems. The performance of this equipment will vary depending upon the signal levels, which directly impacts voice quality and facsimile transmission. Each active device, such as an amplifier or speech codec, will have a certain dynamic range over which it will function to specification. Outside this range the performance may degrade rapidly, leading to noise and distortion of various types. For this reason, it is important to maintain speech levels within the dynamic range specified for the equipment.

In VoIP applications, the speech coder in the VoIP network distorts speech signals that are too high or too low. To solve this problem, automatic level control (ALC) provides an effective, unobtrusive means of improving the perceived speech quality of a call by automatically optimizing active speech levels. The software reacts intelligently to varying speech levels, adjusting the appropriate parameters in real time to an optimal operating level. With ALC, the software only operates on active speech, ensuring that voiceband data transmission remains totally unaffected.
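A level controller of this kind can be sketched as a slow gain loop that acts only on active frames. The Python below is a simplified illustration under stated assumptions: the function names, the dB-relative-to-full-scale units, the target level, the activity threshold, and the step and gain limits are all hypothetical. It also uses signal level alone to decide activity, so it does not implement the speech/data discrimination that lets a real ALC leave voiceband data untouched.

```python
import math


def level_db(frame):
    """RMS level of a frame in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def alc_gain_db(frame, current_gain_db, target_db=-20.0, activity_db=-45.0,
                max_step_db=1.0, max_gain_db=12.0):
    """Return the updated gain; hold it unchanged when the frame is inactive."""
    level = level_db(frame)
    if level < activity_db:
        return current_gain_db             # silence or low noise: do nothing
    error = target_db - (level + current_gain_db)
    step = max(-max_step_db, min(max_step_db, error))   # move gradually
    return max(-max_gain_db, min(max_gain_db, current_gain_db + step))


def apply_gain(frame, gain_db):
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in frame]


# A quiet talker: a sine wave at roughly -43 dB, in 160-sample frames.
quiet_frame = [0.01 * math.sin(2 * math.pi * 0.05 * n) for n in range(160)]

gain_db = 0.0
for _ in range(30):
    gain_db = alc_gain_db(quiet_frame, gain_db)

boosted = apply_gain(quiet_frame, gain_db)
print("input level: ", round(level_db(quiet_frame), 1), "dB")
print("gain applied:", gain_db, "dB")
print("output level:", round(level_db(boosted), 1), "dB")
```

Limiting the step per frame keeps the adjustment unobtrusive, and capping the total gain prevents the controller from amplifying a very weak signal into the noise-dominated region.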

On the standards front, ITU Recommendation G.169 provides the basis for ensuring that this problem is controlled more effectively in the future, setting out the requirements for the control of signal-level variation through a range of equipment. The automatic level controller is an important tool in this respect, monitoring levels as they come into the network and enabling a range of level control options to compensate for variations.


Tandem Speech Coding

The increased market penetration of digital cellular and IP is resulting in a higher probability of mixed-network calls. In such calls, multiple speech coders convert speech from one coding format to another, and then back again. Speech quality is degraded by this multiple speech coding conversion. In order to address this situation, ETSI is currently developing an end-to-end protocol, called Tandem Free Operation (TFO), to bypass intermediate GSM speech coders, thereby improving speech quality in mixed-format networks. The TFO protocol allows the removal of intermediate coding and decoding (transcoding) stages to leave only the codec processes on the terminal equipment. The TFO scheme allows the bit-integral 16-kbps GSM signal (actually 13-kbps speech frames with additional data padding) to pass transparently between mobile terminals. This means that in the standard 8-bit PCM signal frame (which operates at 64 kbps), the TFO signal would normally occupy just two of the eight PCM bits. Clearly, this protocol will have an impact on all in-path equipment, such as echo cancellers, and is well-suited to VoIP mixed networks. Taking all these issues into account, VoIP networks present a challenging environment for voice quality.
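The rate arithmetic behind this is straightforward: at 8,000 samples per second, eight bits per sample give 64 kbps, and two bits per sample give 16 kbps. The following Python sketch illustrates the idea of carrying a 16-kbps stream inside a 64-kbps PCM channel. It assumes the two bits used are the least significant bits of each octet, and the function names are hypothetical. It does not model the actual TFO frame format or signaling.

```python
PCM_SAMPLE_RATE_HZ = 8000


def channel_rate_bps(bits_per_sample):
    return bits_per_sample * PCM_SAMPLE_RATE_HZ


def embed_bits(pcm_octets, payload_bits):
    """Overwrite the two low-order bits of each PCM octet with payload bits."""
    out = []
    for i, octet in enumerate(pcm_octets):
        pair = payload_bits[2 * i: 2 * i + 2]
        if len(pair) < 2:
            out.append(octet)              # no payload left: pass through
            continue
        out.append((octet & 0b11111100) | (pair[0] << 1) | pair[1])
    return out


def extract_bits(pcm_octets):
    """Read back the two low-order bits of every octet."""
    bits = []
    for octet in pcm_octets:
        bits.extend([(octet >> 1) & 1, octet & 1])
    return bits


pcm = [0xD5, 0x54, 0xAA, 0x7F]             # arbitrary example octets
payload = [1, 0, 0, 1, 1, 1, 0, 0]         # eight bits of coded speech

carried = embed_bits(pcm, payload)
print("PCM channel rate:", channel_rate_bps(8), "bps")
print("embedded stream: ", channel_rate_bps(2), "bps")
print("recovered bits:  ", extract_bits(carried))
```

The example also shows why in-path equipment matters: any device that modifies the PCM samples, such as an echo canceller or level controller, would corrupt the embedded bits unless it recognizes the embedded stream and passes it through unchanged.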


Developing International VoIP Standards

Standards will be critical to the success of this new packet-based service. VoIP gateways must be able to talk to existing PSTN networks and to each other. Signaling interfaces must function so that a VoIP call is as easy as a PSTN call. Many standards groups are actively involved in ensuring that this happens. Some of these include the ITU, ETSI, IETF, and the International Multimedia Teleconferencing Consortium (IMTC). The quest to develop international standards becomes a paramount consideration, given the range of issues previously outlined. The ITU is developing a new question on an international standard for VoIP gateways: Question 21 in Study Group 15 of the ITU-T will specify certain functions and characteristics of these gateways in a new Recommendation called G.799.1. This will help ensure a consistent level of speech performance across such gateways, preserving the high quality of international voice services. It will also make it easier for VoIP carriers to determine whether new gateways fully meet the new requirement specifications. Key areas where performance requirements are being defined include switched circuit-bearer interfaces, IP-bearer interfaces, signaling protocols, echo cancellation, end-to-end delay, the handling of voiceband data such as fax and data modems, the effects of cell loss, methods of avoiding tandem speech coding, and control and configuration interfaces.

VoIP gateways may be composed of multiple pieces of equipment, each with specialized functions, such as signaling interfaces, speech compression/decompression, and packetization. Figure 2 illustrates some of the functions performed in such a gateway, while Figure 3 shows where the gateway fits into the overall network model. Recommendation G.799.1, now under development, does not specify how these functions are to be performed, or the specific interconnections that may be implemented between functions. However, it will define the functions themselves and the interface to other components in the overall network. Recommendation G.799.1 is expected to be completed by April 2000.


Investing in Quality

The task for the new IP network experts is clear-cut: provide equivalent voice quality to existing networks and ensure that there is efficient interworking with already installed equipment. Achieving this objective is difficult, however. As illustrated in this article, there are a number of speech quality impairments that occur within VoIP networks and threaten the ability to provide comparable quality to a CSN. There are also lessons to be drawn from the CSN example, with much of the technology that has been successfully deployed in the CSN to solve similar problems now migrating to the VoIP network. The challenge is to recognize the scale of the problem from the outset and invest in a suitable solution within the core infrastructure. Failure to do so could impact heavily on the goal of delivering toll-quality voice services. On the international scene, there are new standards being developed to help ensure that the highest-quality voice is preserved over packet-based networks — initiatives that will go a long way towards guaranteeing that VoIP networks make the grade.

Jerry Skene is the standards director at Tellabs, Inc. He leads Tellabs’ participation in international standards organizations, including the ITU, ETSI, and IETF. Skene holds four patents and an MS in applied physics from McMaster University in Hamilton, Ontario. He can be reached at jerry.skene@tellabs.com


Audio Clips
You will need Real Audio Player to listen to these audio files.
  • 180 ms Echo
  • 30 ms Echo
  • Noise: After
  • Noise: Before
  • High Speech Level

Illustrations
  • Figure 1
  • Figure 2
  • Figure 3

References
  1. Various sources, including Level 3 Communications (ITU World Telecom, Geneva, 1999).
  2. J.D. Power and Associates, “UK Mobile Customer Satisfaction Study,” 1999.


