06 October 2008



Improving VoIP call quality with Embedded Monitoring

By Alan Clark
CommsDesign
Sep 21, 2001
 
By embedding passive measurement techniques that account for bursty packet loss and human perception in gateway and handset designs, engineers can help network operators understand the nature of packet loss bursts, the true quality of service, and the end-user experience, in real time and without adding to the network load or otherwise hindering VoIP network performance.

Voice-over-IP (VoIP) networks differ from conventional telephone networks in that voice quality on VoIP is affected by a wider variety of network impairments and can vary from call to call and over the course of a call. In conventional telephone networks, dedicated circuits are nailed down, allowing high and consistent quality. But this wastes considerable bandwidth, because a large percentage of the call time is filled with silence.

In VoIP networks, circuits are not dedicated but shared, or multiplexed, between many users and potentially many different applications such as voice, email, and file transfer. Silence periods in one call, during which no voice packets are transmitted, free up bandwidth for other calls sharing that virtual circuit, thus amortizing the cost of the circuits over many users and many conversations.

It is therefore necessary to monitor per-call quality in VoIP networks in order to understand service quality levels and take corrective action when call quality degrades below acceptable or specified levels. Corrective action may be taken to improve the quality of the current or subsequent calls. This has the positive effects of increasing customer satisfaction, retention, and call duration.

Having an objective understanding of call quality also allows service providers and enterprises to properly provision networks, ensure that network resources are properly allocated, and potentially charge customers or charge-back users based upon the QoS received, i.e. differentiated services with differentiated billing.

So how can we make this monitoring a reality? The answer lies in embedding lightweight passive monitoring agents into VoIP gateway and handset designs. By doing this, engineers can provide insight into the effects of time-varying impairments, more easily track down problems in IP links, and optimize equipment designs to better handle VoIP traffic without affecting overall system performance.

Now you're asking, how does it work? Let's dive into that further.

Active vs. passive

One of the fundamental questions when embedding monitoring capabilities into a system architecture is whether to choose active or passive monitoring techniques. Let's compare the two.

Under the active monitoring approach, test data is inserted into the system to obtain performance measurements. Active monitoring systems typically make test calls, transmit speech files and compare transmitted and received files using the perceptual speech quality measurement (PSQM) method, the perceptual evaluation of speech quality (PESQ) method or some similar method. These methods tend to be computationally burdensome and thus non-real-time.

A key advantage of the active approach is that codec performance can be directly measured. The problem, however, is that active monitoring only gets a snapshot of system performance by monitoring synthetic or average calls, not "real" calls. Additionally, by adding synthetic calls to the network, active measurement techniques increase the load and can exacerbate the very conditions being tested. This tends to make active measurement techniques more suitable for lab or prototype environments and for capacity-planning activities.

Passive monitoring systems, on the other hand, examine operating characteristics of a system in order to measure performance levels. This may involve examining elements of the system, for example buffer levels, or unobtrusively examining the data stream being transmitted through the system.

Embedded passive monitoring systems employ some form of monitoring agent embedded into the system under test. This has the advantage of a closer relationship with system elements, access to real-time data and control information and not requiring the purchase and management of another entity on the network.

For VoIP, passive monitoring agents can be integrated directly into VoIP gateways or IP phones during the development process. These scenarios provide access to codec information and to more accurate packet loss and delay information. This permits per-call estimates of transmission quality to be made with negligible impact on the service being monitored.

The agents can also be embedded into testers and probes. Under this approach, the agent can be equipped with a jitter-buffer simulator, possibly selectable or configurable according to particular jitter-buffer profiles. The jitter-buffer simulator allows a close approximation of the end point environment and therefore makes probe-based measurements considerably more accurate.

Using the E-Model to track quality

So now that we know more about passive monitoring, let's explore how it works. Let's start by examining the quality of the transmission channel.

In a VoIP system, the quality of the transmission channel plays a huge part in the overall effectiveness of a VoIP network. Therefore, it is essential for the passive monitoring agent to track the performance of this channel. A common and standardized method of measuring voice transmission quality is the E-model of ETSI Technical Report ETR 250, which makes the E-model a solid foundation for the monitoring technique.

The E-model, which actually comprises several models that represent specific impairments and their interactions in calculating performance, is a well-established transmission quality model that was originally developed for the PSTN. It provides an objective method of assessing the mouth-to-ear (acoustic-to-acoustic) transmission quality of a PSTN telephone connection, and it is intended to assist telecom service providers with PSTN network planning and performance testing. While other models exist, the E-model uniquely employs an equipment impairment factor, Ie, which incorporates the notion of packet loss. It is this explicit notion of packet loss that makes the E-model particularly useful for IP telephony. We will return to packet loss in more detail later.

The E-model, described in ETSI Technical Report ETR 250 and in ITU Recommendations G.107 and G.108, is:

R = Ro - Is - Id - Ie + A

This produces an R factor rating between 0 and 100, where:

  Ro: effects of noise and loudness ratio
  Is: effects of impairments simultaneous with the speech signal
  Id: effects of impairments delayed with respect to the speech signal
  Ie: effects of equipment such as digital circuit multiplication equipment (DCME) or VoIP networks
  A: advantage factor, quantifying the allowance users give for lesser quality when given some other benefit such as the mobility offered by cellular phones
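The rating formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name, the clamping to the 0 to 100 range, and the input values are assumptions, not measured figures.

```python
def r_factor(ro, i_s, i_d, i_e, a=0.0):
    """Combine the E-model terms: R = Ro - Is - Id - Ie + A.

    The result is clamped to the 0..100 rating range (an assumption
    made here so that extreme inputs stay on the scale).
    """
    r = ro - i_s - i_d - i_e + a
    return max(0.0, min(100.0, r))


# Illustrative inputs only, not values taken from any standard or measurement.
print(r_factor(ro=95.0, i_s=1.0, i_d=5.0, i_e=20.0))  # 69.0
```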

The equipment impairment factor, Ie, is generally used to represent the impairment effects of VoIP equipment. Certain codecs have already been formally characterized through subjective testing to give a standard profile of the variation of Ie with packet loss (see Figure 1).

Time-varying impairments

In addition to implementing the basic E-model, it is also extremely important for the passive monitoring technique to model the distribution of packet loss bursts. Bursty packet loss is by far the network impairment with the greatest effect on voice quality.

Packet loss occurs for several reasons: buffers may have overflowed within the network, packets may have been intentionally discarded by the congestion control scheme employed (such as weighted random early detection), or packets may have been corrupted by network transmission errors.

Several of the mechanisms that can lead to packet loss are transient, so the resulting packet loss is bursty. Accurately assessing the effect of bursty loss on call quality requires a packet loss distribution model rather than a simple set of packet loss counters or rate statistics.

For example, averaging packet loss over the course of a 2-minute call may lead a network manager to believe that there was no quality degradation, whereas if the same number of lost packets actually occurred in a few 1- or 2-second intervals, causing unintelligible speech, then call quality would be regarded as very poor.
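A short Python sketch makes the point concrete. The call below is hypothetical: 20-ms packets, a 2-minute duration, and two 2-second bursts in which every other packet is lost. None of these numbers come from a real trace.

```python
PACKETS_PER_SECOND = 50  # assumption: 20 ms packets


def loss_rates(lost, window):
    """Return (overall loss rate, worst loss rate in any aligned window)."""
    overall = sum(lost) / len(lost)
    chunks = [lost[i:i + window] for i in range(0, len(lost), window)]
    worst = max(sum(chunk) / len(chunk) for chunk in chunks)
    return overall, worst


# Hypothetical 2-minute call with two 2-second bursts at 50% loss.
call = [False] * (120 * PACKETS_PER_SECOND)
for start_s in (30, 90):
    first = start_s * PACKETS_PER_SECOND
    for i in range(first, first + 2 * PACKETS_PER_SECOND, 2):
        call[i] = True

overall, worst = loss_rates(call, window=2 * PACKETS_PER_SECOND)
print(f"average loss {overall:.1%}, worst 2-second window {worst:.0%}")
```

The call-wide average comes out below 2%, while the worst 2-second window sits at 50% loss, which is the gap between a rate counter and a distribution model.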

A 1999 study examined the distribution of packet loss in the Internet and concluded that it could be accurately represented by a Markovian loss model [1]. The Markov model also provides a computationally efficient framework for modeling packet loss distribution because of its memoryless property. It is therefore useful for a monitoring agent, which requires high accuracy and low CPU overhead, to implement its packet loss distribution model as a Markov chain. The agent would work by calculating the appropriate statistics per state and the state transition probabilities.
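As a minimal sketch of the idea, the transition probabilities of a two-state chain (packet received, packet lost) can be estimated by counting adjacent pairs in a loss sequence. The function name and the sample sequence are hypothetical, and the two-state form is a simplification chosen for illustration rather than the agent's actual model.

```python
from collections import Counter


def estimate_two_state(lost):
    """Estimate transition probabilities of a two-state loss chain.

    Returns (p, q) where p = P(lost | previous received) and
    q = P(received | previous lost).
    """
    counts = Counter(zip(lost, lost[1:]))
    from_recv = counts[(False, False)] + counts[(False, True)]
    from_lost = counts[(True, False)] + counts[(True, True)]
    p = counts[(False, True)] / from_recv if from_recv else 0.0
    q = counts[(True, False)] / from_lost if from_lost else 0.0
    return p, q


# Hypothetical loss flags for ten consecutive packets (True = lost).
sequence = [False, False, False, True, True, False, False, False, False, True]
p, q = estimate_two_state(sequence)
print(p, q)
```

Only four counters are needed per call, which is what keeps the per-packet cost low.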

If the rate of packet loss varies during a VoIP call then the end-user-perceived call quality will also vary. The term "instantaneous quality" may be used to denote the measured quality due to packet loss or other impairments and the term "end-user-perceived" quality may be used to denote the quality that the user would report at some instant in time.

If instantaneous quality transitions from good to bad at some instant, the listener would not immediately notice the change. It takes some time and persistence before the transition becomes recognizable. As the problem persists, the user would become progressively more annoyed or distracted by the impairment, eventually terminating the call. This leads to the idea that the human perceived quality changes more slowly than instantaneous network quality.

Can we forget jitter?

Not really. Jitter (or delay variation) also has an effect on call quality. However, a jitter buffer generally transforms the jitter problem into delay and packet loss problems. Jitter buffers are often adaptive and adjust their depth dynamically based on either the current packet discard rate or current jitter level. To prevent significant packet delay variation or jitter, incoming packets are buffered and then read out at a constant rate. If packets are excessively late in arriving, the jitter buffer discards them.

Because the jitter buffer can add to the packet loss experienced by the network, it is advisable and more accurate to measure packet loss (or rather frame loss) after the jitter buffer but before the codec. In this manner, all lost packets are accounted for.
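The difference between network loss and post-jitter-buffer loss can be illustrated with a very small simulator. This sketch assumes a fixed-depth (non-adaptive) buffer, that at least one packet arrives, and hypothetical arrival times; an adaptive buffer would need a more elaborate model.

```python
def post_jitter_buffer_loss(arrivals_ms, ptime_ms, depth_ms):
    """Return per-packet loss flags as seen after a fixed-depth jitter buffer.

    arrivals_ms[i] is the arrival time of packet i, or None if the network
    dropped it. Playout of packet i is scheduled one buffer depth after the
    first packet to arrive, plus i packet intervals; anything arriving after
    its playout time is discarded and counted as lost.
    """
    first_seq, first_arrival = next(
        (i, t) for i, t in enumerate(arrivals_ms) if t is not None
    )
    base = first_arrival - first_seq * ptime_ms + depth_ms
    return [t is None or t > base + i * ptime_ms for i, t in enumerate(arrivals_ms)]


# Hypothetical trace: packet 2 is lost in the network, packet 3 arrives late.
arrivals = [100, 125, None, 230, 181]
flags = post_jitter_buffer_loss(arrivals, ptime_ms=20, depth_ms=40)
print(flags)  # [False, False, True, True, False]
```

Here the network lost one packet in five, but the codec sees two in five missing, which is the figure that matters for quality estimation.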

Regarding recency

In a study of the relationship between instantaneous and overall subjective speech quality for time-varying quality speech sequences, and of the influence of a recency effect, the packet loss rate during a 3-minute call was varied from 0 to 25% [2]. For example, the packet loss was set to 25% for most of the call and reduced to 0% for a 30-second period mid-call. Listeners were asked to move a slider to indicate their assessment of quality during the call and then were asked to rate the overall call at its end.

The results were a good illustration of the transition effect (see Figure 2). Transitions follow an approximately exponential curve with a time constant of 5 seconds for the good-to-bad transition and 15 seconds for the bad-to-good transition. In order to accurately ascertain call quality, the monitoring agent must also take this transition effect into account.

The "recency" effect reflects how a listener would remember call quality. In tests, a 15-second burst of noise was inserted and moved from the beginning to the middle and then to the end of a 60-second call [3]. When the noise burst occurred at the start of the call, users reported a mean opinion score (MOS) of 3.82. When the noise burst occurred at the end of the call, users reported a MOS of 3.18. The resulting difference in MOS of 0.64 shows a 20% quality rating effect depending on where the noise burst occurred. This recency effect is even more significant when considering that the typical range for MOS is 2.5 to 4.0, which makes the impact roughly 40% of that range.
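The percentages quoted above can be checked with simple arithmetic. The article does not say which base was used for the 20% figure; the sketch below assumes it is taken relative to the lower (end-of-call) score.

```python
mos_burst_at_start = 3.82
mos_burst_at_end = 3.18

difference = mos_burst_at_start - mos_burst_at_end

# Assumption: the 20% figure is relative to the end-of-call score.
relative_to_score = difference / mos_burst_at_end

# Relative to the typical MOS range of 2.5 to 4.0 quoted in the text.
relative_to_range = difference / (4.0 - 2.5)

print(round(difference, 2), round(relative_to_score, 2), round(relative_to_range, 2))
```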

This recency effect is believed to be due to the tendency for people to remember the most recent events and/or to the way auditory memory functions, in which recollection typically decays over a 30-second interval [4]. Obviously, the effects of recency must also be considered in the model to arrive at an accurate end-user-perceived call quality assessment.

Extending the model

In many VoIP implementations the connection between the codec and the telephone handset may be transient. For example, a user may dial through the local loop and be routed to a gateway located at the central office (CO). This means that some E-model elements may not be measurable from within the network. To solve this problem, standard default values for many of the E-model parameters can be assumed per G.107. Using the default values, the E-model can then be represented as: R = 94 - Id - Ie.

The average value of Ie may be determined by taking the average of the end-user-perceived quality for the call. For each time interval, the instantaneous quality can be determined by measuring the post-jitter-buffer packet loss and mapping the packet loss to an Ie value using the curves in Figure 1.
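A minimal sketch of the loss-to-Ie mapping and the simplified rating is shown here. The curve points are HYPOTHETICAL placeholders standing in for one codec curve from Figure 1, which is not reproduced in this text; real values would come from the subjective test data for the codec in use. The function names are also assumptions.

```python
from bisect import bisect_right

# HYPOTHETICAL (loss %, Ie) points; replace with the codec's published curve.
CURVE = [(0.0, 10.0), (2.0, 20.0), (5.0, 30.0), (10.0, 40.0)]


def ie_from_loss(loss_pct, curve=CURVE):
    """Linearly interpolate Ie for a post-jitter-buffer loss percentage."""
    xs = [x for x, _ in curve]
    if loss_pct <= xs[0]:
        return curve[0][1]
    if loss_pct >= xs[-1]:
        return curve[-1][1]
    i = bisect_right(xs, loss_pct)
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (loss_pct - x0) / (x1 - x0)


def simplified_r(i_d, i_e):
    """Simplified E-model with default parameters: R = 94 - Id - Ie."""
    return 94.0 - i_d - i_e


print(simplified_r(i_d=4.0, i_e=ie_from_loss(1.0)))  # 75.0
```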

The end-user-perceived quality can be estimated from the instantaneous quality by modeling the transition effect described above as an exponential decay. A time constant of 5 seconds can be assumed for a deterioration in quality and 15 seconds for an improvement in quality.
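One way to sketch this smoothing is a first-order filter whose time constant depends on the direction of change. The sample interval, the function name, and the sample data are assumptions; the 5-second and 15-second constants are the ones given above. Quality is expressed here as Ie, so a larger value means worse quality.

```python
import math


def perceived_ie(instant_ie, step_s=1.0, tau_worse_s=5.0, tau_better_s=15.0):
    """Smooth instantaneous Ie samples into end-user-perceived Ie.

    Each step moves the perceived value exponentially toward the current
    instantaneous value, quickly when quality worsens (Ie rises) and more
    slowly when it improves (Ie falls).
    """
    perceived = []
    current = instant_ie[0]
    for target in instant_ie:
        tau = tau_worse_s if target > current else tau_better_s
        current = target + (current - target) * math.exp(-step_s / tau)
        perceived.append(current)
    return perceived


# Hypothetical call: one good second, then five seconds of heavy loss.
samples = [0.0] + [40.0] * 5
print([round(v, 1) for v in perceived_ie(samples)])
```

After 5 seconds in the bad state the perceived impairment has covered about 63% of the step, as expected for one time constant.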

The recency effect can be modeled by assuming that end-user-perceived quality decays exponentially, with a given time constant, from the "exit" value of a burst of noise or distortion toward the average equipment impairment factor, Ie.
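The decay described above can be written directly as a function. The article does not state the time constant, so it is left as a required parameter; the 30-second value in the example is purely a hypothetical choice, and the function name is an assumption.

```python
import math


def recency_adjusted_ie(ie_avg, ie_exit, seconds_since_burst, tau_s):
    """Perceived Ie at call end, decaying from the burst exit value.

    Starts at ie_exit when the burst ends and tends toward ie_avg as the
    time since the burst grows.
    """
    decay = math.exp(-seconds_since_burst / tau_s)
    return ie_avg + (ie_exit - ie_avg) * decay


# Hypothetical values: average Ie of 10, burst exit Ie of 40, tau of 30 s.
for t in (0.0, 30.0, 120.0):
    print(t, round(recency_adjusted_ie(10.0, 40.0, t, tau_s=30.0), 2))
```

A burst that ends just before the call does therefore weighs far more heavily on the final rating than the same burst early in the call.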

Modeling packet loss

To meet the real-time monitoring requirements within a VoIP gateway, it is essential to minimize processing overhead. One approach is to obtain some minimal amount of information during a call and perform most computation at the call's end. This approach is derived from understanding the need to incorporate the effects of recency.

As noted earlier, a Markov model can be used to represent burst packet loss characteristics. The states can represent the conditions of receiving or losing a packet within burst or "gap" conditions. A gap state is defined by the number of successive packets that must be received.

This model is similar to the Gilbert or Elliott models, which are essentially two-state Markov chains. However, it includes a state representing the loss of an isolated packet within a gap. The rationale for this is that packet loss concealment (such as replaying the last packet) can mask the effects of isolated lost packets. A loss-event-driven model can be used to count a minimum number of key transition events.

It is assumed that voice activity detection is being used and hence that packet loss events relate to packets actually containing speech energy. When the call is completed, remaining transition counts can be derived and then normalized to provide the desired probabilities. This model holds considerable information and can be used to determine average gap and burst size and density, and successive lost packet distribution.
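The burst and gap bookkeeping can be sketched as follows. This is a simplified illustration, not the agent's actual state machine: it groups losses into bursts after the fact instead of counting transitions, and the gap threshold `gmin` of 16 packets is an assumed value, since the text does not give one.

```python
def burst_gap_stats(lost, gmin=16):
    """Split a loss sequence into bursts and gaps and summarize them.

    Losses separated by fewer than gmin received packets belong to the same
    burst. A loss with at least gmin received packets on either side is an
    isolated loss inside a gap.
    """
    clusters = []
    for i, is_lost in enumerate(lost):
        if not is_lost:
            continue
        if clusters and i - clusters[-1][-1] - 1 < gmin:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    bursts = [c for c in clusters if len(c) > 1]
    isolated = len(clusters) - len(bursts)
    burst_len = sum(c[-1] - c[0] + 1 for c in bursts)
    burst_lost = sum(len(c) for c in bursts)
    gap_len = len(lost) - burst_len
    return {
        "bursts": len(bursts),
        "mean_burst_len": burst_len / len(bursts) if bursts else 0.0,
        "burst_density": burst_lost / burst_len if burst_len else 0.0,
        "gap_density": isolated / gap_len if gap_len else 0.0,
    }


# Hypothetical 100-packet trace: one isolated loss, one three-loss burst.
trace = [False] * 100
for i in (10, 50, 52, 55):
    trace[i] = True
stats = burst_gap_stats(trace)
print(stats)
```

In this trace the isolated loss at packet 10 stays in the gap, where concealment is likely to mask it, while packets 50 through 55 form a single burst with a loss density of 50%.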

Note that the R factor computed so far does not yet include the effects of delay or recency. It does, however, accurately reflect the effects of packet loss, jitter, and codec type on transmission quality. We call this the "network R factor."

The effects of delay are well known and easily modeled [4]. Delays of less than 175 ms have a small effect on conversational difficulty, whereas delays over 175 ms have a significant effect. A simple delay model based upon delays greater than or less than 175 ms can be used to compute Id.
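A two-slope delay model of this kind might look like the sketch below. Only the 175-ms knee comes from the text; the two slope values are hypothetical placeholders chosen to show the shape, and the function name is an assumption.

```python
def delay_impairment(delay_ms, knee_ms=175.0, slope_below=0.02, slope_above=0.12):
    """Piecewise-linear Id: gentle slope up to the knee, steeper beyond it.

    slope_below and slope_above are HYPOTHETICAL values (Id units per ms).
    """
    i_d = slope_below * min(delay_ms, knee_ms)
    if delay_ms > knee_ms:
        i_d += slope_above * (delay_ms - knee_ms)
    return i_d


for d in (100.0, 175.0, 275.0):
    print(d, round(delay_impairment(d), 2))
```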

Models of recency produce what we can call the "user R factor." This is intended to closely reflect the end-user's perception of quality and therefore takes into account both recency and delay metrics.

The result is a call quality model that provides two R factors: Network, which is focused on packet loss distribution, and User, which is focused on recency.

And the results are...

Some initial subjective comparison was made to validate the approach. An audio file was corrupted using a burst error process, which comprised a low-loss state and a high-loss state. The loss and state transition probabilities were selected randomly. A 10-ms packet size was used and packet loss concealment was applied. Sets of five test files were created, and a group of six listeners was used to rank the files from 1 (best) to 5 (worst). The ranking was compared with that predicted by the algorithm described above.
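The article does not say how the two rankings were compared. One common way to quantify agreement between two rankings of the same files is Spearman's rank correlation, sketched here under that assumption and with made-up rankings.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two rankings without ties."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical rankings of five files (1 = best, 5 = worst).
predicted = [1, 2, 3, 4, 5]
listeners = [1, 3, 2, 4, 5]
print(spearman_rho(predicted, listeners))
```

A value of 1 means identical rankings and -1 means exactly reversed rankings.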

The results showed reasonable correlation with user ranking. In the one case where there was poor correlation, the packet loss events fell either during silence periods or during periods when the sound produced by the speaker was not changing significantly, for example during an extended "aaaah" sound.

Also, a comparison test between the approach described here and active test measurement techniques such as PSQM and PAMS was performed using the ranking tests of the type described above (see Figure 3). The graph shows the distance from the listener's quality assessment to that produced by the tools. In five of the six cases the approach described here was more in line with the human opinion.

The statistical modeling approach described here is both a novel way of gathering voice call quality data and a more accurate way of measuring it. Its novelty derives from being an embedded passive agent that can be instantiated in VoIP end systems. Its increased accuracy derives from being implemented in the packet stream, between the jitter buffer and the codec, from recognizing and calculating the impact of real packet loss volume, and from incorporating the critically important effects of burst packet loss and recency on network and end-user-perceived call quality.

Alan Clark is the founder, president, and CTO of Telchemy, Inc. He was previously the CTO of Hayes, director of research and strategy for Dowty Communications, and system architect for British Telecom. He has a BSEE and a Ph.D. in information theory from Leicester Polytechnic, UK. He has nine granted patents and can be contacted at alan@telchemy.com.

References

  1. Bolot, J., Fosse-Parisis, S., Towsley, D., Adaptive FEC based Error Control for Interactive Audio in the Internet. Infocom 99.

  2. France Telecom Study of the relationship between instantaneous and overall subjective speech quality for time-varying quality speech sequences: influence of a recency effect. ITU Study Group 12 Contribution D.139, May 2000.

  3. Rosenbluth, J. H., Testing the Quality of Connections having Time Varying Impairments, Committee contribution T1A1.7/98-031.

  4. Britt, R., Armstrong, M., Voice Quality Recommendations for IP Telephony. Committee contribution TR41.1.2/00-05-004, May 2000.

  5. Baddeley, A., Human Memory, Allyn & Bacon, 1997.

  6. ETSI Speech Communication Quality for Mouth to Ear for 3.1 kHz Handset Telephony across Networks. Technical Report ETR250, 1996.

  7. ITU Recommendation G.107.

  8. ITU Recommendation G.108.



