16 March 2010

Feature

Smart Antenna Schemes For E-911


By Matthew Plonski

With 2001 quickly approaching, designers of wireless systems are faced with meeting an FCC mandate for E-911 functionality. By combining smart antenna technology with direction-of-arrival (DOA) algorithms, engineers can develop systems that provide accurate location information.

Enhanced-911 (E-911) technology is on the minds of most wireless designers today. With 2001 just around the corner, the FCC-mandated rule to include E-911 functionality in all wireless products is drawing near. Therefore, engineers need new technology solutions to bring E-911 capabilities to their system designs.

During the past year, smart antennas have emerged as a leading technology for delivering E-911 functionality to next-generation wireless networks. These antenna systems are powerful solutions since they can exploit the spatial and spectral characteristics of signals, providing very accurate location information. Smart antenna systems, which require complex digital signal processing (DSP), will employ DOA algorithms to derive this information. The challenge for the designer, however, is choosing the right DOA algorithm.

A variety of DOA algorithms are under development for use in smart antenna systems targeted at position-location applications [1, 2]. This article explores one of these algorithms, the multiple-signal-classification (MUSIC) algorithm, and presents the DSP requirements of smart antenna systems that employ it.

Antenna array


A smart antenna array, for this discussion, consists of several omnidirectional antenna elements dispersed in a known physical pattern. The electromagnetic (EM) waves incident on the antenna array are represented as plane waves with various directions of arrival.

With this type of antenna array, each antenna element receives a near-identical signal: a combination of the incident signals and any ambient noise. Each incident plane wave reaches the individual antenna elements at a different time, so the received copies are somewhat out of phase, by an amount determined by the DOA and the array geometry. A direct sum of all the received signals will cancel those that are out of phase, with very few contributing positively.

It is possible to apply a phase delay to the signals received at each antenna element, tuning the antenna array to optimally receive from a particular DOA. This concept is sometimes referred to as beamforming, since this approach produces an antenna sensitivity lobe (or beam, in the case of transmission) in the DOA (see Figure 1). The array's sensitivity or overall power output would be increased by a factor equal to the number of antenna elements. The length-M set of phase delays applied to a particular antenna array that causes it to point a beam in a particular direction is known as a steering vector.
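As a rough illustration of steering vectors and beamforming, the sketch below assumes a uniform linear array with half-wavelength element spacing. The article only says the elements sit in a known physical pattern, so this geometry is an assumption, and the names `steering_vector` and `beamform` are hypothetical.

```python
import cmath
import math

def steering_vector(theta_deg, m, spacing=0.5):
    """Length-m set of phase factors for a plane wave arriving from theta_deg.

    Assumes a uniform linear array; spacing is in wavelengths (an assumption,
    not something the article specifies).
    """
    phase_step = 2 * math.pi * spacing * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * phase_step * n) for n in range(m)]

def beamform(samples, weights):
    """Undo each element's phase delay, then sum across the array."""
    return sum(w.conjugate() * x for w, x in zip(weights, samples))

m = 8
# One snapshot of a unit-amplitude plane wave arriving from 30 degrees.
arriving = steering_vector(30.0, m)

direct = abs(sum(arriving))                                  # plain sum, no phase alignment
aligned = abs(beamform(arriving, steering_vector(30.0, m)))  # array tuned to 30 degrees

print(direct, aligned)
```

For this particular angle and spacing the plain sum cancels almost completely, while the phase-aligned sum adds all eight elements coherently.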

The results obtained by using a DOA algorithm may be employed for E-911 position location or as input to a system interested in increasing the signal-to-noise ratio (SNR) in the direction of a particular user. The system can simply take the user's DOA and tune the array to improve reception.

MUSIC


The MUSIC algorithm is one of the most researched DOA algorithms due to its interesting breakdown of the principal components of input signals (see Figure 2). The input signals provide information about the DOA of the received plane waves as well as the noise received at each element. The algorithm works from the multiple delayed versions of the plane waves seen at the elements, together with the known antenna array geometry. This makes it possible to exploit the spatial and temporal correlation between the different received signals to determine the angles of arrival.

Performing principal-component analysis on the input signals breaks the data down into a basis, or set, of M-dimensional vectors capable of describing the statistical relationships between all of the signals. The core concept of the MUSIC algorithm is that these vectors (called Eigenvectors) can be divided into two subsets, one providing information about the correlated plane waves (signal space) and the other containing information derived from the uncorrelated noise (noise space).

The Eigenvectors in the noise space are orthogonal to those in the signal space. As the signal space contains information about the angle of arrival of each plane wave, the steering vectors for those angles are also orthogonal to the vectors in the noise space. Ideally, then, the magnitude of the product between the noise-space matrix and a steering vector for a plane wave's DOA is zero. Evaluating the inverse of that magnitude for steering vectors across all possible angles gives the MUSIC spectrum, which peaks sharply at the angles of arrival.
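In symbols, using notation the article does not itself define: let a(θ) be the length-M steering vector for angle θ, D the number of incident signals, and E_n the M-by-(M - D) matrix whose columns are the noise-space Eigenvectors. One common way to write the MUSIC spectrum (taking the squared magnitude, a convention assumed here) is:

$$
P_{\mathrm{MUSIC}}(\theta) = \frac{1}{\left\lVert \mathbf{E}_n^{H}\,\mathbf{a}(\theta) \right\rVert^{2}}
$$

At a true DOA the steering vector is nearly orthogonal to every column of E_n, so the denominator approaches zero and the spectrum shows a peak.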

Implementation


The purpose of the MUSIC algorithm is to derive the DOA from a number of sources incident on a smart antenna array. This frame-based process (performed on a number of samples at a time, known as a frame, instead of being computed continuously) begins by demodulating the signals received at each antenna element and producing a buffer of complex-valued samples. The samples are complex so that the phase information remains intact throughout the process. The frames are then transferred to the system that will run the MUSIC algorithm.

The first step in the computation of the MUSIC algorithm is to compute the signal covariance matrix, which is one of the more compute-intensive sections (see Figure 3). Each of the frame buffers is arranged into a vector containing K samples per frame, and there are M total frame buffers (one for each antenna element in the array). The covariance matrix is formed by multiplying each of the M frame buffer vectors by every other frame buffer vector (taking the scalar product). Each vector multiplication takes K complex multiply-accumulates, and there are M^2 vector multiplications, for a total of M^2 K necessary complex operations. Due to the symmetry of the problem it's possible to reduce this by almost a factor of 2, but this is still a massive number of computations.
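A minimal sketch of this step is shown below. It computes only the upper triangle and mirrors it, which is the symmetry saving mentioned above. The function name `covariance`, the synthetic test frames, and the division by K (a common normalization, not stated in the article) are assumptions for illustration.

```python
import cmath

def covariance(frames):
    """Covariance of M frame buffers, each holding K complex samples.

    Returns the M-by-M matrix and the number of complex multiply-accumulates
    spent. Only the upper triangle is computed; the lower triangle is filled
    in by Hermitian symmetry.
    """
    m = len(frames)
    k = len(frames[0])
    r = [[0j] * m for _ in range(m)]
    macs = 0
    for i in range(m):
        for j in range(i, m):
            # Scalar product of buffer i with the conjugate of buffer j.
            acc = sum(x * y.conjugate() for x, y in zip(frames[i], frames[j]))
            macs += k
            r[i][j] = acc / k                 # dividing by K is an assumed convention
            r[j][i] = (acc / k).conjugate()   # mirrored entry costs no extra MACs
    return r, macs

# Synthetic data: 12 elements, 128 samples per frame, unit-magnitude samples.
frames = [[cmath.exp(1j * 0.1 * (i + 1) * n) for n in range(128)] for i in range(12)]
r, macs = covariance(frames)
print(macs, 12 * 12 * 128)   # MACs actually spent versus the full M^2 K count
```

With M = 12 the upper triangle holds 78 of the 144 entries, so the saving is somewhat less than a factor of 2, as the text says.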

The second step in the MUSIC algorithm is to compute the Eigen decomposition of the covariance matrix. A typical method is the use of Householder transformations followed by Givens rotations and backward accumulation. Although there are several possible approximations and alternate methods, most are still on the order of M^3 complex operations.

From the Eigenvectors and Eigenvalues of the covariance matrix, the MUSIC spectrum is formed and then evaluated at a particular angular resolution. For D incident signals and an angular resolution of 2π/L, this results in M^2 (M - D) L more complex operations.
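The three steps above (covariance, Eigen decomposition, spectrum evaluation) can be sketched end to end with NumPy. This is an illustration under stated assumptions rather than the implementation the article has in mind: it assumes a uniform linear array with half-wavelength spacing, uses simulated data, and calls `numpy.linalg.eigh` in place of the Householder/Givens procedure described in the text. All function names are hypothetical.

```python
import numpy as np

def steering_vector(theta_deg, m, spacing=0.5):
    """Steering vector of an m-element uniform linear array (assumed geometry)."""
    n = np.arange(m)
    return np.exp(-2j * np.pi * spacing * n * np.sin(np.deg2rad(theta_deg)))

def music_spectrum(frames, d, angles_deg, spacing=0.5):
    """frames: (M, K) complex samples; d: number of incident signals."""
    m, k = frames.shape
    cov = frames @ frames.conj().T / k          # step 1: M-by-M covariance
    _, eigvecs = np.linalg.eigh(cov)            # step 2: eigenvalues come back ascending
    noise_space = eigvecs[:, : m - d]           # M - D smallest -> noise space
    spectrum = np.empty(len(angles_deg))
    for idx, theta in enumerate(angles_deg):    # step 3: sweep the candidate angles
        a = steering_vector(theta, m, spacing)
        residual = noise_space.conj().T @ a
        spectrum[idx] = 1.0 / max(np.vdot(residual, residual).real, 1e-12)
    return spectrum

def strongest_peaks(spectrum, angles_deg, d):
    """Angles of the d largest local maxima of the spectrum."""
    inner = np.arange(1, len(spectrum) - 1)
    is_peak = (spectrum[inner] > spectrum[inner - 1]) & (spectrum[inner] > spectrum[inner + 1])
    peaks = inner[is_peak]
    best = peaks[np.argsort(spectrum[peaks])[-d:]]
    return sorted(float(angles_deg[i]) for i in best)

def simulate_frames(true_angles_deg, m, k, noise_std=0.05, seed=0):
    """Synthetic frame: uncorrelated sources plus white noise (test data only)."""
    rng = np.random.default_rng(seed)
    d = len(true_angles_deg)
    a = np.column_stack([steering_vector(t, m) for t in true_angles_deg])
    sources = rng.standard_normal((d, k)) + 1j * rng.standard_normal((d, k))
    noise = noise_std * (rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k)))
    return a @ sources + noise

angles = np.arange(-90, 91)                      # 1-degree grid
frames = simulate_frames([-20.0, 35.0], m=8, k=128)
estimates = strongest_peaks(music_spectrum(frames, d=2, angles_deg=angles), angles, d=2)
print(estimates)
```

The number of incident signals `d` is passed in here; a real system would have to estimate it, for example from the spread of the Eigenvalues.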

For reasonable system estimates, such as M = 12 antenna elements, K = 128 samples per frame, D = 6 incident signals, L = 360 angular steps (a resolution of 2π/L, or 1 degree), and a frame rate of 1,000 Hz, the total comes in at approximately 3,000 million instructions per second (MIPS).
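The arithmetic behind this estimate can be laid out from the three operation counts given earlier. The sketch below takes L = 360 angular steps (an assumption about the quoted resolution) and treats the Eigen decomposition as exactly M^3 operations, although the text only gives its order. The function name is hypothetical.

```python
def music_complex_ops(m, k, d, l):
    """Complex operations per frame for each stage, using the article's formulas."""
    covariance_ops = m * m * k            # M^2 K (before the symmetry saving)
    eigen_ops = m ** 3                    # "on the order of" M^3; constant factor unknown
    spectrum_ops = m * m * (m - d) * l    # M^2 (M - D) L
    return covariance_ops, eigen_ops, spectrum_ops

cov_ops, eig_ops, spec_ops = music_complex_ops(m=12, k=128, d=6, l=360)
per_frame = cov_ops + eig_ops + spec_ops
per_second = per_frame * 1000             # 1,000 frames per second

print(cov_ops, eig_ops, spec_ops)
print(per_frame, per_second)
```

These are counts of complex operations. Each one expands into several real multiplies and adds plus loads and stores, and the article does not state the conversion it used to arrive at its MIPS figure, so the two numbers are not directly comparable.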

Spreading the load


The number of operations involved in computing a typical DOA algorithm is massive, often so large it prohibits implementation on even the fastest traditional DSP architectures. Algorithm designers have been forced to come up with new solutions, or new applications of old solutions. One of those solutions that has recently come into favor is the concept of parallel computing.

Parallel computing is not a new technique by any means, but today's DSP programmers are often forced to "throw horsepower at the problem," adding multiple identical processors to reduce the cycle count on each one. In a world where real-time processing is mandatory, cost and board space are usually less important. Less important, that is, until they are passed along to the consumer.

Designers may ask how parallel computing helps the implementation of a DOA algorithm. Ideally, an engineer should be able to reduce the number of cycles required by a factor equal to the total number of processors used. Unfortunately, this is rarely the case due to the large amount of overhead inherent to parallel computing. The most prominent sources of this overhead are the need for interprocessor communication and the transfer of intermediate results.

For example, in the computation of the covariance matrix for a 12-element, 128-sample-per-frame system, 78 (due to Hermitian symmetry) length-128 complex dot products are required. In a four-processor system, it is possible to reduce this number to 78/4 = 19.5 (rounds up to 20) of the same dot products per processor. This is a considerable savings. However, this reduction requires that the entire input frame (a 12-by-128 matrix of complex samples) be distributed to each of the four processors, causing a significant amount of overhead. Additionally, the subsequent step of the MUSIC algorithm also demands that the complete covariance matrix be distributed throughout the multiple processors.
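The split described above can be sketched as follows. The round-robin assignment and the function names are assumptions chosen for illustration; the article does not say how the dot products would be divided.

```python
def upper_triangle_pairs(m):
    """Index pairs (i, j) with i <= j: the dot products left after Hermitian symmetry."""
    return [(i, j) for i in range(m) for j in range(i, m)]

def partition(pairs, processors):
    """Deal the dot products out to the processors round-robin."""
    return [pairs[p::processors] for p in range(processors)]

pairs = upper_triangle_pairs(12)
workloads = partition(pairs, 4)
loads = [len(w) for w in workloads]

print(len(pairs))   # dot products per frame
print(loads)        # dot products handled by each processor
```

The busiest processor sets the pace, which is why 19.5 rounds up to 20. Each processor still needs every frame buffer its assigned pairs touch, and under this assignment that is all 12 of them.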

In order to perform the Householder transformations, the initial step in reducing the covariance matrix to tri-diagonal form, it is necessary to derive length-M Householder vectors and transfer them to each processor for every one of the M^2 iterations. The result is the communication of M^3 intermediate results per frame. By contrast, a parallel implementation of Givens rotations (the process of reducing the tri-diagonal matrix to a pure-diagonal matrix with the Eigenvalues along the main diagonal) and of the backward computation of the Eigenvectors is very efficient. It does not require any intermediate communications as long as the size-M^2 tri-diagonal matrix is present on all processors. Finally, the compute-intensive process of evaluating the MUSIC spectrum can also be distributed among several processors efficiently, but only after the (M - D) length-M Eigenvectors have been broadcast to all processors.

Careful algorithm design can reduce data transfer, but not eliminate it. The best case would be to divide all operations completely, so that each resides on its own processor without needing any information from another. In practice, interprocessor communication is costly, especially when one considers what is involved with each transfer.

Interprocessor communication


Traditionally, multiprocessor systems were limited to various interrupt-driven techniques in the methods of interprocessor communication (see Figure 4). These methods were usually of the following sort:

  • Processor A prepares a message for Processor B. (Puts it in shared memory, configures DMA, and so on.)
  • Processor A signals an interrupt on Processor B.
  • A continues its processing or waits for B.
  • Processor B services the interrupt from A.
  • B processes the data from A.
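The five steps above can be mimicked in software. In the sketch below, two threads stand in for the two processors, a dictionary for shared memory, and a `threading.Event` for the interrupt line. All of these are stand-ins chosen for illustration, not a model of any particular DSP.

```python
import threading

shared_memory = {}               # stands in for memory both processors can reach
interrupt = threading.Event()    # stands in for the interrupt line from A to B
results = []

def processor_a():
    # Step 1: A prepares a message for B in shared memory.
    shared_memory["message"] = [1 + 2j, 3 - 1j]
    # Step 2: A signals the interrupt on B.
    interrupt.set()
    # Step 3: A would continue its own processing here.

def processor_b():
    # Step 4: B services the interrupt (here, it simply waits for the event).
    if interrupt.wait(timeout=5.0):
        # Step 5: B processes the data from A.
        results.append(sum(shared_memory["message"]))

thread_b = threading.Thread(target=processor_b)
thread_a = threading.Thread(target=processor_a)
thread_b.start()
thread_a.start()
thread_a.join()
thread_b.join()

print(results)
```

When B actually wakes up depends on the thread scheduler, a small-scale version of the timing uncertainty that makes such schemes hard to schedule.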

The disadvantage of a distributed communication scheme is that it is very difficult to predict the transfer times. Interrupt latency and processor bus timings are only a few of the factors that must be taken into account. Scheduling among the different processors becomes an extremely difficult problem, oftentimes eclipsing the original algorithm.

A recent improvement in interprocessor communications is the organization of the multiple-DSP system into clusters (see Figure 5) [4]. A cluster provides complete connectivity between the attached processors, enabling data transfer between the processors in a deterministic fashion. Data is routed on a cycle-by-cycle basis. There is no need to schedule or even worry about interrupts. The transfer cycle is simple. First, processor A executes a single-cycle instruction, sending processors B and C some data. When the next cycle arrives, processors B and C already hold the necessary data.

A clustered system enables the DSP algorithm programmer to design very tight, parallel code. In an SIMD parallel implementation, intermediate results can be communicated in a single cycle, so there is no down time or latency. In the example of the MUSIC DOA algorithm, the communication of intermediate results would be a scheduling nightmare for a traditional multiprocessor system. In a clustered system, however, the intermediate results are sent and received in a known number of cycles, eliminating the need for any scheduling or interrupt handling. In fact, each transfer between tasks in Figure 3 can be performed in a single cycle, making the communication cost far smaller than the operation counts of the compute sections.

More powerful architectures


Clearly, future generations of communications signal-processing algorithms will require DSP manufacturers to come up with more-powerful architectures. Multiprocessor systems will become mandatory, and the scheduling of data-transfer operations may begin to eclipse the complexity of the original algorithms. In the case of the MUSIC algorithm, the overhead of interprocessor communication can be virtually eliminated through the use of a well-designed data-exchange architecture such as a clustered system.

The DSP industry is striving to provide higher levels of performance to enable compute-intensive algorithms such as the MUSIC algorithm for smart antenna systems. All DSP architectures available in the marketplace will benefit from the faster clock speeds enabled by smaller process geometries. Many suppliers have introduced new products during the past few years that validate the need for parallelism in addition to faster clocks. Examples are the very long instruction word (VLIW) and superscalar architectures. While these incorporate instruction-level parallelism, the highest-performance architectures must incorporate all levels of parallelism: instruction, packed data, and multiprocessing. Architectures that can deliver all of these with simple-to-use programming tools will deliver the best levels of performance.

Matthew Plonski is a principal DSP engineer at BOPS, Inc. He holds a BSEE and MSEE from Rochester Institute of Technology in Rochester, NY. He can be reached at mplonski@bops.com.

Illustrations

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5


References
  1. Liberti, J.C. Jr., and Rappaport, T.S., Smart Antennas for Wireless Communications, Prentice Hall, Englewood Cliffs, NJ, 1997, p. 319.
  2. Stoica, P., and Moses, R., Introduction to Spectral Analysis, Prentice Hall, Englewood Cliffs, NJ, 1997, p. 319.
  3. Golub, G.H., and Van Loan, C.F., Matrix Computations, The Johns Hopkins University Press, Baltimore, MD, 1996, p. 694.
  4. Pechanek, G., et al., "The ManArray Story," http://bopsnet.com/cores_MAstory.shtml, 1999.


    All materials on this site Copyright © 2010 EE Times Group, a Division of United Business Media LLC All rights reserved.