It's time for a re-education about compilers, those critical tools that tell your code how to work with the hardware. Contrary to popular belief, the options provided by a DSP vendor aren't the only ones.
In some cases they are the best, but with a few simple considerations engineers can determine the optimal approach for creating the most efficient design for an application.
For many, choosing a DSP C compiler is a decision made in advance. Once the DSP processor has been chosen, the field narrows to a small number of options: the C compiler offered by the DSP hardware vendor and those of one or two third-party suppliers. But the engineering quality of the C compilers offered by the chip supplier and by independent vendors is becoming one of the critical decision factors in picking a DSP.
Savvy compiler providers have a large number of tools at their disposal to make their compilers perform well on benchmark programs. Unfortunately, these benchmark optimizations often do not apply to real-world designs. Often the discovery of lower-than-anticipated performance comes at the end of a project, just when the need for performance margin is greatest.
If compilers can be manipulated to score well on benchmark programs, how can a designer discover the truth about a compiler before it's too late? The answer is to employ
modified versions of standardized benchmarks and programs that are similar in nature to the intended application. It is important to run these programs through the compiler
with several different sets of optimization switch settings to better understand the effects of optimization switches and their use in various combinations.
Three parts
Compilers have three logical parts: syntactic analysis, instruction-set-independent optimization, and code generation combined with machine-specific optimization.
Syntactic analysis is the part of the compiler that scans the source code and recognizes its constituent parts, such as keywords, reserved words, and other syntactical elements. This front end normally doesn't need to be tested, provided the compiler vendor is an established company in the compiler business.
Regardless of the vendor, compilers normally accept standard-defined code without problems. Because syntactical analyzers are generated automatically by compiler-generation tools, the syntax analysis will almost always be correct. The standardized test suites used by compiler vendors all but assure correct functioning of the compiler for mainstream language features, including the correctness of the compiled code.
The true capabilities of a good compiler come into play when a program's syntax is incorrect. Better, more mature, and more robust compilers have extensive error recovery built in to help the programmer debug C programs when a syntax error has been made. Good error recovery can save substantial programming time by avoiding a series of compiles to discover simple problems of program construction such as missing brackets, misspelled keywords, and improper punctuation. Poor error handling can cause many unnecessary recompiles, each of which consumes programming time and adds critical-path time to engineering schedules, all to find errors of form rather than substance.
Experience with the syntax-analysis phase of the compiler pays off in error recovery. One can estimate the robustness of this function from the number of compiler-years in deployed systems. To determine the number of compiler-years, multiply the number of compilers in use by the average number of years that the compiler or its derivatives have been in service. As a first approximation, you can use the number of different processors that the company targets with compilers based on the same front end. Greater numbers of targeted processors, combined with long in-service times, bode well for the quality of the syntax-analysis portion of the compiler. This information is readily available from most compiler vendors. In practice, most DSP vendors' compilers target only a few processors and have been in service for relatively short periods. However, DSP vendors often have an advantage in other areas of compiler performance, such as optimization.
There are two types of optimization used in most compilers: machine-independent and machine-specific. Although the machine-specific optimizations seem to be the most important, it's worth understanding how machine-independent optimization works, since it is often in this phase that DSP compilers derived from general-purpose (non-DSP) compilers have problems.
Independent optimization
There are many techniques that can be applied to machine-independent optimization. The simplest is code movement. Consider the following fragment of pseudo code:
Do {
X=1
y=z+q
} Until y>23
Code movement optimization treats a loop-invariant assignment such as "X=1" as something that can be moved out of the loop without changing the meaning of the code. With this technique, the above code becomes:
X=1
Do {
y=z+q
} Until y>23
The optimized code will execute faster because the constant assignment is executed only once for the entire loop rather than on every iteration. But in some cases this motion is not desired. Suppose for a moment that assigning 1 to X really resets a watchdog timer. In this case, code movement will cause the program to fail when the watchdog timer times out. Here, a generic optimization that is safe to perform in a data-processing environment fails in the embedded world of DSP. This is a problem that compiler vendors for microcontrollers have had to deal with for years.
Some compilers provide switches that let the programmer prevent the compiler from moving code around within the program, and some compiler systems permit you to compile one module of a program with one set of optimizations and another module with a different set. Still, code movement is a necessary feature for producing reasonable code.
Another common machine-independent optimization is the elimination of dead code, that is, operations whose results are overwritten or never used. Consider the following code:
A=b(i)*c(i)
A=b(i+1)*c(i+1)
Most compilers will analyze this code, determine that the first line has no effect, and eliminate it. But viewed from the standpoint of a pipelined operation, the code may make perfect sense, since it keeps the pipeline full, and the first result may remain available to later instructions as an implied operand. In this case, the optimization would change the intended operation of the compiled code.
Likewise, the entire architectural basis for many DSPs causes generic problems for many code optimization strategies. Most DSPs take as their architectural touchstone the convolution, or sum-of-products, operator:
sum(a(i)*b(i)) for i=1,n
In this case it is the C language itself that causes problems. Many DSPs contain a multifunction unit that can perform a single-cycle multiply-accumulate together with updates of two address pointers, and most also permit zero-overhead looping constructs that encompass a number of instructions. C lacks a program structure that represents this detail at a high enough level to permit mapping the operation directly onto the hardware.
Various alternatives to standard C have been offered, including Numerical C and macro-defined operations within C. Most vendors have opted to use proprietary extensions in these special cases. The upside of this approach is that the underlying hardware is well supported. The downside is that portability is sacrificed for control over code generation, so that the specialized hardware can be fully exploited.
Most often, DSP compiler vendors shrug and observe that the tightest inner loops of DSP code will be hand-coded anyway, so their compilers offer the ability to drop into assembly language when desired.
Benchmark games
Machine-independent optimization is where many benchmark games are played. Hand-coded benchmarks, for example, are written to specific, defined standards so that competitive comparisons can be made between benchmark programs developed under like rules.
One of the historical games played by DSP vendors with hand-coded benchmarks in the mid-1980s was manual loop unrolling. Loop unrolling is best understood through an example:
Do {
C=C+a(i)*b(i)
i=i+1
} Until i>n
The obvious way to code this on a typical DSP is to use the zero-overhead loop construct and implement the body in a single instruction. But for some DSP architectures, and for some combinations of repetition count and code size, the fastest code is not generated by the single "obvious" instruction. Instead, the code is "unrolled" into straight-line code:
C=C+a(1)*b(1)
C=C+a(2)*b(2)
C=C+a(3)*b(3)
C=C+a(4)*b(4)
...
C=C+a(n)*b(n)
For many processors this unrolled code executes faster because no loop-control registers need to be set up and no loop overhead is paid. But this type of speed comes at the expense of code size. In one example of this type of optimization, a vendor had completely unrolled critical code, leaving no memory for any other function. The theoretical performance was just that: theoretical.
Abuses like this are why companies have established benchmarking standards.
Compilers sometimes employ this trick to great effect. For data processing applications, where memory size is not a constraint, the technique is valuable and works. For
embedded DSP applications, the use of unrolling and other machine-independent optimizations may be of little value.
When evaluating a compiler for performance characteristics, most of these optimization techniques can be discovered by asking the vendor or by visually inspecting the
assembly language produced by the compiler. It is easy to write a small test program that includes apparently constant variable assignment, dead code, and loops of the type
that you will commonly use in your DSP programs. Be sure to compile the test program with as many optimization options as the compiler supports, including combinations.
Dependent optimizations
There is a wide variety of techniques used by compiler companies to optimize code for a specific DSP processor. Often the best back-end optimizations are developed by the DSP vendors themselves. There are, however, exceptions. VLIW processors live and die by the strength of their code optimization. DSPs employing VLIW architectures may have four or more execution units, each with its own set of capabilities. The situation is so complex that even above-average programmers cannot write optimal code by hand, because the code must be scheduled for execution at compile time so that needed results are ready at the right place and time, and further optimizations may not be obvious.
VLIW technology for DSPs is still in its infancy, and the underlying theory of VLIW is fairly new and largely unexplored as well. Consequently, for this type of compiler there is no substitute for reviewing compiled test code for reasonableness.
The integrated development environment (IDE) can be a blessing and a curse. When the IDE fits your development standards, it's a boon. It's a curse when the compiler works only within that environment: if the IDE does not fit your environment, the only choices are another compiler or new standards.
The choice of processor can dictate the choice of C compiler, but there are alternatives. Wading through the hype is made easier by some simple tests. In the final analysis, one factor can be the deciding issue: do you want a single supplier of technology? If the answer is no, you have your work cut out for you.
Vendor Listing
ACE
Van Eeghenstraat 100, 1071 GL
Amsterdam, The Netherlands
www.ace.nl
Agere Systems
555 Union Blvd.
Allentown, PA 18109
www.agere.com
Analog Devices
One Technology Way
Norwood, MA 02062
www.analog.com
ARC Cores
Waterfront Business Park
Elstree Road
Elstree Herts WD6 3BS
www.arccores.com
Aspex
2103 Landings Drive
Mountain View, CA 94043
www.aspex.co.uk
Bops
1200 Charleston Road
Mountain View, CA 94043
www.bops.com
DSPecialists
Rotherstraße 22, Friedrichshain 10245
Berlin, Germany
www.dspecialists.com
DSP Group
3120 Scott Boulevard
Santa Clara, CA 95054
www.dspg.com
Green Hills Software
30 West Sola Street
Santa Barbara, CA 93101
www.ghs.com
Infineon
St.-Martin-Str. 53 81541
Munchen, Germany
www.infineon.com
Intel
2200 Mission College Blvd.
Santa Clara, CA 95052
www.intel.com
LSI Logic
1551 McCarthy Blvd
Milpitas, CA 95035
www.lsilogic.com
Metaware
2161 Delaware Avenue
Santa Cruz, CA 95060
www.metaware.com
Metrowerks
9801 Metric Blvd.
Austin, TX 78758
www.metrowerks.com
Motorola
2900 South Diablo Way
Tempe, AZ 85282
www.motorola.com
Philips Semiconductors
5611 EE Postbus 930
5600 AX Eindhoven, The Netherlands
www.philips.semiconductors.com
Star Core
2100 Riveredge Pkwy, 600
Atlanta, GA 30328
www.starcore-dsp.com
Target Compiler Technologies
Hassrode Research Park
Interleuvenaan 3 B-3001 Leuven, Belgium
www.retarget.com
Tasking
333 Elm Street
Dedham, MA 02026-4530
www.tasking.com
Texas Instruments
12203 Southwest Freeway
Stafford, TX 77477
www.ti.com
DSP Compiler Checklist
While a great variety of tests can be run on a C compiler, a small set of factors can help determine which compiler is the one for you.
- DSP vendor or third party?
- How many copies of the compiler are in use?
- How long have they been in use?
- Is it a family of compilers or a one-off?
- Programmer control over code optimization:
Loop unrolling?
Dead code elimination?
Code motion?
Other optimizations?
- Code generation:
Assembly source produced or available?
Non-optimizing for debug?
Source debug facility?
Support for emulator?
Support for core simulation if a core is used?
- Source of the basic compiler:
In-house?
GNU C compiler (GCC)?
Compiler house?
- Experience with DSP?
- Interoperate with other tools?
Which ones?
Henry Davis is president of Henry Davis Consulting, a new products consultancy based in Soquel, CA. He holds a BS in computer science and business administration from
Columbia Pacific University, and has done graduate work at the New Mexico Institute of Mining and Technology. He can be reached at hdavis@ix.netcom.com.