In some cases the DSP vendor's own compiler is the best choice, but with a few simple considerations engineers can determine the optimal approach for creating the most efficient design for an application.
For many engineers, choosing a DSP C compiler is a pre-made decision. Once the DSP processor has been chosen, only a small number of options remain: the C compiler offered by the DSP hardware vendor and those of one or two third-party suppliers. But the engineering quality of the C compilers offered by the DSP supplier and by independent vendors is becoming one of the critical factors in picking a DSP.
Savvy compiler providers have a large number of tools at their disposal to make their compilers perform well on benchmark programs. Unfortunately, these benchmark optimizations often do not carry over to real-world designs. Often, lower-than-anticipated performance is discovered only at the end of a project, just when the need for performance margin is greatest.
If compilers can be manipulated to score well on benchmark programs, how can a designer discover the truth about a compiler before it's too late? The answer is to use modified versions of standardized benchmarks, along with programs similar in nature to the intended application. It is important to run these programs through the compiler with several different sets of optimization switch settings to understand the effect of each switch, both alone and in combination.
Three parts
Compilers have three logical parts: syntactic analysis, instruction-set-independent optimization, and code generation combined with machine-specific optimization.
Syntactic analysis is the part of the compiler that scans the source code and recognizes its constituent parts, such as keywords, reserved words, and other syntactic elements. This front end normally does not need to be tested, provided the compiler vendor is an established company in the compiler business.
Regardless of the vendor, compilers normally accept standard-conforming code without problems. Because syntax analyzers are generated automatically by compiler-generation tools, the syntax analysis will be correct. The standardized test suites that compiler vendors use all but guarantee that the compiler, including the code it generates, functions correctly for mainstream features.
The true capabilities of a good compiler come into play when a program's syntax is incorrect. Better, more mature, and more robust compilers have extensive error recovery built in to help the programmer debug a C program that contains a syntax error. Good error recovery can save substantial programming time by avoiding a series of compiles just to discover simple problems of program construction such as missing brackets, misspelled keywords, and improper punctuation. Poor error handling causes many unnecessary recompiles, each of which consumes programming time and adds critical-path time to engineering schedules, all to find errors of form rather than substance.
Experience with the syntax-analysis phase of the compiler pays off in error recovery. One way to estimate this phase's robustness is to determine the number of compiler-years in deployed systems: multiply the number of compilers in use by the average number of years that the compiler or its derivatives have been in service. As a first approximation, you can take the number of compilers to be the number of different processors the company targets with compilers based on the same front end.
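The compiler-years arithmetic can be sketched in a few lines of C. This is a minimal illustration, assuming one compiler per processor target (the first approximation above); the `struct target` record and its field names are hypothetical, and the year figures a vendor supplies are the real inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one processor target whose compiler shares the
   vendor's front end. */
struct target {
    const char *name;        /* processor name, illustrative only */
    double years_in_service; /* years this compiler has been deployed */
};

/* Compiler-years = number of compilers x average years in service.
   With one compiler per processor target, that product is simply the
   sum of the per-target years. */
static double compiler_years(const struct target *targets, size_t count)
{
    assert(count == 0 || targets != NULL);
    double total = 0.0;
    for (size_t i = 0; i < count; i++)
        total += targets[i].years_in_service;
    return total;
}
```

For example, three hypothetical targets in service for 8, 5, and 2 years give 3 compilers times an average of 5 years, or 15 compiler-years.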
A greater number of targeted processors, combined with long in-service times, bodes well for the quality of the syntax-analysis portion of the compiler. This information is readily available from most compiler vendors. In practice, most DSP vendors' compilers target only a few processors and have been in service for relatively short periods. However, DSP vendors often have an advantage in other areas of compiler performance, such as optimization.
Most compilers use two types of optimization: machine-independent and machine-specific. Although the machine-specific optimizations seem the most important, it is worth understanding how machine-independent optimization works in general, since this phase is often where DSP compilers built on non-DSP compiler technology run into problems.
Independent optimization
There are many techniques that can be applied to machine-independent optimization. The simplest is code movement. Consider the following fragment of pseudocode:
Do {
X=1
y=z+q
} Until y>23
Code movement optimization treats a constant assignment such as "X=1" as loop-invariant: it can be moved out of the loop without changing the meaning of the code. With this technique, the code above becomes:
X=1
Do {
y=z+q
} Until y>23
The optimized code executes faster because the constant assignment is performed once for the entire loop rather than on every iteration. But in some cases this motion is not desired. Suppose for a moment that assigning 1 to X actually resets a watchdog timer. In this case, code movement will cause the program to fail when the watchdog timer times out. Here, a generic optimization that is safe in a data processing environment fails in the embedded world of DSP. This is a problem that compiler vendors for microcontrollers have had to deal with for years.
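In standard C, one way to tell the compiler that an assignment like "X=1" must stay inside the loop is the `volatile` qualifier: each access to a volatile object is an observable side effect that the compiler may neither hoist nor delete. The sketch below renders the Do/Until loop in C under that assumption. `watchdog_reg` is a hypothetical stand-in for a memory-mapped register, and the `q++` is an illustrative addition so that the loop terminates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped watchdog register. On a real
   target this would typically be a volatile pointer to a fixed address. */
static volatile uint32_t watchdog_reg;

/* C rendering of the Do/Until loop. Because watchdog_reg is volatile,
   every store is a side effect the compiler must perform on each
   iteration instead of moving it out of the loop. */
static int kick_until_ready(int z, int q)
{
    int y;
    int iterations = 0;
    do {
        watchdog_reg = 1;   /* must not be moved out of the loop */
        y = z + q;
        q++;                /* illustrative: lets the loop terminate */
        iterations++;
    } while (!(y > 23));    /* "Until y>23" */
    return iterations;
}
```

With `z = 10` and `q = 0`, `y` runs from 10 to 24, so the loop body, including the watchdog store, executes 15 times. Dropping `volatile` would make the store an ordinary candidate for code movement again.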
Some compilers provide switches that prevent the compiler from moving code around within the program. Some compiler systems also let you compile one module of a program with one set of optimizations and another module with a different set. Still, code movement is a necessary feature for producing reasonably efficient code.
Another common machine-independent optimization is dead-code elimination: removing operations whose results are never used. Consider the following code:
A=b(i)*c(i)
A=b(i+1)*c(i+1)
Most compilers will analyze this code, determine that the first line has no effect because its result is overwritten before it is used, and eliminate it. But viewed from the standpoint of pipelined operation, the code may make perfect sense, since it keeps the pipeline full, and its result may be available to later instructions implicitly, as an implied operand. In this case, the optimization would change the intended operation of the compiled code.
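The fragment can be written in C to show exactly what the optimizer sees. This is a sketch using 0-based arrays; `first_product_sink` is a hypothetical name, and routing the first product through a `volatile` object only guarantees that the multiply is kept, not where it lands in the pipeline.

```c
#include <assert.h>
#include <stddef.h>

/* Literal C version of the two-line fragment. The first store to a is
   overwritten before it is read, so an optimizer is free to delete it,
   multiply and all. */
static int last_product(const int *b, const int *c, int i)
{
    assert(b != NULL && c != NULL && i >= 0);
    int a;
    a = b[i] * c[i];             /* dead store: candidate for elimination */
    a = b[i + 1] * c[i + 1];
    return a;
}

/* Hypothetical sink: storing the first product to a volatile object is an
   observable side effect, so the compiler must keep that multiply. Its
   placement in the pipeline is still the compiler's decision. */
static volatile int first_product_sink;

static int last_product_kept(const int *b, const int *c, int i)
{
    assert(b != NULL && c != NULL && i >= 0);
    first_product_sink = b[i] * c[i];
    return b[i + 1] * c[i + 1];
}
```

Both functions return the same value; only the second obliges the compiler to compute `b[i] * c[i]` at all.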
Likewise, the architectural basis of many DSPs creates problems for generic code-optimization strategies. Most DSPs take as their architectural touchstone the convolution (sum-of-products) operator:
$$\sum_{i=1}^{n} a(i)\,b(i)$$
In this case it is the C language itself that causes problems. Many DSPs contain a multifunction unit that can perform a single-cycle multiply-accumulate together with updates of two address pointers, and most also support zero-overhead looping constructs that encompass a number of instructions. C lacks a program structure that expresses this operation at a high enough level for the compiler to map it directly onto the hardware.
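The closest plain C gets to the sum-of-products operator is a loop like the one below. It is a sketch, not any vendor's recommended idiom: the loop body spells out one multiply-accumulate and two pointer post-increments, and the wide accumulator stands in for a DSP's extended-precision accumulator. Nothing in the language tells the compiler to use a single-cycle MAC or a zero-overhead loop; it has to recognize the pattern on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of products over n samples: one multiply-accumulate per iteration,
   with both data pointers post-incremented. Whether this becomes a
   single-cycle MAC inside a zero-overhead loop is up to the compiler. */
static int64_t dot_product(const int16_t *a, const int16_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    int64_t acc = 0;               /* wide accumulator, like DSP guard bits */
    while (n--) {
        acc += (int32_t)*a++ * *b++;   /* MAC plus two pointer updates */
    }
    return acc;
}
```

Each 16-bit product fits in 32 bits, but the running sum can exceed that range, which is why the accumulator is 64 bits wide here.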
Various alternatives to standard C have been offered, including Numerical C and macro-defined operations within C. Most vendors have opted for proprietary extensions to handle these special cases. The upside of this approach is that the underlying hardware is well supported. The downside is that portability is sacrificed for the control over code generation needed to exploit the specialized hardware.
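A macro-defined operation captures the trade-off in miniature. In the sketch below, `HYPOTHETICAL_DSP_TARGET` and `hypothetical_mac_intrinsic()` are placeholders for a vendor's target macro and MAC intrinsic, not real names; on any other build the macro falls back to portable C, so the same source compiles everywhere but only the target branch gets the hardware's special handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro-defined MAC. The target branch uses placeholder
   names; the fallback branch is ordinary, portable C. */
#if defined(HYPOTHETICAL_DSP_TARGET)
#define MAC(acc, x, y) ((acc) = hypothetical_mac_intrinsic((acc), (x), (y)))
#else
#define MAC(acc, x, y) ((acc) += (int32_t)(x) * (int32_t)(y))
#endif

/* FIR-style tap sum written in terms of the MAC macro. */
static int64_t fir_tap_sum(const int16_t *coeff, const int16_t *sample,
                           size_t n)
{
    assert(n == 0 || (coeff != NULL && sample != NULL));
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        MAC(acc, coeff[i], sample[i]);
    return acc;
}
```

The price of the target branch is the one described above: code written against a vendor intrinsic does not move to another vendor's compiler without rework.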
Most often, DSP compiler vendors shrug and observe that the tightest inner loops of DSP code will be hand-coded anyway, so their compilers offer the ability to drop into assembly language when desired.
Benchmark games
Machine-independent optimization is where many benchmark games are played. Hand-coded benchmarks, for example, are written to specific, defined standards so that competitive comparisons are made only between benchmark programs developed to the same standards.
One of the games DSP vendors played with hand-coded benchmarks in the mid-1980s was manual loop unrolling. Loop unrolling is best understood through an example:
Do {
C=C+a(i)*b(i)
} Until i=n
The obvious way to code this on a typical DSP is to use the zero-overhead loop construct and implement the loop body in a single instruction. But for some DSP architectures, depending on the loop's repetition count and code size, the single "obvious" instruction does not produce the fastest code. Instead, the code is "unrolled" into straight-line code:
C=C+a(1)*b(1)
C=C+a(2)*b(2)
C=C+a(3)*b(3)
C=C+a(4)*b(4)
.
.
.
C=C+a(n)*b(n)
For many processors, this unrolled loop executes faster because there are no loop-control registers to set up. But this speed comes at the expense of code size. In one example of this type of optimization, a vendor had completely unrolled critical code, leaving no memory for any other function. The theoretical performance was just that: theoretical.
This is why companies have established benchmarking standards.
Compilers sometimes employ this trick to great effect. For data processing applications, where memory size is not a constraint, the technique is valuable and works. For embedded DSP applications, where program memory is often tight, unrolling and other machine-independent optimizations may be of little value.
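The speed-for-size trade can be seen in C by comparing a rolled loop with a partially unrolled one. The sketch unrolls by a factor of 4, a middle ground chosen for illustration rather than the complete unrolling described above; the factor and function names are assumptions. Complete unrolling removes the loop entirely but makes code size grow linearly with n.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rolled form: one MAC per iteration, plus loop-control overhead. */
static int64_t mac_rolled(const int16_t *a, const int16_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    int64_t c = 0;
    for (size_t i = 0; i < n; i++)
        c += (int32_t)a[i] * b[i];
    return c;
}

/* Unrolled by 4: about a quarter of the loop-control overhead for roughly
   four times the loop-body code. A remainder loop handles n not divisible
   by 4. */
static int64_t mac_unrolled4(const int16_t *a, const int16_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    int64_t c = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        c += (int32_t)a[i]     * b[i];
        c += (int32_t)a[i + 1] * b[i + 1];
        c += (int32_t)a[i + 2] * b[i + 2];
        c += (int32_t)a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)           /* leftover 0..3 elements */
        c += (int32_t)a[i] * b[i];
    return c;
}
```

Both functions compute the same sum; whether the unrolled one is actually faster, and whether its extra code fits in program memory, depends on the target.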
When evaluating a compiler's performance characteristics, you can discover most of these optimization techniques by asking the vendor or by visually inspecting the assembly language the compiler produces. It is easy to write a small test program that includes apparently constant variable assignments, dead code, and loops of the type you will commonly use in your DSP programs. Be sure to compile the test program with as many optimization options as the compiler supports, including combinations of them.
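A probe file of the kind just described might look like the sketch below. The names are hypothetical, and each probe is deliberately written without `volatile` so that the assembly listing shows what the optimizer does on its own. Each probe sits in its own externally visible function so that it survives into the listing.

```c
#include <assert.h>
#include <stddef.h>

int kick_plain;   /* NOT volatile: watch whether the store gets hoisted */

/* Probe 1: apparently constant assignment inside a loop. */
int probe_hoist(int z, int q)
{
    int y, n = 0;
    do {
        kick_plain = 1;   /* candidate for code movement */
        y = z + q++;
        n++;
    } while (y <= 23);
    return n;
}

/* Probe 2: a dead store followed by the live one. */
int probe_dead_store(const int *b, const int *c, int i)
{
    int a = b[i] * c[i];  /* candidate for dead-code elimination */
    a = b[i + 1] * c[i + 1];
    return a;
}

/* Probe 3: a sum-of-products loop like those in the application. */
long long probe_mac(const short *a, const short *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    long long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
```

With a GCC- or Clang-style driver, running `cc -S -O0 probe.c` and repeating it with `-O1`, `-O2`, and `-Os` produces one assembly listing per setting to compare. A DSP vendor's compiler will have its own switch names and combinations to try.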
Dependent optimizations
Compiler companies use a wide variety of techniques to optimize code for a specific DSP processor. Often the best back-end optimizations are developed by the DSP vendors themselves. There are, however, exceptions. VLIW processors live and die by the strength of their code optimization. DSPs employing VLIW architectures may have four or more execution units, each with its own set of capabilities. This situation is so complex that even above-average programmers cannot write optimal code by hand, because the code must be scheduled for execution at compile time so that needed results are in the right place at the right time, and further optimizations may not be obvious.
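The core of compile-time scheduling can be sketched as "as soon as possible" (ASAP) scheduling over a dependence graph: each operation starts once all the results it needs are ready. This simplified sketch assumes unlimited execution units, so it shows only the data-dependence half of the problem; a real VLIW scheduler must also respect how many units of each kind exist and what each can do. The operations and latencies are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 2

/* One operation in a dependence graph (hypothetical example data). */
struct op {
    const char *name;
    int latency;          /* cycles until the result is ready */
    int deps[MAX_DEPS];   /* indices of producer ops, -1 if unused */
};

/* ASAP schedule, ignoring resource limits. Ops must be listed with
   producers before consumers. Fills start[] and returns the total number
   of cycles until the last result is ready. */
static int asap_schedule(const struct op *ops, size_t n, int *start)
{
    int length = 0;
    for (size_t i = 0; i < n; i++) {
        int earliest = 0;
        for (int d = 0; d < MAX_DEPS; d++) {
            int p = ops[i].deps[d];
            if (p < 0)
                continue;
            assert((size_t)p < i);        /* producer already scheduled */
            int ready = start[p] + ops[p].latency;
            if (ready > earliest)
                earliest = ready;
        }
        start[i] = earliest;
        if (earliest + ops[i].latency > length)
            length = earliest + ops[i].latency;
    }
    return length;
}
```

For two loads of latency 2 feeding a multiply of latency 3, followed by an add of latency 1, the loads start in cycle 0, the multiply in cycle 2, the add in cycle 5, and the whole sequence takes 6 cycles. Other instructions can only be packed into the idle slots if the scheduler knows these timings.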
The technology for VLIW DSPs is still in its infancy, and the underlying theory of VLIW is fairly new and largely unexplored as well. Consequently, for this type of compiler there is no substitute for reviewing compiled test code for reasonableness.
The integrated development environment (IDE) can be both a blessing and a curse. When the IDE fits your development standards, it is a boon. It is a curse when the compiler works only within that environment: if the IDE does not fit your standards, the only choices are another compiler or new standards.
The choice of processor can dictate the choice of C compiler, but there are alternatives. Some simple tests make it easier to wade through the hype. In the final analysis, one factor can be the deciding issue: do you want a single supplier of technology? If the answer is no, you have your work cut out for you.
ACE
Agere Systems
Analog Devices
ARC Cores
Aspex
BOPS
DSPecialists
DSP Group
Green Hills Software
Infineon
Intel
LSI Logic
MetaWare
Metrowerks
Motorola
Philips Semiconductors
StarCore
Target Compiler Technologies
Tasking
Texas Instruments
DSP Compiler Checklist
While a great variety of tests can be run on a C compiler, a small set of factors can help determine which compiler is the right one for you.
Henry Davis is president of Henry Davis Consulting, a new products consultancy based in Soquel, CA. He holds a BS in computer science and business administration from Columbia Pacific University, and has done graduate work at the New Mexico Institute of Mining and Technology. He can be reached at hdavis@ix.netcom.com.
All materials on this site Copyright © 2010 EE Times Group, a Division of United Business Media LLC. All rights reserved.