Systems-on-chips (SoCs) are at the heart of most consumer electronics (mobile phones, digital cameras, camcorders, and so on) as well as professional IT equipment (such as Internet routers). These devices are no longer designed like early ASICs, which were based purely on custom logic; instead, they consist of an ever-increasing number of blocks of intellectual property (IP), purchased from different sources. The key components of these SoCs are the embedded processors, which require embedded software to make the chip come to life. As a result, software development teams and their coworkers in the hardware department are compelled to work together to develop and verify all of the SoC hardware and software components in order to accommodate the ever-shrinking development schedule and meet critical time-to-market goals.
This collaboration is now made easier by the advent of rapid prototyping products known as soft emulation prototypes, which combine the advantages of a hardware emulator with those of a software development platform.
As a noteworthy example of the challenges involved in such a collaboration, consider the development of a low power consumption MPEG-4 decoder for wireless applications. To enable the viewing of film trailers on a mobile phone, it is necessary to design a small-scale MPEG-4 decoder that consumes little energy.
The design is based on a configurable Xtensa processor developed by Tensilica, with the decoder software based on MoMuSys, the reference C code for the MPEG-4 standard issued by ISO. Three hardware optimizations were added to decode an MPEG-4 stream: an iDCT (inverse discrete cosine transform) accelerator; a set of instructions for vector (SIMD) processing, bit manipulation, and bitstream extraction; and conversion between the YCbCr and RGB color formats.
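To make the last of these optimizations concrete, the sketch below shows the per-pixel arithmetic that a YCbCr-to-RGB converter has to perform. It assumes the full-range BT.601 coefficients used by JFIF; the article does not say which matrix or value range the actual hardware uses, and the function names are purely illustrative.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one full-range 8-bit YCbCr pixel to 8-bit RGB.

    Coefficients are the full-range BT.601 (JFIF) ones. This is an
    assumption: the decoder described in the article may use another variant.
    """
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


if __name__ == "__main__":
    # Neutral chroma (Cb = Cr = 128) yields a gray level equal to Y.
    print(ycbcr_to_rgb(128, 128, 128))
```

Three multiply-accumulate chains per pixel are cheap in dedicated logic but add up quickly in software, which is one plausible reason for moving this step into hardware.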
Conventional Approaches
Historically, the most flexible method to develop a piece of embedded software code was based on an instruction set simulator (ISS). The code for the target processor is executed by a program running on a workstation. Simulating at a rate of a hundred thousand to a million instructions per second, an ISS would therefore be capable of decoding an entire movie trailer in less than an hour. Software developers are used to this level of performance, and find it difficult to work with a system offering inferior performance.
Yet an ISS has one fundamental shortcoming: it only simulates code running on a processor. It cannot take the rest of the circuit into account; in particular, it cannot simulate the interactions between the embedded software and the peripheral logic.
Three approaches address this problem, each one with its own set of drawbacks.
- HW/SW Co-Simulation with an RTL Simulator
This approach uses a hardware-software co-simulation tool that couples an ISS with an RTL simulator. With this method, performance falls to a few instructions per second (five orders of magnitude lower than with an ISS alone), which makes the solution useless to software developers.
- C Model
This solution consists of interfacing behavioral models of the peripheral logic, typically written in C, to the ISS via an application programming interface (API). It may prove practical during the architectural exploration phase, but it cannot be used to verify and debug the interactions between the RTL components.
- Rapid Prototyping Board Based on FPGA
Until now, the most widespread way to meet software developers' requirements has been a rapid prototyping board based on reprogrammable components (FPGAs). The entire circuit is mapped into one or more FPGAs, and the software development environment is connected to the board via a standard JTAG interface. The advantages for software development are twofold. First, the code can execute at clock rates of up to 50 MHz, which is an order of magnitude faster than with an ISS, while retaining a familiar environment; real-time operation is often possible. Second, the communication between the software and the hardware is modeled cycle by cycle, with very high accuracy and without compromises. This would be a perfect solution were it not for the lack of internal visibility, which precludes debugging the design hardware and delays the adoption of the prototyping board until after comprehensive hardware testing. Further, the time and cost of developing an FPGA-based prototype make it hard to shorten the project schedule. In practice, this solution forces software engineers to wait for the hardware to be fully debugged before they can start using the platform. By the time the first lines of firmware are run, the chip is back from the foundry and all chances of correcting HW/SW issues in hardware have been lost.
There are other minor drawbacks that restrict the rapid prototyping boards' productivity. For example, loading a program using the JTAG connector, which is nothing more than a 1-bit serial connection, and initializing the contents of a large memory can easily take 10 minutes. Encouraging developers to take coffee breaks is not necessarily the best way to increase productivity.
New Approach to Co-Verification
A soft emulation prototype, such as EVE's ZeBu, combines a prototyping board with a hardware emulator and a software debugger. The connection to a software debugger meets software developers' requirements. Because it is based on FPGAs but offers comprehensive visibility of the internal logic, a soft emulation prototype also meets, to a large extent, hardware engineers' needs.
A soft emulation prototype supports transaction-based verification (Figure 1). Hardware transactors enable rapid communication between a circuit mapped inside the soft emulation prototype and a test bench executed on the PC. They do so by moving the processing-intensive, time-consuming bit-level communication between test bench and design into the emulator.
Figure 1: A protocol such as PCI-X or USB 2.0 is more effectively represented by an exchange of messages than by waveforms. A HW transactor uses that principle to establish an efficient communication between the emulated design and a complex SW test environment. By raising the level of abstraction, a HW transactor can also be re-used for different projects that share the same physical interfaces.
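The idea behind a transactor can be sketched in a few lines. The toy model below is not EVE's API: the class names, the 1-bit serial bus, and the byte-oriented messages are all assumptions chosen for illustration. It only shows that the test bench crosses the PC-to-emulator boundary once per message, while the bit-level activity stays on the emulator side.

```python
class SerialBusModel:
    """Stand-in for the emulated design: samples one bit per clock cycle."""

    def __init__(self):
        self.cycles = 0
        self.received_bits = []

    def clock(self, bit):
        self.cycles += 1
        self.received_bits.append(bit)


class HwTransactor:
    """Sits next to the design and expands each message into bit-level cycles."""

    def __init__(self, design):
        self.design = design
        self.messages = 0  # number of crossings of the PC/emulator boundary

    def send(self, payload):
        self.messages += 1
        for byte in payload:
            for position in range(7, -1, -1):  # most significant bit first
                self.design.clock((byte >> position) & 1)


def bits_to_bytes(bits):
    """Reassemble MSB-first bits into bytes, to check what the design saw."""
    out = bytearray()
    for start in range(0, len(bits), 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    design = SerialBusModel()
    transactor = HwTransactor(design)
    transactor.send(b"MPEG")  # one message from the test bench
    print(transactor.messages, "message ->", design.cycles, "bus cycles")
```

A single `send` call drives 32 bus cycles here. With a signal-level interface, the test bench would have had to synchronize with the design on every one of those cycles.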
Unlike the old hardware emulators, which were so expensive that they had to be shared among the members of a development team, a soft emulation prototype is moderately priced, so every team member, from software developers to hardware engineers, can have one.
A soft emulation prototype benefits the entire development process. The hardware design team maps the circuit into the FPGAs and uses it for module testing of the peripherals. At the same time, the software developers run fragments of critical code and develop the peripheral drivers. As both teams use the same model, every bug fixed in the circuit benefits the software developers. Similarly, the peripheral drivers can be supplied to the hardware engineers as soon as they are written, enabling more comprehensive integration tests.
Ultimately, software developers and hardware designers work in cooperation, each using the interface to which they are accustomed (Figure 2). The software developer uses the "xt-gdb" C code debugger (the standard GNU tool modified for the Xtensa processor) and needs no knowledge of the soft emulation prototype that is emulating the design. The hardware designer uses the soft emulation prototype's graphical interface to control the logic part of the simulation. Waveforms and monitors of each circuit bus and signal are constantly accessible, and the contents of the internal memories, even those deeply buried within the FPGAs, can be observed and modified at any time during execution. The circuit clocks are controlled entirely from the graphical interface, regardless of whether the simulation is being run cycle by cycle to observe subtle changes in the circuit's behavior, or set to run continuously for several million cycles at a time.
Figure 2: On the same PC screen, the SW developer runs his favorite SW debugger whereas the HW designer monitors the behavior of the design at cycle level. Both can verify that the SoC design generates the expected video frames.
The sharing of the simulation control between the software and the hardware posed a challenge. While the developer may need to stop the simulation by inserting a breakpoint in the C code, the hardware engineer may want to stop the circuit when it reaches a specific state. The native support of transactors by soft emulation prototype provided an elegant solution to that problem: instead of relying on a physical JTAG connection with an independent clock that cannot be controlled, commands from the software debugger were translated into JTAG transactions that ultimately remain under the control of the hardware emulator and can be stopped and restarted on demand.
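The following toy model illustrates why turning debugger traffic into transactions solves the control problem. It is a sketch under stated assumptions, not the ZeBu implementation: `JtagTransactor`, `Emulator`, and the `stop_when` callback are hypothetical names. JTAG bits are shifted only when the emulator advances its clock, so a hardware stop condition freezes the debugger link along with the design, and both resume together.

```python
from collections import deque


class JtagTransactor:
    """Queues debugger commands as bits; shifts one bit per emulator clock step."""

    def __init__(self):
        self.pending = deque()
        self.shifted = []

    def submit(self, bits):
        self.pending.extend(bits)

    def tick(self):
        if self.pending:
            self.shifted.append(self.pending.popleft())


class Emulator:
    """Owns the only clock; the JTAG link has no independent clock of its own."""

    def __init__(self, transactor):
        self.transactor = transactor
        self.cycle = 0

    def run(self, max_cycles, stop_when=None):
        """Advance up to max_cycles, stopping early if the hardware condition holds."""
        for _ in range(max_cycles):
            if stop_when is not None and stop_when(self):
                return "stopped"
            self.cycle += 1
            self.transactor.tick()
        return "done"


if __name__ == "__main__":
    jtag = JtagTransactor()
    emulator = Emulator(jtag)
    jtag.submit([1, 0, 1, 1, 0, 0, 1, 0])  # a debugger command, as raw bits

    # The hardware engineer stops the circuit at cycle 3, mid-command.
    print(emulator.run(100, stop_when=lambda emu: emu.cycle == 3))
    print("shifted so far:", jtag.shifted, "still pending:", len(jtag.pending))

    # Restarting the emulation also resumes the interrupted JTAG command.
    emulator.run(10)
    print("shifted after restart:", jtag.shifted)
```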
Verification of the MPEG-4 Decoder
Let's return to our application, MPEG-4 decoding. The resulting design met the requirements: a reusable component of approximately 120,000 gates, including a processor that runs an MPEG-4 decoding program containing 200,000 lines of C code while using less than 10% of the processor resources. With the processor running at 200 MHz, this low utilization made real-time decoding of a video in QCIF format at 15 fps easy.
The verification of the circuit presented some interesting challenges: displaying a complete trailer for a motion picture requires about a billion clock cycles. Until now, there were no solutions that met both software developers' and hardware designers' requirements satisfactorily.
The actual verification started by synthesizing and mapping the MPEG-4 decoder into ZeBu, EVE's soft emulation prototype. First, the peripherals and the memory controller were validated at the hardware level, using simple software code. At the same time, a mechanism was developed to transfer the image from the decoder's frame buffer to the PC via the ZeBu access functions, which was needed to validate the video decoding at the highest level of abstraction.
The ZeBu HW transactor technology was used to capture each frame of the image from the decoder's frame buffer into a PC window in real time (Figure 3). HW transactors provide a fast communication mechanism between the design emulated in ZeBu and the SW running on the PC, without slowing down the emulation run. Note that it took only half a day to create the video HW transactor. Next, the full SW code for the decoder was loaded into the processor, which was connected to the software debugger via a JTAG cable interface.
Figure 3: The design is synthesized onto ZeBu. During the emulation run, the video HW transactor extracts the video stream from the decoder's frame buffer via the fast PCI interface, whereas a physical cable connects the JTAG port of the embedded processor to GDB, the SW debugger.
The Xtensa processor in ZeBu runs at 12.5 MHz, generating video at approximately 13 fps. Such performance produces near-real-time motion without noticeable stuttering, and corresponds to an acceleration factor of around 10X compared to an ISS.
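A back-of-the-envelope check puts these figures side by side. The sketch below uses only the rates quoted in this article, plus two simplifying assumptions: that one instruction takes roughly one clock cycle, and that "a few instructions per second" for co-simulation means about five.

```python
CYCLES_FOR_TRAILER = 1_000_000_000  # "about a billion clock cycles"

# Rates in cycles (or instructions) per second, as quoted in the article.
# The co-simulation value of 5 is an assumed reading of "a few".
RATES = {
    "HW/SW co-simulation (assumed 5 per second)": 5,
    "ISS, low end": 100_000,
    "ISS, high end": 1_000_000,
    "ZeBu emulation (12.5 MHz)": 12_500_000,
    "FPGA prototyping board (50 MHz)": 50_000_000,
}


def wall_clock_seconds(cycles, rate_per_second):
    """Time needed to execute the given number of cycles at a given rate."""
    return cycles / rate_per_second


def speedup(fast_rate, slow_rate):
    """How many times faster one platform is than another."""
    return fast_rate / slow_rate


if __name__ == "__main__":
    for name, rate in RATES.items():
        seconds = wall_clock_seconds(CYCLES_FOR_TRAILER, rate)
        print(f"{name:45s} {seconds:>15,.0f} s")
    ratio = speedup(RATES["ZeBu emulation (12.5 MHz)"], RATES["ISS, high end"])
    print(f"Emulation vs. fastest ISS: {ratio:.1f}x")
```

Under these assumptions the emulator gets through the trailer in about 80 seconds, against roughly 1,000 seconds for the fastest ISS. That ratio of 12.5 is consistent with the "around 10X" figure, while the co-simulation case works out to more than six years.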
A verification platform like ZeBu allows for optimizing the joint use of hardware and software by modifying the way in which certain tasks are carried out. For a video decoder, for example, the test program in C traditionally includes the MPEG-4 data flow, in other words, several megabytes of data. The loading of this program via the JTAG interface can take 10 minutes. By transferring the data flow loading process to the hardware, a considerable amount of time is saved: ZeBu has direct access to the circuit memories via the PCI bus. In just a few seconds, the MPEG-4 video is downloaded via the hardware interface, while the C program continues to load via the software debugger and the JTAG interface.
In Search of the Critical Bug
Finally, while soft emulation prototype appears to be the perfect solution for hardware-software co-verification, a question may be raised about its ability to detect, isolate, and fix bugs effectively and efficiently.
A significant example of the tool's ability to debug hardware was the speed with which a design error was found in a peripheral memory controller. The bug caused two pieces of data to be mixed up during certain consecutive read and write sequences. The symptom of the problem, which remained undetected during module testing, was that after two million cycles the decoded image became streaky, rather like an undecoded satellite TV image.
By using a combination of three ZeBu functions to analyze the problem, we were able to rapidly detect and fix the bug. The three functions were: breakpoints in the C code to ascertain what the code does (software aspect), capture of the memory content to give an instant display of the incorrect words, and display of internal signals in the memory controller (hardware aspect).
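The memory-capture step amounts to comparing a dump of the emulated memory with the values the software expected to find there. The sketch below illustrates that comparison only; it does not use any ZeBu function, and the helper name, the word size, and the sample data are hypothetical.

```python
def find_corrupt_words(expected, captured, base_address=0, word_bytes=4):
    """Return (address, expected_word, captured_word) for every mismatch."""
    if len(expected) != len(captured):
        raise ValueError("memory images must have the same length")
    return [
        (base_address + index * word_bytes, want, got)
        for index, (want, got) in enumerate(zip(expected, captured))
        if want != got
    ]


if __name__ == "__main__":
    # Hypothetical data: two neighbouring words have been swapped, the kind
    # of symptom a read/write ordering bug in a memory controller could cause.
    expected = [0x11111111, 0x22222222, 0x33333333, 0x44444444]
    captured = [0x11111111, 0x33333333, 0x22222222, 0x44444444]
    for address, want, got in find_corrupt_words(expected, captured, 0x1000):
        print(f"0x{address:08X}: expected 0x{want:08X}, captured 0x{got:08X}")
```

Once the corrupted addresses are known, the internal signals of the memory controller can be inspected around the cycles in which those addresses were accessed.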
Without the ability to work on the hardware and the software at the same time, this bug would not have been fixed in time and would have severely affected the project schedule.
About the Author
Alain Raynaud is the Technology Center Director at Emulation and Verification Engineering (EVE), San Jose, CA.