San Mateo, Calif. - Superscalar CPU architectures, in which a single instruction stream feeds multiple parallel execution units, began life in the rarefied world of workstations and servers, then gradually migrated downstream to PCs and very high-performance embedded applications. But the high transistor counts, complex control circuitry and general power-hungriness of superscalar machines have been antithetical to the needs of the cell phone handset market. Until now.
ARM Ltd. (Cambridge, England) and Renesas Technology Corp. (Tokyo) are pursuing superscalar architectures, driven by both the increasing computing loads and the energy-efficiency demands of the mobile wireless-terminal market. The companies appear to have reached similar conclusions from quite different initial positions. In the case of Renesas, the primary question was power.
"Power consumption is vital and CPUs have to be able to handle many [operating systems], which takes more CPU power. But because of power consumption you need more instructions per cycle, and that gives you performance with lower frequencies," said Shinichi Yoshioka, group manager of the system-on-chip design group at Renesas.
The Renesas design team has added a second execution pipe to a new SH-family core, still known as the SH-X. The company has determined that the efficiency to be gained from a lower operating frequency and, hence, a lower operating voltage will more than compensate for the added circuit complexity.
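The reasoning behind that trade-off can be sketched with the textbook CMOS switching-power estimate, P = C x V^2 x f. The short Python example below is only an illustration under stated assumptions: every number in it (clock rates, voltages, instructions per cycle, the 30 percent capacitance penalty for the second pipe) is hypothetical and is not a Renesas figure.

```python
def dynamic_power(c_eff, voltage, freq_hz):
    """Textbook CMOS switching-power estimate: P = C_eff * V^2 * f (watts)."""
    return c_eff * voltage ** 2 * freq_hz


def throughput(ipc, freq_hz):
    """Instructions per second = instructions per cycle * cycles per second."""
    return ipc * freq_hz


# All values are hypothetical, chosen only to illustrate the trade-off.
single = {"ipc": 1.0, "freq_hz": 400e6, "voltage": 1.2, "c_eff": 1.0e-9}
# Assumed: the second pipe adds 30% switched capacitance and raises IPC to 1.4,
# which lets the clock and the supply voltage drop for similar throughput.
dual = {"ipc": 1.4, "freq_hz": 300e6, "voltage": 1.0, "c_eff": 1.3e-9}

for name, cfg in (("single-issue", single), ("dual-issue", dual)):
    mips = throughput(cfg["ipc"], cfg["freq_hz"]) / 1e6
    milliwatts = dynamic_power(cfg["c_eff"], cfg["voltage"], cfg["freq_hz"]) * 1e3
    print(f"{name}: {mips:.0f} MIPS at {milliwatts:.0f} mW")
```

Because voltage enters the estimate squared, a modest drop in supply voltage can outweigh the extra capacitance of the second pipe, provided the assumed gain in instructions per cycle actually materializes.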
ARM, on the other hand, has come to the superscalar table primarily in search of computing power, but with a careful eye on energy efficiency. ARM's next-generation machine, the company's first superscalar design, will break through the gigahertz clock frequency threshold.
"The core will come out with lead partners during 2005 and be more broadly available at the end of 2005," said Mike Inglis, executive vice president of marketing for ARM. Inglis confirmed that several of ARM's established licensees have been contributing to the next-generation core's specification.
The core is being targeted at the 65-nanometer manufacturing process node, although it will probably come out in a 90-nm process first, Inglis said. He declined to name the partners working on the ARMXX (company executives are still deciding whether it should be called the ARM12) but described them as being in the "first division" of ARM's licensees.
Work on the new core started about six months ago and much of the heavy lifting has been completed, Inglis said. "We've thought about the architecture, we've partitioned it. We're moving toward the microarchitecture. We need to go to the next level of performance in the ARM architecture for mobile applications," Inglis said, emphasizing that performance could not be obtained at the expense of power efficiency.
"The question is what are people going to want to do on a mobile platform in 2005. It's not going to follow the PC model," he said. "We've seen how camera phones have taken off in Europe. But in Japan you see much more media-rich machines."
Managing power
Beyond the move to superscalar, Renesas, ARM and other vendors are pulling every trick from their kits to simultaneously manage power and performance in their next-generation cores. At ARM, for example, the design team is using a combination of hardware and software to manage the power of its ARM1176JZ-S core, which is scheduled to be ready by mid-2004.
At the hardware level, the core uses clamps and level shifters to define "voltage domains" where the core can be shut down and the caches kept at their minimum retention levels. ARM said it is now working with EDA companies so that their tools will support these features.
At ARM licensee Motorola Inc., meanwhile, designers are using a mixture of hardware tricks for the job. One is to lean heavily on high-threshold-voltage transistors, which draw less leakage current than the faster low-Vt transistors. John Vaglica, manager of the advanced system-on-chip architecture team at Motorola, estimates that by reserving the leaky, fast transistors for critical speed paths, the team has limited them to fewer than 10 percent of the gates of the next-generation ARM-compatible Jupiter core.
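A rough sense of why that mix matters can be had from a simple weighted sum. The Python sketch below is illustrative only: the 10x leakage ratio between low-Vt and high-Vt gates is a hypothetical placeholder, not a Motorola number, and the 10 percent share is used as the upper bound Vaglica cites.

```python
def relative_leakage(low_vt_fraction, low_vt_leak_ratio):
    """Leakage of a mixed-Vt design relative to an all-high-Vt design.

    low_vt_fraction: share of gates built from fast, leaky low-Vt transistors.
    low_vt_leak_ratio: how many times more a low-Vt gate leaks than a high-Vt gate.
    """
    if not 0.0 <= low_vt_fraction <= 1.0:
        raise ValueError("low_vt_fraction must be between 0 and 1")
    return (1.0 - low_vt_fraction) + low_vt_fraction * low_vt_leak_ratio


LEAK_RATIO = 10.0  # hypothetical: a low-Vt gate leaks 10x a high-Vt gate

mixed = relative_leakage(0.10, LEAK_RATIO)    # 10% fast gates (upper bound)
all_fast = relative_leakage(1.0, LEAK_RATIO)  # every gate is low-Vt

print(f"mixed design leaks {mixed / all_fast:.0%} of an all-low-Vt design")
```

Under these assumed numbers, confining the fast transistors to the critical paths removes most of the leakage while leaving the speed-limiting logic untouched.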
To keep from compromising performance, Motorola is using back-biasing and well-biasing techniques so that low-leakage transistors perform more like the low-Vt variety. It is gating the memory arrays so that their leakage current is near zero and finding ways to drop the voltage to 0.9 V without losing the data, Vaglica said.
Renesas, too, tweaked its memory subsystem, adding a predecoding scheme that allows the memory blocks to be only partially activated. It also switched to finer-grained clock-gate control and implemented two kinds of standby modes to minimize leakage current, Yoshioka said.
Chip architects are attacking the energy-speed trade-off at the macroarchitecture level as well. ARM's Inglis said the next-generation ARM core is part of a dual-pronged attack on embedded applications; the other part involves symmetric multiprocessing and is being jointly pursued with NEC Electronics. "Multiprocessing is very interesting to ARM and another way of solving the problem," he said.
See related chart