Overcoming Static Timing Issues in Low-Power 3G Chips

Designers are using every design trick to reduce power in a 3G chip design. But, while cutting power, these tricks can create static timing analysis problems. Here's a look at how designers can close the timing gap in a 3G SoC.

By Stewart Shankel and Kaijian Shi, Synopsys Professional Services; and James SW Song, Ashwin Rao, Satyendra R.P.Raju Datla, and Yuanqiao Zheng, Texas Instruments Inc.
CommsDesign
Aug 19, 2003
The power issues encountered by today's mobile phone manufacturers are well documented. With cameras, web browsers, and other processing-intensive tasks entering next-generation phones, designers try to squeeze every microamp of current out of their chip architectures.

But the use of power-reducing design techniques can create unusual timing configurations that complicate static timing analysis (STA), a vital step in any chip design.

In this article, we'll look at the STA issues faced by today's low-power 3G chip designers. Specifically, this article will look at STA issues involving the use of digital phase locked loops, a double data rate (DDR) SDRAM memory interface for mobile applications, multimode clock structures/clock division and extensive clock gating. Detailed solutions to these issues are presented based on an off-the-shelf timing analysis tool.

Solving STA issues on Digital PLLs
To illustrate timing analysis, we'll examine a system-on-a-chip (SoC) that handles overhead and management tasks, as well as signal processing tasks. This chip integrates many different interfaces to the outside world, ranging from memory controllers to common data communication interfaces (Figure 1 below).

The traffic controller shown in Figure 1 implements various arbitration algorithms to control and utilize available bandwidth efficiently. It also allows dynamically configurable data throughput and scalable operations between the system interface, the DSP core and the MPU core.


Figure 1: 3G SoC implementing digital PLLs, a DDR SDRAM interface, multiple clocks, and extensive clock gating.

The chip shown in Figure 1 incorporates several strategies to minimize power consumption, and these strategies sometimes make STA difficult. For example, the designers used digital PLLs rather than analog PLLs to reduce power consumption. The digital PLLs serve as frequency multipliers that can be adjusted by software based on computational need. The use of these multipliers also allows the design to use a low external clock-source frequency, so little power is lost in the external circuits.

Unfortunately for STA purposes, the phase error between the input (reference) clock and the output (multiplied) clock of a digital PLL can be high, due to the insertion delay incurred by the multiplier/divider combinations. This phase error is highly unpredictable. In fact, there may not be any phase relationship between the input and output clocks if an unusual multiplication factor is selected (7.1, for instance). As a result, the input clock cannot serve as a timing reference for circuitry clocked by a digital PLL used in this way.

The solution is to declare a clock on the output of the digital PLL using the create_clock command, according to the following guidelines:

  • Set the clock frequency to the DPLL's highest output frequency to check setup times fully in the circuits clocked by the DPLL.
  • Account for setup margin by setting clock uncertainty to the sum of the largest jitter and the added timing margin desired, using the set_clock_uncertainty command.
  • Account for hold margin using only the desired added timing margin, without including DPLL jitter. (Note that you would need to include jitter if you have multicycle 0 paths or unbalanced half-cycle paths.)
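
The guidelines above might translate into constraints along the following lines. This is a minimal sketch rather than the authors' script: the pin name u_dpll/CLKOUT and the period, jitter, and margin values are placeholders assumed for illustration, and the fragment is meant to be sourced in an SDC-aware timing tool after the design is loaded.

```tcl
# Hypothetical values; substitute the DPLL's real figures.
set DPLL_MIN_PERIOD 5.0   ;# ns, period at the DPLL's highest output frequency (assumed)
set DPLL_JITTER     0.20  ;# ns, largest DPLL jitter (assumed)
set EXTRA_MARGIN    0.10  ;# ns, added timing margin desired (assumed)

# Declare the clock on the DPLL output pin (pin name is hypothetical).
create_clock -name DPLL_CLOCK -period $DPLL_MIN_PERIOD [get_pins u_dpll/CLKOUT]

# Setup margin: largest jitter plus the added margin.
set_clock_uncertainty -setup [expr {$DPLL_JITTER + $EXTRA_MARGIN}] [get_clocks DPLL_CLOCK]

# Hold margin: added margin only, no jitter.
set_clock_uncertainty -hold $EXTRA_MARGIN [get_clocks DPLL_CLOCK]
```

If the design contains multicycle 0 paths or unbalanced half-cycle paths, the hold value would need to include the jitter term as well, as noted in the last guideline.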

To understand why jitter would not normally be included in the hold margin setting, consider three possible cases. In the first case, the rising edge of the clock at the source flip-flop is checked against the same rising edge of the clock at the destination flip-flop. No jitter is involved in the timing check because the clock edge at both flip-flops derives from the same edge issued from the digital PLL.

In the second case, the rising edge of the clock at the source flip-flop is checked against the falling edge of the clock at the destination flip-flop. Jitter does figure into this case, but it can be safely ignored because the hold check is from the rising edge of the source clock to the previous falling edge of the destination clock.

The third case parallels the second; the falling edge of the clock at the source flip-flop to the rising edge of the clock at the destination flip-flop is being checked.

Timing at the I/Os
Assume the clock object is set up as described above and named DPLL_CLOCK. Now the issues relating to specifying timing correctly at the chip's I/Os must be handled. Specifically, each I/O interface (for SDRAM, flash memory, etc.) now requires a clock signal output that is synchronous with the DPLL_CLOCK, since the input clock that drives the DPLLs is useless as a reference to external devices.

Each interface usually needs one or more unique clock signals to meet unique division/gating requirements. Additionally, consider that usually a fairly large insertion delay exists from the DPLL to the end leaves of the clock tree in a low-power design.

As a result, the DPLL_CLOCK is a poor timing reference for I/O timing assertions. The DPLL_CLOCK clock object is visible to STA but is not visible during silicon debug and is cumbersome as a timing reference during gate-level simulation with an event-driven simulator. The DPLL_CLOCK could be routed out of the chip to use as a reference, but assumptions would still have to be made about the DPLL_CLOCK delay.

One solution is the following scheme: For each I/O clock, declare a generated clock object on the output port for the I/O clock. The create_generated_clock command should reference, as the source clock, the immediate predecessor clock declared in the clock hierarchy. (Recall that multiple cascaded levels of clocks/generated clocks can exist.) For each interface, reference the interface's I/O timing assertions to the newly declared I/O generated clock object. Set the budget for each I/O timing assertion assuming that the I/O generated clock object has an insertion delay balanced to the basic insertion delay of clocks internal to the part. This assumption allows the budgets to be set early in the design and be used with ideal, zero-latency clocks.
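
As a rough illustration of this scheme, an I/O clock and the assertions referenced to it could be declared as shown here. Every name and number is an assumption made for the example: SDRAM_CLK stands for the output port carrying the interface clock, u_clkgen/sdram_div/Q for the pin on which the immediate predecessor clock is declared, and the delay budgets are arbitrary.

```tcl
# All names and values are hypothetical placeholders.
# SDRAM_CLK is the output port carrying the interface clock; u_clkgen/sdram_div/Q
# is the pin on which its immediate predecessor clock is declared.
create_generated_clock -name SDRAM_IO_CLK \
    -source [get_pins u_clkgen/sdram_div/Q] \
    -divide_by 1 \
    [get_ports SDRAM_CLK]

# Reference the interface's I/O budgets to the new clock object.
set_input_delay  -max 2.0 -clock SDRAM_IO_CLK [get_ports {SDRAM_DQ[*]}]
set_output_delay -max 1.5 -clock SDRAM_IO_CLK [get_ports {SDRAM_DQ[*]}]
```

Because the budgets are tied to a clock that is visible at a chip pin, the same reference is available in gate-level simulation and on the bench.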

Note that the assumption about the I/O clocks' insertion delay is not actually true. One reason is that the I/O clocks must be balanced during the design's clock-balancing phase to have smaller insertion delays than the chip's internal clocks. This step provides external designs with an early clock that can be balanced externally to internal clocks.

Another reason why the delay assumption is unrealistic is that a uniform target insertion delay for I/O clocks is usually unavailable; insertion requirements vary from clock to clock or are unknown. Thus, clocks are simply balanced to provide the shortest insertion delay possible, allowing circuits external to the part as much leeway in clock design as possible. After balancing, the I/O clock insertion delays are reviewed to see if any of the clocks leave too little margin for external balancing.

Since the assumption of balanced clocks is untrue, the I/O budgets cannot be used in raw form when running STA with propagated clocks. Therefore, one last step is needed to complete the I/O timing scheme: STA should be run on the I/O timing in two passes. The first pass updates timing and reports the insertion delay of the I/O clocks. This insertion delay is compared to the insertion delay of an internal reference clock, and the difference is calculated. The I/O timing assertions can then be adjusted by the calculated difference and used to check timing.

Here are samples of I/O timing assertions targeting port_a and the clock to which the port is referenced, clk_a. On the first pass:

set_input_delay -max -clock clk_a port_a
set_output_delay -max -clock clk_a port_a

On the second pass:

set_input_delay -max -clock clk_a port_a
set_output_delay -max -clock clk_a port_a
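
The arithmetic behind the second pass can be sketched as follows. This is one plausible reading, not the authors' script: the latency and budget numbers are invented, they are assumed to be copied by hand from the first-pass report, and the signs assume that the goal is to reproduce the slack that the raw budgets gave with ideal, balanced clocks.

```tcl
# Values read from the first-pass report (hypothetical numbers, in ns).
set io_clk_latency  0.8   ;# insertion delay of I/O clock clk_a
set ref_clk_latency 1.5   ;# insertion delay of the internal reference clock
set in_budget       2.0   ;# original input-delay budget for port_a
set out_budget      1.5   ;# original output-delay budget for port_a

# Difference between the internal reference clock and the I/O clock.
set diff [expr {$ref_clk_latency - $io_clk_latency}]

# Second pass: shift each budget by the difference. The signs below assume the
# goal is to reproduce the slack seen with ideal, balanced clocks.
set_input_delay  -max [expr {$in_budget  + $diff}] -clock clk_a [get_ports port_a]
set_output_delay -max [expr {$out_budget - $diff}] -clock clk_a [get_ports port_a]
```

With an early I/O clock, data launched externally from clk_a reaches the chip sooner relative to the internal capture clock, so the input budget grows by the difference; the output budget shrinks by the same amount because the external capture edge also arrives early.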

Dealing with DDR SDRAM
All DDR memories use both edges of the memory clock to send and receive data. Normally, a DDR interface uses a clock that has a positive edge for every new data item on the bus. Such a clock actually runs twice as fast as is absolutely necessary, however, and thus wastes power. Power can be saved by using a clock running half as fast to switch the write data mux directly (as shown on the left side of Figure 2). Note that this scheme introduces a clock-as-data path.


Figure 2: Conceptual diagram of a DDR SDRAM memory for mobile applications.

Designers must also bear in mind that exceptions for a DDR interface's receiving registers must be specified. Although these registers are clocked at a single rate, data comes to them at a double rate. It is therefore imperative that exceptions are formulated that isolate the correct sending/receiving clock-edge pairs.

In Figure 2, the DDR clock labeled Clock becomes data through the DDR multiplex path. The DDR write data goes to the DDR receiving flip-flops as well as a monitoring flip-flop clocked by Clock2. It is desirable to false-path all paths from flip-flops clocked by Clock to all flip-flops clocked by Clock2. This step can normally be easily accomplished by setting a false path from clock object Clock to clock object Clock2. Unfortunately, this approach also false-paths the clock-as-data path for Clock to the flip-flop clocked by Clock2.

To handle this problem, the false-path scripts can be modified. First, a collection of all flip-flops clocked by Clock is generated; then a false path is declared from the clock pin of each flip-flop in this collection to the clock object Clock2:

# Collect every flip-flop clocked by Clock
set related_regs [all_registers -clock Clock]
set pinname "/CLK"
foreach_in_collection reg_name $related_regs {
    set cellname [get_object_name $reg_name]
    # Build the full name of the register's clock pin
    set clkpin "${cellname}${pinname}"
    # False-path from that clock pin to everything clocked by Clock2
    set_false_path -from [get_pins $clkpin] -to [get_clocks Clock2]
}

Timing paths in some off-the-shelf timing analysis tools are defined as originating on either an input port or flip-flop clock pin and terminating on either an output port or flip-flop data input pin. In the case of clock as data, the true source of the clock should be given, which is the pin on which the clock is declared.

Matching Data Send/Receive Paths
Because the DDR data crosses a chip boundary, the data must be timed twice: first at the sub-chip level, and later with the sub-chip represented as an interface logic model (ILM) inside a top-level netlist. On the stand-alone sub-chip, the data paths end on output ports rather than flip-flops.

The dual role of Clock in the DDR transactions (see Figure 2 above) causes difficulties for STA. The Clock rising edge clocks both the high- and low-phase data into source registers (not shown in Figure 2). Clock also switches the mux that selects between the high- and low-phase data. Destination registers (when timing with an external hierarchical model) clock the DDR data in half a cycle later.

The natural STA setup for this interface would reference only the Clock output as the time base for I/O timing constraints. However, this setup fails to specify all timing exceptions. Specifically, the I/O constraints would reference a single edge of the clock (a limitation of the PrimeTime tool). Due to clock skew, the clock control at the mux, rather than the clocking of the source registers, dictates the hold requirements for each edge of Clock. The path through the select signal of the mux is thus the critical path for hold, but false paths cannot be set from the source registers to both edges of Clock to remove hold violations that could not occur in practice. Further, each of the data inputs to the data mux derives from a different edge of the source clock, making the definition of exceptions to a single external I/O clock even more difficult.

The solution is to modify the STA setup to declare two virtual clocks—180 degrees out of phase with one another—to replace Clock as the I/O timing reference. To define the latencies on the virtual clocks correctly, the rise and fall timings for the output clock need to be extracted dynamically and applied to the latency of the virtual clocks. A signal analysis tool may also allow all necessary combinations of false paths between the required clock edges and destinations to be specified with the virtual clock solution.

Here is an example of virtual clock creation:

create_clock -period $PERIOD -waveform [list 0 $HALF_PERIOD] -name vclk_pos
create_clock -period $PERIOD -waveform [list $HALF_PERIOD $PERIOD] -name vclk_neg

Latency definition example:

set_clock_latency $extracted_rise_lat -rise -source [get_clocks vclk_pos]

False path example:

set_false_path -hold -from $MSW_reg -to [get_clocks vclk_pos]
set_false_path -hold -from $LSW_reg -to [get_clocks vclk_neg]
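
Once the two virtual clocks exist, the DDR output constraints can be referenced to them instead of to Clock. The fragment below is a sketch under stated assumptions: it presumes vclk_pos and vclk_neg have been created as in the example above, and the port name DDR_DQ, the period, and the budget are hypothetical.

```tcl
set PERIOD      10.0                      ;# ns, hypothetical DDR clock period
set HALF_PERIOD [expr {$PERIOD / 2.0}]    ;# used in the virtual clock waveforms

# Hypothetical budget and port names.
set ddr_out_budget 1.0
set_output_delay -max $ddr_out_budget -clock vclk_pos [get_ports {DDR_DQ[*]}]
# -add_delay keeps the first constraint instead of overwriting it.
set_output_delay -max $ddr_out_budget -clock vclk_neg -add_delay [get_ports {DDR_DQ[*]}]
```

Constraining the same ports against both virtual clocks lets each data phase be checked against its own edge, with the false paths removing the edge pairings that cannot occur.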

Clock Tree/Modes
Controlling the use of clocks using highly configurable clock trees can significantly reduce power consumption. If not carefully controlled, however, such clock trees exhibit poorly controlled skew. If lots of buffers are inserted to manage the skew, gate/area problems can be created that prevent timing closure. Additionally, if the number of unique clock path configurations becomes large, it is difficult to determine what inter-clock uncertainties to place on the clock tree. The end result is a huge number of inter-clock skew assertions.

A typical configurable clock division circuit uses a finite state machine (FSM) that drives a divider mux. This circuit introduces a large number of parallel clock pathways that inevitably induce skew. Because the selects on the mux often link to modes of the chip, it is difficult to deal with this skew.

One solution is to use the custom cell shown in Figure 3 for clock division. The cell has only two clock division pathways: a /1 path and a path through a state element. The state element can be used to stage the output of an FSM that provides various clock division values (generated one /1 clock cycle early to mimic the typical muxed circuit).


Figure 3: Integrated clock division cell.

The custom cell should be laid out to achieve matched delay on the two paths from Clock-in to Clock-out for both rise and fall of Clock-in. In practice, the resulting skew is small enough to be included in the STA uncertainty margin. Thus, STA can be run without worrying about the uncertainty caused by multiple clock pathways, vastly reducing the STA effort and risk.

Clock inversion can also cause significant skew unless a custom exclusive-NOR gate is used, designed so that its delay is matched whether or not the gate is inverting. This strategy does not eliminate the skew entirely but minimizes the problem.

A typical clock-gating cell introduces mode-dependent clock paths, which in turn introduce mode-dependent latencies that increase STA and timing closure efforts. The integrated clock-gating cell shown in Figure 3 avoids problems by creating a single clock path.

If the design team does not have access to special clock gating cells or the resources to create a custom cell, clock-gating constructs can be built from discrete gates. If the discrete cell library is not characterized with clock gating values, this conservative rule can be used: Set clock gating setup and hold values for a cell's input pin to the propagation delay from that pin to the cell output.
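
The conservative rule could be applied with a clock-gating check such as the one below. The instance name and the delay value are assumptions for illustration; in practice the value would be the propagation delay from the gating input pin to the cell output, taken from the library data for the discrete gate.

```tcl
# Hypothetical: propagation delay from the enable pin to the gate output, in ns,
# taken from the library data for the discrete gate.
set en_to_out_delay 0.12

# Apply the conservative rule to a discrete gating cell (instance name is hypothetical).
set_clock_gating_check -setup $en_to_out_delay -hold $en_to_out_delay \
    [get_cells u_clkctl/gate_and]
```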

Managing Power and Verification
Ultra-low-power strategies introduce many unusual timing configurations for SoCs. At the same time, the complexity of these SoCs usually mandates the use of static timing analysis to verify the chip in a reasonable amount of time. Using STA methods such as those described here allows designers to reduce power consumption to unprecedented levels and still run STA accurately and efficiently.

About the Authors
Satyendra R.P.Raju Datla is a senior design engineer at Texas Instruments. He has a master's degree in computer engineering from Southern Methodist University and a bachelor's degree in electronics and communication engineering from I.E.T.E, New Delhi, India. Satyendra can be reached at sdatla@ti.com.

James SW Song is a senior member of the technical staff, OMAP platform architect, and design manager in Texas Instruments' Wireless Terminal Business Unit. He has a BSEE and can be reached at s-song@ti.com.

Yuanqiao Zheng is a senior electrical engineer at Texas Instruments. He has an MS degree in electrical engineering and can be reached at y-zheng2@ti.com.

Ashwin Rao is a design engineer in Texas Instruments Wireless Terminal Business Unit. He has an MS in electrical engineering and can be reached at ashwinr@dal.asp.ti.com.

Kaijian Shi is a principal consultant in Synopsys' Professional Services unit. He has B.Sc. (Physics), M.Sc. (CS), M.Phil. (EE), and Ph.D. (EE) degrees and can be reached at kaijian.shi@synopsys.com.

Stewart Shankel is a senior staff consultant at Synopsys. He has an MSEE from Purdue University and a BSE from Walla Walla College. Stewart can be reached at shankel@synopsys.com.



