-max -clock clk_a port_a
Dealing with DDR SDRAM
All DDR memories use both edges of the memory clock to send and receive data. Normally, a DDR interface uses a clock that has a positive edge for every new data item on the bus. Such a clock actually runs twice as fast as is absolutely necessary, however, and thus wastes power. Power can be saved by using a clock running half as fast to switch the write data mux directly (as shown on the left side of Figure 2). Note that this scheme introduces a clock-as-data path.
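To make the half-rate scheme concrete, here is a small behavioral sketch in plain Tcl (the language of the STA scripts in this article), runnable in tclsh. It is not a netlist. It assumes that the mux passes the high-phase word while the clock is high and the low-phase word while it is low, so each clock period carries two data items. The proc name ddr_serialize and its arguments are hypothetical.

```tcl
# Behavioral sketch: a half-rate clock drives the DDR write-data mux select.
# Assumption: clock high (select = 1) passes the high-phase word,
# clock low (select = 0) passes the low-phase word.
proc ddr_serialize {high_words low_words} {
    if {[llength $high_words] != [llength $low_words]} {
        error "need one low-phase word for every high-phase word"
    }
    set bus {}
    foreach hi $high_words lo $low_words {
        # One clock period: high phase first, then low phase
        foreach clk_level {1 0} {
            if {$clk_level} {
                lappend bus $hi
            } else {
                lappend bus $lo
            }
        }
    }
    return $bus
}

# Two periods of the mux clock put four data items on the bus
puts [ddr_serialize {D0 D2} {D1 D3}]
```

Four words leave in two periods of the mux clock. A clock with a rising edge for every data item would need four periods for the same words, which is twice the frequency.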

Figure 2: Conceptual diagram of a DDR SDRAM memory for mobile applications.
Designers must also remember to specify timing exceptions for a DDR interface's receiving registers. Although these registers are clocked at a single rate, data arrives at them at double rate, so the exceptions must isolate the correct sending/receiving clock-edge pairs.
In Figure 2, the DDR clock labeled Clock becomes data through the DDR multiplex path. The DDR write data goes to the DDR receiving flip-flops as well as to a monitoring flip-flop clocked by Clock2. It is desirable to false-path all paths from flip-flops clocked by Clock to all flip-flops clocked by Clock2. Normally this is easily done by setting a false path from clock object Clock to clock object Clock2. Unfortunately, this approach also false-paths the clock-as-data path from Clock to the flip-flop clocked by Clock2.
To handle this problem, the false-path script can be modified. First, a collection of all registers clocked by Clock is generated; then a false path is declared from the clock pin of each register in this collection to the clock object Clock2:
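# Collect every register clocked by Clock
set related_regs [all_registers -clock Clock]
# Clock pin name of the library flip-flops (library dependent)
set pinname "/CLK"
foreach_in_collection reg_name $related_regs {
  set cellname [get_object_name $reg_name]
  set clkpin [get_pins "${cellname}${pinname}"]
  # Start the exception at the register clock pin, not at the clock object,
  # so the clock-as-data path from the clock source stays timed
  set_false_path -from $clkpin -to [get_clocks Clock2]
}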
Timing paths in some off-the-shelf timing analysis tools are defined as originating at either an input port or a flip-flop clock pin and terminating at either an output port or a flip-flop data input pin. For a clock-as-data path, the startpoint is the true source of the clock, which is the pin on which the clock is declared. Because the script above names only register clock pins as startpoints, it leaves this path timed.
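Why the per-register loop preserves the clock-as-data path can be checked with a toy model. In this plain-Tcl sketch, each timing path is a dict holding its startpoint, launch clock, and capture clock; all instance and pin names are hypothetical, and this is not how PrimeTime represents paths internally. A clock-to-clock exception matches every path launched by Clock, whereas an exception keyed on register clock pins never matches the clock-as-data path, whose startpoint is the clock source pin.

```tcl
# Toy model of exception matching; all names are hypothetical.
set paths {
    {start u_reg1/CLK  launch Clock capture Clock2}
    {start u_reg2/CLK  launch Clock capture Clock2}
    {start clk_src_pin launch Clock capture Clock2}
}

# Models: set_false_path -from [get_clocks Clock] -to [get_clocks Clock2]
proc cut_by_clock_pair {path} {
    expr {[dict get $path launch] eq "Clock" && [dict get $path capture] eq "Clock2"}
}

# Models: one set_false_path -from <register clock pin> -to [get_clocks Clock2] per register
proc cut_by_reg_pins {path reg_clk_pins} {
    expr {[dict get $path start] in $reg_clk_pins && [dict get $path capture] eq "Clock2"}
}

# Return the startpoints of paths that the given exception style leaves timed
proc timed_paths {paths style args} {
    set kept {}
    foreach p $paths {
        if {![$style $p {*}$args]} {
            lappend kept [dict get $p start]
        }
    }
    return $kept
}

puts "clock-pair exception keeps: [timed_paths $paths cut_by_clock_pair]"
puts "register-pin exception keeps: [timed_paths $paths cut_by_reg_pins {u_reg1/CLK u_reg2/CLK}]"
```

The clock-pair exception leaves nothing timed, while the register-pin version still times the path that starts at clk_src_pin.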
Matching Data Send/Receive Paths
Because the DDR data crosses a chip boundary, it must be timed twice: first at the sub-chip level and later, as part of the sub-chip's interface logic model (ILM), inside a top-level netlist. On the stand-alone sub-chip, the data paths end on output ports rather than flip-flops.
The dual role of Clock in the DDR transactions (see Figure 2 above) causes difficulties for STA. The Clock rising edge clocks both the high- and low-phase data into source registers (not shown in Figure 2). Clock also switches the mux that selects between the high- and low-phase data. Destination registers (when timing with an external hierarchical model) clock the DDR data in half a cycle later.
The natural STA setup for this interface would reference only the Clock output as the time base for I/O timing constraints. However, this setup cannot express all the required timing exceptions. Specifically, the I/O constraints would reference a single edge of the clock (a limitation of the PrimeTime tool). Because of clock skew, the hold requirement for each edge of Clock is set by Clock's control of the mux select rather than by its clocking of the source registers. The path through the mux select is thus the critical path for hold, yet false paths cannot be set from the source registers to both edges of Clock to remove hold violations that cannot occur in practice. Further, each data input to the mux derives from a different edge of the source clock, which makes defining exceptions against a single external I/O clock even more difficult.
The solution is to modify the STA setup to declare two virtual clocks, 180 degrees out of phase with one another, to replace Clock as the I/O timing reference. To define the latencies on the virtual clocks correctly, the rise and fall timings of the output clock must be extracted dynamically and applied as the latencies of the virtual clocks. The virtual-clock solution also allows all the necessary combinations of false paths between the required clock edges and destinations to be specified.
Here is an example of virtual clock creation:
# No source pins are given, so both clocks are virtual
create_clock -period $PERIOD -waveform [list 0 $HALF_PERIOD] -name vclk_pos
create_clock -period $PERIOD -waveform [list $HALF_PERIOD $PERIOD] -name vclk_neg
Latency definition example:
set_clock_latency $extracted_rise_lat -rise -source [get_clocks vclk_pos]
False path example:
set_false_path -hold -from $MSW_reg -to [get_clocks vclk_pos]
set_false_path -hold -from $LSW_reg -to [get_clocks vclk_neg]
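The virtual-clock setup can also be generated from the extracted edge timings. This plain-Tcl sketch only builds command strings, so it runs in tclsh; inside PrimeTime the strings could be passed to eval. Several points are assumptions of the sketch rather than part of the original flow: the proc name vclk_commands; treating the extracted values as absolute arrival times of the output clock's rising and falling edges within one period; and applying the fall timing as the rise source latency of vclk_neg, whose rising edge stands in for the output clock's falling edge. Only the rising edges of the virtual clocks receive latency here.

```tcl
# Hypothetical helper: builds the virtual-clock commands as strings.
# rise_arrival / fall_arrival: assumed absolute arrival times (same units as
# period) of the output clock's rising and falling edges, measured from time 0.
proc vclk_commands {period rise_arrival fall_arrival} {
    set half [expr {$period / 2.0}]
    # Latency = measured edge time minus the virtual clock's ideal rising edge
    set pos_lat [expr {$rise_arrival - 0.0}]
    set neg_lat [expr {$fall_arrival - $half}]
    return [list \
        "create_clock -name vclk_pos -period $period -waveform {0 $half}" \
        "create_clock -name vclk_neg -period $period -waveform {$half $period}" \
        "set_clock_latency -source -rise $pos_lat \[get_clocks vclk_pos\]" \
        "set_clock_latency -source -rise $neg_lat \[get_clocks vclk_neg\]"]
}

foreach cmd [vclk_commands 10.0 1.2 6.5] { puts $cmd }
```

With a 10.0 period, a falling edge measured at 6.5 becomes a 1.5 latency on vclk_neg, because that clock's ideal rising edge is already at 5.0.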
Clock Tree/Modes
Controlling the use of clocks with highly configurable clock trees can significantly reduce power consumption. If not carefully designed, however, such clock trees suffer from poorly controlled skew. Inserting many buffers to manage the skew can create gate-count and area problems that prevent timing closure. Additionally, if the number of unique clock path configurations becomes large, it is difficult to determine what inter-clock uncertainties to place on the clock tree. The end result is a huge number of inter-clock skew assertions.
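A quick count shows how fast these assertions multiply. Assuming one inter-clock uncertainty assertion (set_clock_uncertainty -from A -to B) per ordered pair of distinct clock path configurations, N configurations need N*(N-1) assertions. The proc below is a hypothetical illustration of that arithmetic.

```tcl
# Hypothetical estimate: one set_clock_uncertainty -from/-to assertion
# per ordered pair of distinct clock path configurations.
proc interclock_assertions {n_configs} {
    expr {$n_configs * ($n_configs - 1)}
}

foreach n {2 8 32} {
    puts "$n configurations -> [interclock_assertions $n] assertions"
}
```

Going from 2 to 32 configurations raises the count from 2 to 992, before any separate setup and hold values are considered.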
A typical configurable clock division circuit uses a finite state machine (FSM) that drives a divider mux. This circuit introduces many parallel clock pathways, which inevitably induce skew. Because the mux selects are often tied to chip modes, this skew is difficult to deal with.
One solution is to use the custom cell shown in Figure 3 for clock division. The cell has only two clock division pathways: a /1 path and a path through a state element. The state element can be used to stage the output of an FSM that provides various clock division values (generated one /1 clock cycle early to mimic the typical muxed circuit).

Figure 3: Integrated clock division cell.
The custom cell should be laid out to achieve matched delay on the two paths from Clock-in to Clock-out for both rise and fall of Clock-in. In practice, the resulting skew is small enough to be included in the STA uncertainty margin. Thus, STA can be run without worrying about the uncertainty caused by multiple clock pathways, vastly reducing the STA effort and risk.
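The division scheme can be modeled cycle by cycle. In this plain-Tcl sketch, a modulo-N counter stands in for the FSM and decides, one /1 cycle early, whether the next cycle should pass a clock pulse; the state element registers that decision, and the cell passes a Clock-in pulse only when the registered bit is set. Modeling the state-element path as a gated pulse is an assumption, as are all names; the actual circuit in Figure 3 may differ.

```tcl
# Behavioral sketch of a divide-by-N clock using a staged enable.
# Assumptions: the FSM is a simple modulo-N counter, and its "pulse next
# cycle" decision is registered by the state element one cycle early.
proc divided_pulses {n cycles} {
    set count 0
    set staged_en 1        ;# state element output (reset value: pass first pulse)
    set out {}
    for {set c 0} {$c < $cycles} {incr c} {
        # Clock-out pulses in this cycle only if the staged enable is set
        lappend out $staged_en
        # FSM computes the next cycle's enable one cycle early
        set count [expr {($count + 1) % $n}]
        set staged_en [expr {$count == 0}]
    }
    return $out
}

puts [divided_pulses 1 4]   ;# /1: a pulse every cycle
puts [divided_pulses 3 7]   ;# /3: one pulse every third cycle
```

Every division ratio uses the same two physical pathways; only the staged enable pattern changes, which is why a single matched-delay cell can serve all modes.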
Clock inversion can also cause significant skew. A custom exclusive-NOR gate, designed so that its delay is matched whether it is inverting or not, keeps this skew small. This strategy does not eliminate skew entirely, but it minimizes the problem.
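The idea behind the exclusive-NOR approach is that the same gate sits in the clock path whether or not the clock is inverted; only the control input changes. The truth-table sketch below (plain Tcl, hypothetical names) checks the logic: with the control input at 1 the gate passes Clock, and with it at 0 the gate inverts Clock. It says nothing about delay matching, which depends on the custom layout.

```tcl
# XNOR as a programmable clock inverter: out = NOT(clk XOR pass)
proc xnor {a b} { expr {!($a ^ $b)} }

foreach pass {1 0} {
    set wave {}
    foreach clk {0 1 0 1} { lappend wave [xnor $clk $pass] }
    puts "pass=$pass clk=0101 out=[join $wave {}]"
}
```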
A typical clock-gating cell introduces mode-dependent clock paths, which in turn introduce mode-dependent latencies that increase STA and timing closure effort. The integrated clock-gating cell shown in Figure 3 avoids these problems by providing a single clock path.
If the design team does not have access to special clock gating cells or the resources to create a custom cell, clock-gating constructs can be built from discrete gates. If the discrete cell library is not characterized with clock gating values, this conservative rule can be used: Set clock gating setup and hold values for a cell's input pin to the propagation delay from that pin to the cell output.
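Applying the conservative rule can be scripted. This plain-Tcl sketch takes a hypothetical table of gating-cell input pins and their pin-to-output propagation delays (in practice taken from the library data or a timing report) and emits one set_clock_gating_check command per pin, using the delay for both -setup and -hold. The pin names and delay values are illustrative only.

```tcl
# Hypothetical pin -> propagation delay (pin to cell output), library time units
set gating_pins {
    u_cg_and/A 0.12
    u_cg_or/B  0.15
}

proc gating_check_commands {pin_delays} {
    set cmds {}
    dict for {pin delay} $pin_delays {
        # Conservative rule: setup = hold = pin-to-output propagation delay
        lappend cmds "set_clock_gating_check -setup $delay -hold $delay \[get_pins $pin\]"
    }
    return $cmds
}

foreach cmd [gating_check_commands $gating_pins] { puts $cmd }
```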
Managing Power and Verification
Ultra-low-power strategies introduce many unusual timing configurations for SoCs. At the same time, the complexity of these SoCs usually mandates the use of static timing analysis to verify the chip in a reasonable amount of time. Using STA methods such as those described here allows designers to reduce power consumption to unprecedented levels and still run STA accurately and efficiently.
About the Author
Satyendra R.P.Raju Datla is a senior design engineer at Texas Instruments. He has a master's degree in computer engineering from Southern Methodist University and a bachelor's degree in electronics and communication engineering from I.E.T.E, New Delhi, India. Satyendra can be reached at sdatla@ti.com.
James SW Song is a senior member of the technical staff, OMAP platform architect, and design manager in the Texas Instruments Wireless Terminal Business Unit. He has a BSEE and can be reached at s-song@ti.com.
Yuanquao Zhen is a senior electrical engineer at Texas Instruments. He has an MS degree in electrical engineering and can be reached at y-zheng2@ti.com.
Ashwin Rao is a design engineer in Texas Instruments Wireless Terminal Business Unit. He has an MS in electrical engineering and can be reached at ashwinr@dal.asp.ti.com.
Kaijian Shi is a principal consultant in Synopsys' Professional Services unit. He has B.Sc. (Physics), M.Sc. (CS), M.Phil. (EE), and Ph.D. (EE) degrees and can be reached at kaijian.shi@synopsys.com.
Stewart Shankel is a senior staff consultant at Synopsys. He has an MSEE from Purdue University and a BSE from Walla Walla College. Stewart can be reached at shankel@synopsys.com.