A look back at the development of dynamic random-access memory (DRAM) shows that DRAMs have historically been designed primarily around the needs of the PC market. As a result, designers of other applications have been forced to use the "PC DRAM" whether or not it suited their application. In particular, PC DRAMs have created bottlenecks for developers of networking equipment trying to deliver data rates of 2.5 Gbps and beyond.
Fortunately, solutions are on the horizon. Several new DRAM architectures optimized for the needs of networking equipment designs are now reaching the market.
Fast-cycle RAM (FCRAM), a technology co-developed by Toshiba Corporation and Fujitsu Limited, is one solution quickly emerging for the networking design community. FCRAM combines DRAM densities with random cycle performance approaching SRAM speeds. Its proprietary core technology achieves short random access and cycle times, and its architecture pairs high bandwidth with a conventional DDR interface, all built on more cost-effective DRAM technology.
In Part 1 of this article, we will address the basic architecture and operation of FCRAM, as well as provide a performance comparison to other emerging DRAM solutions. In Part 2, which will run next week, we will explain how the FCRAM features can directly benefit networking applications, in particular 10-Gbps/OC-192 line-card implementations.
The Traditional Adaptation Approach
In the past, DRAM performance enhancements have been focused on architecture modifications, such as increasing the peak bandwidth of the device by adding high-speed logic to the I/O. For example, synchronous DRAMs (SDRAMs), double-data-rate (DDR) SDRAMs, and Rambus DRAMs (RDRAMs) all fundamentally use the same memory core (cell array) with different high-speed I/O logic implementations to achieve their respective peak bandwidth increases.
While these enhancements can achieve the desired system performance increases in certain applications, they may not realize the same goal in other applications. For example, increasing DRAM peak bandwidth may boost performance in a PC, where the main memory is primarily used to fill CPU cache lines. However, it may have no impact in a networking switch environment that is characterized by short, random data packets.
Attempts to reduce latency in these PC-focused DRAMs have been made by utilizing multi-bank schemes. In this scenario, memory banks not currently being accessed are in a pre-charged state, which reduces the cycle time if the next data word to be accessed is contained in one of the pre-charged banks.
The primary drawback of adding more banks is the increased cost of the DRAM. In addition, when the next data word is in a different row of the active (non-pre-charged) bank, the current access must finish and the bank must be pre-charged before the new access can begin. None of the previously mentioned DRAM architectures can address this "same bank" latency. Finally, latency is a function not only of the number of banks and the random cycle/access times (tRC/tRAC) but also of the bus turnaround time, which is discussed later in this article.
Enter FCRAM
FCRAM was specifically designed to meet the requirements of communications designers. In particular, it was developed to reduce random cycle latency (random access and cycle times) while increasing peak bandwidth. The practical result is higher effective bandwidth than the alternatives in certain applications, especially the random, short-packet traffic characteristic of networking. FCRAM achieves this through several architectural enhancements, including:
- Three-stage row pipelining
- Fast access core
- Simplified DDR feature set
- Fast bus turnaround times
As discussed above, many DRAMs offer increased performance by using I/O logic enhancements, which can also be referred to as column pipelining. In other words, the DRAM column address cycle time is reduced, achieving fast burst speed. By using a DDR-like feature set and interface, FCRAM also provides this fast burst capability.
DDR, as the name implies, inputs and outputs data on both edges of the clock, doubling the peak bandwidth compared with single-data-rate SDRAM. For example, with a 133-MHz clock, SDRAM transfers data at 133 Mbps per data pin. Using the same 133-MHz clock, DDR transfers 266 Mbps per pin, using fundamentally the same process technology and memory core design as SDRAM, with only minor modifications to the peripheral I/O circuitry. FCRAM uses much of the same I/O circuitry as DDR, so it yields the same peak bandwidth for a given clock frequency.
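The peak-bandwidth arithmetic above can be sketched in a few lines of Python. This is a minimal illustration that assumes a hypothetical x16 data bus (the article does not specify a width) and deliberately ignores latency, which is exactly the limitation of peak-bandwidth figures discussed later in this article.

```python
def peak_bandwidth_mbps(clock_mhz: float, transfers_per_clock: int, data_pins: int) -> float:
    """Peak bandwidth in Mbps: clock rate x transfers per clock x number of data pins."""
    return clock_mhz * transfers_per_clock * data_pins


CLOCK_MHZ = 133
DATA_PINS = 16  # hypothetical x16 device, chosen only for illustration

sdr = peak_bandwidth_mbps(CLOCK_MHZ, 1, DATA_PINS)  # SDRAM: one transfer per clock
ddr = peak_bandwidth_mbps(CLOCK_MHZ, 2, DATA_PINS)  # DDR/FCRAM: transfers on both clock edges

print(f"SDR: {sdr:.0f} Mbps ({sdr / 8:.0f} MB/s)")
print(f"DDR: {ddr:.0f} Mbps ({ddr / 8:.0f} MB/s)")
```

Doubling the transfers per clock doubles the peak figure without changing the clock rate or the memory core, which is why peak bandwidth alone says nothing about random-access performance.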
Additionally, FCRAM implements a scheme called three-stage row pipelining, which provides a tremendous improvement in row address (random) cycle time. By combining row pipelining with a fast memory core, which is achieved primarily by segmenting the core into smaller sub-arrays that can be accessed very fast, the FCRAM can achieve fast random cycle/access times (Figure 1). When looking at this figure, designers should note the improvement of both tRC and tRAC for the FCRAM, as well as the fact that new row addresses and commands can be provided to the FCRAM before the current cycle is complete (row pipelining).

Figure 1: Diagram of the FCRAM three-stage row pipelining scheme.
The three stages of the FCRAM's row pipeline are the address decoder, the memory array, and the I/O buffer. In a typical DRAM, when a row address is provided, the DRAM must first decode the row address, locate it in the memory array, and then read the data from the memory array to the I/O buffer (or write it from the I/O buffer to the memory array in a write cycle). Because these functions must happen in series, a conventional DRAM cannot start the next row address sequence until all three stages of the current one are complete.
By pipelining these three functions, FCRAM is able to begin a new row address access as soon as the current row address is latched in the decoder. The FCRAM may even start decoding a third row address once the first one has resulted in data moving from the memory array to the I/O (or in the opposite direction in the case of a write cycle). The result is a random cycle time of 20 to 30 ns for FCRAM compared with 60 to 70 ns for other types of DRAMs, such as DDR.
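To make the effect of row pipelining concrete, the sketch below models a stream of random row accesses passing through the three stages. The stage delays are hypothetical placeholders rather than FCRAM datasheet values, and the model assumes fixed per-stage delays with no other timing constraints.

```python
def row_access_schedule(n_accesses, stage_ns, pipelined):
    """Return (start, finish) times in ns for n back-to-back random row accesses.

    stage_ns: delays of the three stages (address decode, memory array, I/O buffer).
    Serial: the next access starts only after all three stages of the current one finish.
    Pipelined: a new access enters the decoder every max(stage_ns) ns, which guarantees
    each stage is free by the time the next access reaches it.
    """
    latency = sum(stage_ns)
    interval = max(stage_ns) if pipelined else latency
    return [(i * interval, i * interval + latency) for i in range(n_accesses)]


STAGES_NS = (8, 12, 6)  # hypothetical decode/array/I/O delays, not datasheet values

for mode in (False, True):
    label = "pipelined" if mode else "serial"
    print(label, row_access_schedule(3, STAGES_NS, pipelined=mode))
```

With these made-up delays, the serial model starts a new row access every 26 ns, while the pipelined model starts one every 12 ns, limited by the slowest stage. The latency of a single access (26 ns) is the same in both cases: pipelining shortens the random cycle time, while the fast core is what shortens the access time itself.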
Functional Differences
In addition to the difference in pipelining, FCRAM products provide several key optimizations over traditional DRAMs for communications designs. These functional differences include:
- The /RAS, /CAS and /WE pins are replaced by a function pin (FN) and two additional address pins (A13 & A14). During the first command (traditional RAS activation), the state of FN determines a read or write cycle and the upper (row) address is latched using A0-A14. The second command (CAS activation) latches the lower (column) address. Note the asymmetrical number of row/column addresses (also called broadside addressing), which is one of the innovations that allows FCRAM to achieve faster random access and cycle times.
- FCRAM read/write commands always include auto pre-charge. This eliminates read/write commands without pre-charge, separate pre-charge commands, and the multiplexed auto pre-charge (A10/AP) pin.
- FCRAM uses a /PD pin instead of clock enable (CKE) for power down mode, eliminating the CKE-to-clock timing dependencies.
- FCRAM has variable write burst length (using A11-A14), which eliminates the byte masking command and the use of a data mask (DM) pin for every eight I/Os.
- FCRAM's write CAS latency (WL) is equal to its read CAS latency (RL) minus one cycle. This greatly improves read-to-write and write-to-read bus turnaround time compared with other DRAM types, which fix WL at one cycle.
- Other SDRAM/DDR functions, such as burst stop and page mode, have been eliminated to simplify FCRAM controller designs.
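The benefit of setting WL equal to RL minus one can be illustrated with a simplified turnaround model. Everything here is an assumption for illustration: the second command is issued no earlier than one burst after the first, data starts a fixed latency after its command, a hypothetical one-clock gap is required whenever the data bus reverses direction, and RL = 3 is a made-up value.

```python
def idle_bus_cycles(first, rl, wl, burst_clocks, turnaround=1):
    """Idle data-bus clocks between two back-to-back bursts in opposite directions.

    first: "read" (read followed by write) or "write" (write followed by read).
    Assumed rules: the second command issues no earlier than burst_clocks after the
    first, and at least `turnaround` idle clocks separate the two data bursts.
    """
    if first == "read":
        lat_first, lat_second = rl, wl
    elif first == "write":
        lat_first, lat_second = wl, rl
    else:
        raise ValueError("first must be 'read' or 'write'")
    first_data_end = lat_first + burst_clocks  # first burst occupies [lat_first, first_data_end)
    # Earliest issue time for the second command that satisfies both rules.
    issue = max(burst_clocks, first_data_end + turnaround - lat_second)
    return issue + lat_second - first_data_end


RL = 3  # hypothetical read latency in clocks
for name, wl in (("WL = 1 (fixed)", 1), ("WL = RL - 1", RL - 1)):
    r2w = idle_bus_cycles("read", RL, wl, burst_clocks=2)
    w2r = idle_bus_cycles("write", RL, wl, burst_clocks=2)
    print(f"{name}: read->write idle {r2w}, write->read idle {w2r}")
```

In this model the write-to-read gap shrinks from two idle clocks to one. The read-to-write gap is already bounded by the turnaround clock, so there the smaller WL/RL mismatch instead lets the write command issue one clock earlier. Real devices add constraints this sketch ignores, such as DDR's internal write-to-read delay (tWTR).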
Figures 2 and 3 highlight the changes described above.
FCRAM is a simplified version of DDR, yet close enough in compatibility with DDR to allow a memory controller design to utilize either device. Assuming the performance of DDR is acceptable, this option allows use of a PC-focused DRAM solution, although the cost/performance benefits of FCRAM should make it the preferred solution in certain applications.
Comparison of FCRAM and DRAM
As previously mentioned, DRAM performance is often quoted as peak bandwidth, which is simply the burst-mode data rate per pin multiplied by the number of data I/O pins, with no consideration given to random cycle latency or bus utilization. For demonstration purposes, Figure 4 compares FCRAM with DDR, showing the clock frequency of each device and the corresponding peak bandwidth calculation.

To determine effective bandwidth, the designer must determine the bus efficiency: the ratio of clock cycles in which the device is inputting or outputting data (valid data bus cycles) to the total number of clock cycles for a particular microprocessor request. In Figure 4, the microprocessor request is an 8-word read burst followed by an 8-word write burst.
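Bus efficiency as defined above is a simple ratio, and effective bandwidth is peak bandwidth scaled by it. The sketch below assumes a hypothetical per-clock trace of the data bus ("D" for a valid data cycle, "." for an idle one); the trace and the 4,256-Mbps peak figure are illustrative only, not values from Figure 4.

```python
def bus_efficiency(trace: str) -> float:
    """Fraction of clock cycles in which the data bus carries valid data.

    trace: one character per clock, "D" for a valid data cycle, anything else idle.
    """
    if not trace:
        raise ValueError("trace must contain at least one clock cycle")
    return trace.count("D") / len(trace)


def effective_bandwidth(peak_mbps: float, efficiency: float) -> float:
    """Effective bandwidth is peak bandwidth scaled by bus efficiency."""
    return peak_mbps * efficiency


# Hypothetical trace: 8-word read then 8-word write on a DDR-type bus
# (two words per clock, so each burst occupies four clocks), separated by idle latency.
trace = "....DDDD....DDDD"
eff = bus_efficiency(trace)
print(f"efficiency {eff:.2f}, effective {effective_bandwidth(4256, eff):.0f} Mbps")
```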
Bus efficiency is a function of the following:
- Initial latency from the CPU request to first valid data word (tRAC for the DRAM)
- The burst length
- The pre-charge "penalty" if the following data word to be accessed is in the same bank as the current access (tRC, which is the sum of the DRAM pre-charge time plus tRAC)
- The bus turnaround time
The dependency of bus efficiency on each of these parameters is as follows:
- The bus efficiency is greater for faster tRAC/tRC
- The bus efficiency is greater for longer bursts
- The bus efficiency is greater with faster bus turnaround time
The burst length is application dependent; however, tRAC, tRC, and bus turnaround time are DRAM dependent, and FCRAM excels in these areas due to the architectural and functional improvements described above.
In Figure 4, the bus efficiency is calculated for two cases. The first case, called bank interleave, is when consecutive read/write cycles are always performed on a pre-charged bank, that is, a different bank from the one currently being accessed. Because the maximum burst length is four words, an 8-word burst is actually a 4-word burst followed by another 4-word burst from a pre-charged (different) bank. In the second case, by contrast, the second 4-word burst comes from the same bank as the first.
As expected, the bus efficiency is considerably worse for same-bank accesses. However, DDR suffers a 37 percent reduction in bus efficiency, compared with 9 percent for FCRAM. In other words, FCRAM minimizes the penalty of same-bank accesses, which matters most in applications with a high degree of randomness.
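The two cases can be approximated with a simple model that counts clocks from the first row command to the last data word of an 8-word request split into two 4-word bursts. The tRAC and tRC values are hypothetical and chosen only to contrast a slow-core device with a fast-core one; the model will not reproduce the 37 and 9 percent figures from Figure 4.

```python
def two_burst_efficiency(trac, trc, burst_clocks, same_bank):
    """Bus efficiency for two back-to-back 4-word bursts (8 words total).

    trac: clocks from row command to first data word (hypothetical value)
    trc: minimum clocks between row commands to the same bank (hypothetical value)
    burst_clocks: clocks one 4-word burst occupies on the data bus
    """
    if same_bank:
        # The second access must wait out the bank's full row cycle.
        second_start = max(trc, burst_clocks)
    else:
        # Bank interleave: the other bank is already pre-charged, so the second
        # access only waits for the first burst's data slot.
        second_start = burst_clocks
    total_clocks = second_start + trac + burst_clocks
    return 2 * burst_clocks / total_clocks


devices = {"slow core": (6, 10), "fast core": (3, 4)}  # (tRAC, tRC) in clocks, hypothetical
for name, (trac, trc) in devices.items():
    inter = two_burst_efficiency(trac, trc, 2, same_bank=False)
    same = two_burst_efficiency(trac, trc, 2, same_bank=True)
    drop = 100 * (1 - same / inter)
    print(f"{name}: interleave {inter:.2f}, same bank {same:.2f}, drop {drop:.0f}%")
```

Even with made-up numbers, the pattern follows the article's argument: the same-bank penalty is driven by tRC, so a device with a shorter row cycle loses less efficiency when consecutive bursts hit the same bank.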
Determining True Bus Efficiency
To determine true system memory bus efficiency, the designer must take into account system/CPU overhead and the randomness of the application. Randomness can be defined as the percentage of time that a same bank access occurs, which is dependent on the application.
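One way to fold randomness into the estimate is to weight the two cases by how often each occurs. The sketch below assumes every request moves the same amount of data, so the combined efficiency is total valid cycles over total cycles, which works out to a weighted harmonic mean of the two efficiencies (a plain weighted average would overstate it). It ignores the system/CPU overhead mentioned above, and the input values are hypothetical.

```python
def system_efficiency(eff_interleave: float, eff_same_bank: float, randomness: float) -> float:
    """Combined bus efficiency for a given same-bank access fraction.

    randomness: fraction of requests (0..1) that hit the same bank as the previous one.
    Assumes each request transfers the same number of valid data cycles, so
    efficiency = total valid cycles / total cycles = weighted harmonic mean.
    """
    if not 0.0 <= randomness <= 1.0:
        raise ValueError("randomness must be between 0 and 1")
    if eff_interleave <= 0 or eff_same_bank <= 0:
        raise ValueError("efficiencies must be positive")
    return 1.0 / ((1.0 - randomness) / eff_interleave + randomness / eff_same_bank)


for p in (0.0, 0.25, 0.5, 1.0):
    print(f"randomness {p:.2f}: efficiency {system_efficiency(0.4, 0.2, p):.3f}")
```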
As previously mentioned, adding more banks to the DRAM hides the effects of randomness, but it also increases the cost of the DRAM. Moreover, beyond a certain number of banks, adding more yields diminishing performance returns. Based on input from system and DRAM designers on the cost/performance tradeoffs, the industry seems to have settled on four banks as the ideal number. In Figure 4, both the FCRAM and DDR devices used in the calculations have four banks.
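The diminishing return from additional banks can be seen with a simple probability argument. Assuming (unrealistically, since real traffic is rarely uniform) that each access targets a uniformly random bank independently of the previous one, the chance of a same-bank access is 1/N for N banks, so each doubling of the bank count removes only half as much same-bank probability as the previous doubling did.

```python
from fractions import Fraction


def same_bank_probability(num_banks: int) -> Fraction:
    """Chance that the next access hits the same bank as the current one,
    assuming each access picks a bank uniformly at random and independently."""
    if num_banks < 1:
        raise ValueError("num_banks must be at least 1")
    return Fraction(1, num_banks)


previous = None
for banks in (1, 2, 4, 8, 16):
    p = same_bank_probability(banks)
    gain = "" if previous is None else f", reduction vs. previous: {float(previous - p):.4f}"
    print(f"{banks:2d} banks: same-bank probability {float(p):.4f}{gain}")
    previous = p
```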
One final note on Figure 4: the clock frequencies used for FCRAM are higher than those used for DDR. This is primarily because the DDR market currently uses 100/133 MHz for its "mainstream" products, whereas FCRAM users are requesting faster speeds. (FCRAM can reach higher frequencies due to its simplified command structure and faster tRAC/tRC, which provide better timing margin.) Regardless of the frequencies used, FCRAM's improved bus efficiency and resulting bandwidth increase are clear.
On To Part 2
In summary, the DRAM market is evolving to offer more application-specific architectures, which should be good news for communication designers. In particular, the FCRAM is an architecture well suited for applications that require both low latency and high bandwidth, such as high-performance networking systems. The FCRAM shines in these types of applications, as it allows the system designer to easily support DDR and FCRAM with a common interface, yet realize the added benefits of the FCRAM in terms of a simplified feature set, lower random cycle latency and faster bus turnaround times. The result is significantly higher effective bandwidth with minimal cost increase.
This wraps up Part 1 of our series. In Part 2, we explore the impact that FCRAM has on a 10-Gbps networking design.
About the Author
Kevin Kilbuck is the director of memory engineering for Toshiba America Electronic Components. He holds a BSEE from California State University, Chico and an MBA from Pepperdine University. Kevin can be reached at kevin.kilbuck@taec.toshiba.com.