15 March 2010



RLDRAMs vs. CAMs/SRAMs: Part 1

Part 1 of this two-part set details the RLDRAM architecture and shows how this architecture will displace CAMs and SRAM architectures in networking designs.

By Eugene Chang, Bill Lu, and Felix Markhovsky, Infineon Technologies
CommsDesign
Jun 03, 2003
While interest in the networking sector often focuses on processor and switch fabric architectures, the memory architecture chosen by a designer can have an equally strong impact on the overall performance of a datapath design. Currently, most engineers use a combination of content-addressable memories (CAMs) and SRAMs in datapath architectures. These products, however, do not meet the density, size, and cost targets required in 10-Gbit networking architectures and beyond.

Fortunately, a new option is on the way. A group of companies has banded together to develop a reduced-latency version of DRAM for datapath designs. Called RLDRAM, this new architecture is designed to reduce cost and, in turn, displace CAMs and SRAMs in networking designs.

In this two-part series, we'll look at the RLDRAM architecture and examine the impact this memory approach will have on a networking design. Part 1 will discuss the RLDRAM architecture and its suitability for the networking arena. In addition, a brief comparison will be made between RLDRAM and fast-cycle RAM (FCRAM). Part 2 continues with specific applications, as well as a peek at the next generation of RLDRAM, the RLDRAM-II.

Current Bottlenecks/Problems
As is well known today, memory access and density have become the key bottleneck to increasing network processor performance at OC-192 and OC-768 network speeds. Figure 1 shows a typical application in which such memory technologies as CAM and SRAM are used with a network processor-based line card. These line cards use CAM/SRAM in the classification look-up table section and SRAM as the packet buffer memory.


Figure 1: Typical memory application in a network processor-based line card.

A typical look-up table is used to perform the Layer 3 longest-prefix match on the header of an incoming IP packet, which must be directed to the appropriate address. This task is accomplished by the network processor, which finds the correct destination through the look-up table.

In a CAM-based look-up system, addresses are searched and matched with the appropriate stored addresses. Once the match is found, the network processor can then guide the incoming packets to the appropriate end locations through the switch fabric.

With IPv4 packets, 32 bits of header need to be searched and matched. With IPv6 packets, 128 bits need to be processed, plus the added burden of supporting classless inter-domain routing (CIDR), which previously only a ternary CAM (TCAM) could perform.

The "unknown" or "don't care" state allowed in a TCAM is used to prevent address space waste due to class organization, as required in CIDR address programming. Binary CAMs used for IPv4 packet processing are twice the size of SRAM implementations, and the TCAMs required for IPv6 packets are twice the size of binary CAMs, or four times the size of SRAM implementations.
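The ternary matching described above can be emulated in software as a rough illustration. The sketch below is not how a TCAM is built (the hardware compares every entry in parallel in a single search); it only shows the value/mask idea, in which mask bits set to 0 act as "don't care" positions, and how the longest matching prefix wins. The table contents and the names `make_entry` and `lookup` are invented for this example.

```python
from ipaddress import ip_address, ip_network


def make_entry(prefix, next_hop):
    """Build a ternary entry: (value, mask, prefix length, next hop).

    Mask bits that are 0 are the "don't care" positions of the entry.
    """
    net = ip_network(prefix)
    return (int(net.network_address), int(net.netmask), net.prefixlen, next_hop)


def lookup(table, address):
    """Return the next hop of the longest matching prefix, or None."""
    addr = int(ip_address(address))
    best = None
    for value, mask, length, hop in table:
        # An entry matches when the address agrees on all "care" bits.
        if (addr & mask) == value and (best is None or length > best[0]):
            best = (length, hop)
    return None if best is None else best[1]


# Hypothetical IPv4 forwarding table used only for this sketch.
table = [
    make_entry("0.0.0.0/0", "default"),
    make_entry("10.0.0.0/8", "port A"),
    make_entry("10.1.0.0/16", "port B"),
]

print(lookup(table, "10.1.2.3"))     # matches /0, /8 and /16; the /16 wins
print(lookup(table, "10.2.0.1"))     # matches /0 and /8; the /8 wins
print(lookup(table, "192.168.1.1"))  # only the default route matches
```

A software loop like this costs one comparison per entry, which is exactly the search time a CAM avoids and which a DRAM-based table has to recover through other look-up structures.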

It has been estimated [1] that in IPv4, 150 Mbits of CAM is required to properly handle a 1024-kbyte IP prefix number (address locations). The memory size increases dramatically to 2400 Mbits of TCAM for IPv6 with the same quantity of IP prefix numbers. As one can see, a very large amount of memory is required for a CAM look-up table. Here, the memory size is not the only issue; the ability to perform the search quickly is another one.

Packet Buffer Memory Needs
Packet buffer memory stores the actual packets until the network processor has come up with the correct location. Unlike the look-up table, in which only the IP header is worked on, the packet buffer memory stores the entire packet until it is ready to be directed to another location, whether out to the cable/fiber or through the switch fabric to another line card.

A key consideration for packet buffers is the ability to store or release vast amounts of information quickly. Traditionally, SRAMs were employed because of their short row cycle time (tRC) and quick bus turn-around, which allows storing and releasing data quickly. Although SRAMs are the fastest memory one can design-in for this type of buffer storage application, their main disadvantages are low memory density and, at the same time, very high power consumption.

SRAMs are typically designed with fast-switching CMOS logic transistors, which have high current leakage ratings. Compounding this high power consumption problem is the fact that an SRAM memory cell employs six high-leakage transistors for every stored bit of information. Employing six transistors for every stored bit makes CMOS SRAMs low-density memory devices.

Unlike CMOS SRAMs, a CMOS DRAM uses a single transistor cell for every stored bit of information, which makes it the most efficient device in terms of memory density and power consumption.

RLDRAM: Improving the Bottlenecks
As stated previously, traditional look-up tables for networking applications have used the more expensive CAMs as well as SRAM devices. In today's designs, the most each of these devices can store is approximately 18 Mbits for a CAM and 32 Mbits for an SRAM. The current generation of RLDRAM can store 256 Mbits. Going by these figures, a single RLDRAM holds roughly 14 times as much data as the largest CAM and 8 times as much as the largest SRAM. Precious board space can therefore be saved: since network cabinet/closet space is at a premium, the network equipment design engineer can now increase the size of the look-up tables without resorting to multiple memory boards to accommodate CAMs or SRAMs.
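As a back-of-the-envelope check on the board space argument, the short sketch below counts how many devices of each type would be needed to hold a given number of bits, using only the per-device capacities quoted above. It ignores the fact that a look-up table is organized differently in a CAM than in a RAM, so it should be read as a raw capacity comparison and nothing more. The function name `devices_needed` is made up for the example.

```python
import math

# Largest per-device capacities quoted in the text, in Mbits.
DEVICE_MBITS = {"CAM": 18, "SRAM": 32, "RLDRAM": 256}


def devices_needed(table_mbits, device_mbits):
    """Number of devices required to hold table_mbits of raw storage."""
    return math.ceil(table_mbits / device_mbits)


# How many parts does it take to match one 256-Mbit RLDRAM?
for name, capacity in DEVICE_MBITS.items():
    print(f"{name}: {devices_needed(256, capacity)} device(s)")
```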

Moreover, not only can space savings be realized when switching from a CAM/SRAM to a DRAM design, but significant power savings can also be achieved. To obtain the high number of memory locations that a single RLDRAM can store, several CAMs/SRAMs would need to be used. Each SRAM device has six transistors for each storage element, while a CAM has at least twice that many. With a single transistor for each storage element, the RLDRAM obviously dissipates considerably less power.
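The cell-count argument can be put into numbers with a small sketch. It counts storage-cell transistors only, using the per-bit figures given in the text (1 for DRAM, 6 for SRAM, and 12 as the "at least twice that many" lower bound for CAM), and it assumes 1 Mbit = 2^20 bits. Power does not scale strictly with transistor count, so the ratios are indicative only.

```python
# Storage-cell transistors per bit, as stated in the text.
# The CAM figure is the stated lower bound (twice the SRAM cell).
TRANSISTORS_PER_BIT = {"DRAM": 1, "SRAM": 6, "CAM": 12}

BITS_PER_MBIT = 2**20  # assumption: binary megabits


def cell_transistors(mbits, cell_type):
    """Storage-cell transistors needed to hold mbits in the given cell type."""
    return mbits * BITS_PER_MBIT * TRANSISTORS_PER_BIT[cell_type]


dram = cell_transistors(256, "DRAM")
for cell_type in TRANSISTORS_PER_BIT:
    ratio = cell_transistors(256, cell_type) // dram
    print(f"256 Mbits in {cell_type}: {ratio}x the DRAM cell transistors")
```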

Replacing faster memory technologies, such as CAM or SRAM, with a DRAM-based technology such as RLDRAM does not sacrifice memory access bandwidth, because RLDRAM is capable of providing up to 600 Mbits per second per pin.

So Why is RLDRAM So Special?
RLDRAM is an advanced DRAM architecture that was specifically designed for low latency, with fast speed and short row-cycle times. While typical commodity DRAM architectures have four banks, the RLDRAM has eight banks. This allows for shorter column/row address and data bit lines, resulting in a faster access time.

In a four-bank device, driving and sense amplifiers have to deal with the large capacitance parasitics associated with long address and data lines. The RLDRAM's eight smaller banks have significantly shorter address and data lines, which reduces the parasitic capacitances and also contributes to a faster access. Furthermore, with more banks, there is less probability of random access conflicts.

In addition to shorter column/row address and data lines, there are other features to help speed up data. The optimum way to move data in and out of an RLDRAM is to store data in or read data from the eight banks in a "round-robin" fashion. In other words, the first 16 or 32 bits of a packet (depending on the RLDRAM organization chosen) are stored in the first bank, the next 16- or 32-bit section is stored in the second bank, and so on. After a section has been stored in the eighth bank, the next section of data is placed into the first bank, completing the first loop.

Likewise, once the data is stored in a round-robin fashion, it can be read in a similar way. Data is read out of the first bank. Upon completion of the first-bank read operation, data can then be read out of the second bank, and so forth. Finally, after finishing with the eighth bank, the read loops back to the first bank. Of course, the order of access can be random, and need not be from bank 1 to bank 8 (Figure 2).
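The round-robin placement can be modeled in a few lines. The sketch below is a software model only, under the assumption that a packet is cut into fixed-width words (16 or 32 bits) and that word i goes to bank i modulo 8; the function names and the list-of-lists representation of the banks are invented for the example and say nothing about the device's internal organization.

```python
NUM_BANKS = 8  # the RLDRAM has eight banks


def split_words(packet, word_bits=32):
    """Cut a packet (bytes) into words of word_bits each (16 or 32)."""
    step = word_bits // 8
    return [packet[i:i + step] for i in range(0, len(packet), step)]


def write_round_robin(packet, word_bits=32, start_bank=0):
    """Distribute consecutive words over the banks, wrapping after bank 8."""
    banks = [[] for _ in range(NUM_BANKS)]
    for i, word in enumerate(split_words(packet, word_bits)):
        banks[(start_bank + i) % NUM_BANKS].append(word)
    return banks


def read_round_robin(banks, start_bank=0):
    """Read the words back in the same bank order they were written."""
    out = []
    depth = max(len(bank) for bank in banks)
    for row in range(depth):
        for k in range(NUM_BANKS):
            bank = banks[(start_bank + k) % NUM_BANKS]
            if row < len(bank):
                out.append(bank[row])
    return b"".join(out)


packet = bytes(range(64))            # a 64-byte example packet
banks = write_round_robin(packet)    # sixteen 32-bit words, two per bank
assert read_round_robin(banks) == packet
```

Because consecutive words always land in different banks, no bank is asked for a second access until the other seven have been visited.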


Figure 2: (Top Figure) RLDRAM memory bank configuration, illustrating how data is written into or read out of the eight-bank architecture. (Bottom Figure) RLDRAM timing waveforms, assuming burst length of 2 and in the read mode. A READ latency of 5 clock cycles is shown from request of data to when data appears on the DQ line. A total of eight clock cycles is used to access all eight memory banks.

The round-robin operation is the main principle behind the reduced latency of RLDRAM. As the timing diagrams in Figure 2 illustrate, the reduced latency effect can be seen after the first data packet is read out of the first bank. Once the contents of the first bank are read out, the contents of the second bank come out on the next clock cycle, and so forth.

The initial read of the first bank takes about five clock cycles of latency, but once it completes, the RLDRAM delivers data on every clock cycle. Write latency is shorter, down to two clock cycles. RLDRAM supports a burst length of 2 or 4 in both write and read modes.
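The way the initial latency is amortized can be seen in a simple cycle count. The sketch below assumes a read latency of five clocks (as in Figure 2), a burst length of 2 on a double-data-rate bus so that each burst occupies exactly one clock, and one read command issued per clock to successive banks. These are simplifying assumptions for illustration, not datasheet timing, and the function names are hypothetical.

```python
READ_LATENCY = 5  # clocks from read command to first data, per the text


def read_schedule(num_reads, num_banks=8, latency=READ_LATENCY):
    """Return (command_clock, bank, data_clock) for back-to-back reads.

    One command is issued per clock, rotating through the banks.
    """
    return [(clk, clk % num_banks, clk + latency) for clk in range(num_reads)]


def total_clocks(num_reads, latency=READ_LATENCY):
    """Clocks from the first command until the last burst has been delivered."""
    return latency + num_reads


def bus_utilization(num_reads, latency=READ_LATENCY):
    """Fraction of the elapsed clocks in which data is on the DQ lines."""
    return num_reads / total_clocks(num_reads, latency)


for cmd_clk, bank, data_clk in read_schedule(8):
    print(f"clock {cmd_clk}: READ bank {bank} -> data at clock {data_clk}")

print(f"8 reads:  {total_clocks(8)} clocks, {bus_utilization(8):.0%} busy")
print(f"64 reads: {total_clocks(64)} clocks, {bus_utilization(64):.0%} busy")
```

The five-clock penalty is paid once per run, so the longer the run of round-robin reads, the closer the data bus gets to being fully occupied.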

The round-robin approach to bank operations is not used with four-bank devices such as standard DRAMs, since most four-bank devices are designed for individual random accesses of data. In RLDRAM and most networking applications, high data movement bandwidth takes precedence over random access of data. A double-data-rate (DDR) interface increases the speed at which data is pumped in and out of the RLDRAM. With a clock rate of 300 MHz, the DDR feature allows data to move at 600 Mbits per second per pin.

RLDRAM also provides a short row-cycle time for random access. At 25 ns, the RLDRAM sports one of the fastest tRC times of any DRAM device.
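One way to see how the bank count and the row-cycle time fit together is to ask how long it takes before a round-robin sequence returns to the same bank. The sketch below is our own illustration rather than a datasheet rule; it assumes one access is started per clock (as with burst length 2 on a DDR bus) and uses the 300-MHz clock and 25-ns tRC quoted in the text. The function names are made up.

```python
def clock_period_ns(clock_mhz):
    """Clock period in nanoseconds."""
    return 1000.0 / clock_mhz


def bank_revisit_ns(num_banks, clock_mhz, clocks_per_access=1):
    """Time before a round-robin sequence comes back to the same bank."""
    return num_banks * clocks_per_access * clock_period_ns(clock_mhz)


def trc_hidden(num_banks, clock_mhz, trc_ns, clocks_per_access=1):
    """True if the bank has finished its row cycle before it is revisited."""
    return bank_revisit_ns(num_banks, clock_mhz, clocks_per_access) >= trc_ns


print(f"8 banks: revisit after {bank_revisit_ns(8, 300):.2f} ns, "
      f"tRC hidden: {trc_hidden(8, 300, 25)}")
print(f"4 banks: revisit after {bank_revisit_ns(4, 300):.2f} ns, "
      f"tRC hidden: {trc_hidden(4, 300, 25)}")
```

Under these assumptions, eight banks give each bank about 26.7 ns between visits, just over the 25-ns tRC, whereas four banks at the same clock would leave only about 13.3 ns.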

Understanding the FCRAM Architecture
The FCRAM [2] is another memory architecture looking to supplant CAMs and SRAMs in datapath memory designs. First introduced by Fujitsu Microelectronics over two years ago as a low-power SRAM replacement for portable handheld devices, the FCRAM has found its way into network systems. In particular, the second-generation FCRAM has been used as buffer memory in network line cards, where a tradeoff of SRAM performance versus higher DRAM memory density, lower DRAM power and lower DRAM cost has been leveraged.

The FCRAM departs from standard DDR DRAMs in several areas from architecture to the core design. The FCRAM architecture is fully pipelined and tightly coupled. It includes special features that are not found in standard DDR DRAMs, such as row access (RAS) and column access (CAS) control and an internal auto pre-charge cycle.

In addition, the core is highly segmented, which allows for the memory cell to be much closer to the interface of the FCRAM for quicker access. A typical DRAM device will have long row and column lines that encompass the entire memory bank. In the FCRAM segmented core, each segment within a memory bank will have a shorter access path to the interface. This special segmented-core design, along with high-speed sense amplifiers, reduces the second-generation FCRAM row cycle time to 25 ns, which is similar to that of RLDRAM, from the 65 ns of standard DDR DRAMs.

The FCRAM's ability to activate only a small portion of the word line within a single bank of memory saves power. This is in contrast to the normal commodity DDR DRAM, where the entire word line needs to be activated for a bit to be written into one bank of memory. The FCRAM segmented core enables the partial word line feature by making a portion of the word line active to address a particular area of the segmented core. This allows only the area within a memory bank that is of interest to be activated, while the rest of the bank remains in pre-charge mode.

Comparing RLDRAM and FCRAM
There are several physical differences between RLDRAM and FCRAM. One of the major ones is that the RLDRAM is an eight-bank DRAM while the FCRAM has four banks.

In order to achieve a 25-ns row cycle, the FCRAM employs a highly segmented core with special high-speed sense amplifiers. On the other hand, the eight banks help the RLDRAM achieve the same row cycle time of 25 ns without applying special sense amplifier modifications. The RLDRAM sense amplifiers have not been altered from their high-yielding standard DRAM origins and thus should not be a factor in quality and reliability.

A significant difference between RLDRAM and FCRAM is that the RLDRAM data bus is inherently wider than that of the FCRAM. The largest RLDRAM data bus width of 32 bits is twice as wide as the widest FCRAM data bus configuration. A wider data bus allows more data to pass during each cyclical operation.

Another significant difference is that the RLDRAM employs a non-multiplexed address bus that is similar to most SRAM architectures. The address bus is multiplexed in FCRAM, which not only adds the complication of dealing with multiplexed buses not found in SRAM replacement designs, but also degrades the maximum speed and performance at which the FCRAM can operate. In fact, the second-generation FCRAM reaches clock rates of only 200 MHz, while an RLDRAM manufactured in the same process technology can be clocked at a maximum of 300 MHz. As a result, the RLDRAM can move data at a rate of 600 Mbit/s per pin, while an FCRAM's theoretical maximum is 400 Mbit/s per pin.
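The per-pin figures follow directly from the clock rates, and combining them with the bus widths gives a rough peak figure per device. In the sketch below, the 16-bit FCRAM width is inferred from the statement that the 32-bit RLDRAM bus is twice as wide as the widest FCRAM configuration; the numbers are theoretical peaks that ignore latency, refresh and bus turnaround, and the function names are invented for the example.

```python
def per_pin_mbps(clock_mhz, transfers_per_clock=2):
    """Peak data rate per pin in Mbit/s (DDR moves two bits per clock)."""
    return clock_mhz * transfers_per_clock


def device_gbps(clock_mhz, bus_bits):
    """Peak data rate of the whole data bus in Gbit/s."""
    return per_pin_mbps(clock_mhz) * bus_bits / 1000


# (clock in MHz, widest data bus in bits); FCRAM width inferred as stated above.
DEVICES = {"RLDRAM": (300, 32), "FCRAM": (200, 16)}

for name, (clock_mhz, bus_bits) in DEVICES.items():
    print(f"{name}: {per_pin_mbps(clock_mhz)} Mbit/s per pin, "
          f"{device_gbps(clock_mhz, bus_bits):.1f} Gbit/s on a x{bus_bits} bus")
```

Under these assumptions the clock advantage alone is 1.5x per pin, and the wider bus raises the peak per-device advantage to 3x.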

In addition, the FCRAM's multiplexed bus does not allow 100-percent address command efficiency, which is possible with the RLDRAM. The FCRAM data bus needs to share time with FCRAM addressing, lowering the efficiency rating to 50 percent. The RLDRAM address and data lines, on the other hand, have dedicated pins and no sharing is involved. Furthermore, due to its dedicated address and data lines, the command instruction set for RLDRAM becomes simpler than for FCRAM, since there is no need for a lower address latch command (Figure 3).


Figure 3: Overview of FCRAM and RLDRAM command sets.

Other basic differences between FCRAM and RLDRAM are summarized in Tables 1 and 2.

Table 1: Side-by-Side Comparison of RLDRAM and FCRAM

Table 2: Side-by-Side Comparison of RLDRAM and FCRAM: Take 2

As can be seen from the two tables, RLDRAM does have a lot of advantages over FCRAM for networking applications. In Part 2, the discussion will continue to show applications and further illustrate the uses of these new high-performance DRAMs in networking applications. An introduction of the second-generation RLDRAM (RLDRAM-II) will also be presented in Part 2.

Author's Note: More information on RLDRAM can be found at www.rldram.com or at www.infineon.com/memory/rldram/.

References

  1. C. Bernard Shung, Network Processing ICs, ISSCC 2001 Tutorial, San Francisco, February 4, 2001.
  2. Toshiba FCRAM datasheet, November 2001.
  3. Samsung Network DRAM-II Specification Rev 0.0
  4. Fast Chip Presentation, NPC EAST 2002.

About the Authors
Eugene L. Chang is a senior manager at Infineon Technologies, and is responsible for specialty memory devices, including Embedded DRAM, Flash, RLDRAM and others. Eugene received his Ph.D. and M.S.E.E. from Southern Methodist University, and a B.S. in Electrical Engineering from Columbia University, New York. He can be reached at eugene.chang@infineon.com.

Bill Lu is a senior manager in the Specialty Memory Group at Infineon Technologies. He is currently responsible for specialty memory technology roadmaps, product definition, and customer acquisition and support. Bill holds a Ph.D. in physics from Lehigh University, PA, and an M.S. degree from the University of Science and Technology of China (USTC). Bill can be reached at bill.lu@infineon.com.

Felix Markhovsky is a product manager with Infineon Technologies. Felix has an MSEE degree in control theory and data communications from Polytechnic University, Moscow, Russia. He can be reached at felix.markhovsky@infineon.com.



