FPGAs: Implementing the PCI Interface
FPGAs play an important role in the design and manufacture of telecommunication, data networking, wireless
infrastructure, and communication test equipment, as well as for prototyping high-volume handset designs.
By Murray Disman
Field programmable gate arrays (FPGAs) and communication go together like peanut butter and jelly: a perfect match. The match is so good that communications applications account for more than one-half of all FPGAs shipped. Rapidly evolving standards, time-to-market pressures, and the moderate quantities of equipment manufactured make the FPGA an ideal fit for industry requirements. FPGAs play an important role in the design and manufacture of telecommunication, data networking, wireless infrastructure, and communication test equipment, as well as for prototyping high-volume handset designs.
Almost all of today's communication equipment designs are microprocessor-based, and all of these contain buses, which, in most instances, determine the throughput of the system. The drive for higher-speed operation and higher data rates has propelled the industry in its search for new bus architectures. Although the PCI bus standard was introduced by Intel as a desktop solution, its high-performance features have made it very attractive to those developing embedded systems. In fact, PCI seems to be on track to become the first universal bus system.
The PCI interface
The PCI standard, as defined in Revision 2.1 developed by the PCI special interest group (SIG), is certainly one of the most complex bus standards ever issued.
Figure 1 shows a block diagram of a PCI master/slave interface. Some maintain that PCI has not yet been fully tested. The four main versions of the standard are shown in Table 1.
Table 2 lists the key performance factors that determine the data rate of the bus. The device setup time specification has been the most difficult one for the FPGA producers to meet. As a result, many of the early core designs for FPGAs required the introduction of a wait state, which effectively halved the data rate through the interface. This limitation was largely overcome as the device producers gained PCI design experience and as they migrated their fabrication processes from 0.5 µm to 0.35 µm.
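The halving follows from simple arithmetic: in a burst, each data phase normally takes one clock, and every inserted wait state adds another. A minimal Python sketch, assuming a 32-bit bus, a nominal 33 MHz clock, and no address-phase or arbitration overhead (the function name is illustrative and does not come from any vendor tool):

```python
def burst_rate_mbytes(clock_mhz, bus_bits, wait_states=0):
    """Sustained burst rate in Mbytes/sec, ignoring address-phase overhead."""
    bytes_per_phase = bus_bits // 8
    clocks_per_phase = 1 + wait_states  # one transfer clock plus any waits
    return clock_mhz * bytes_per_phase / clocks_per_phase


print(burst_rate_mbytes(33, 32))                 # 132.0 (zero wait states)
print(burst_rate_mbytes(33, 32, wait_states=1))  # 66.0 (one wait state per data phase)
```

Real transfers fall somewhat short of these figures because every burst also spends clocks on the address phase and bus turnaround.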
The 50-MHz specification is really a de facto communication industry standard. This has evolved from the use of PCI as the microprocessor interface when using UTOPIA as the interface between the ATM physical layer and the ATM function layer, as illustrated in Figure 2.
The PCI specification was developed to ensure that any add-in board meeting these requirements would work in a PCI desktop computer. However, most communication system designs are embedded or closed systems. The primary interest of those designing buses for these systems is speed, rather than strict compliance with every detail of the PCI specification. Even so, some designers of embedded PCI interfaces for communication equipment are insisting that their designs be fully compliant. This is usually to facilitate the adoption of PCI as a company-wide bus standard.
Implementing a PCI interface
There are three primary methods for implementing the circuitry for a PCI interface: a gate array or cell-based ASIC, standard PCI parts, and FPGAs.
The ASIC approach makes the most sense if your design is really fixed and the equipment is to be manufactured in large quantities. It also becomes the way to go if you are trying to achieve a very high level of integration, or if there are certain speed parameters that can't be met with a standard IC or FPGA. The primary deterrents to this approach are the high non-recurring expenses (NRE), which are the one-time, up-front costs of developing a new ASIC, and the length of time required to acquire the first parts.
Predesigned PCI cores are available from a variety of sources. The major ASIC suppliers, such as LSI Logic and Lucent Technologies, have their own core designs for customer use. In addition, there are some fifteen independent intellectual property vendors that are offering PCI cores. The Virtual Chips operation of Phoenix Technologies and Sand Microelectronics have also been successful in supplying PCI cores for ASIC designs.
Standard ICs designed to implement PCI interfaces are available, with the most popular devices produced by AMCC, PLX Technology, and Tundra. Although these parts can offer the lowest-cost solution, they are inflexible. A substantial amount of glue logic must be added to interface to the back end of the device. Many of the parts are still accompanied by a PCI errata list and often have a built-in interface to a specific processor. PLX's PCI 9080, for example, contains an interface to the i960 microprocessor; so, if you are using an i960, this may be the part for you.
FPGAs offer a real alternative to ASICs and standard ICs. They eliminate the NRE and long development cycles associated with ASICs. Compared to standard ICs, FPGAs allow the design of interfaces that are specific to the system requirements and, in many instances, can offer a lower-cost and/or a higher-bandwidth solution. The glue logic that is needed to support a standard PCI IC is often implemented in an FPGA. Since a single FPGA can accommodate both the PCI function and the glue logic, it makes sense to combine the two into a single device.
A long history of price reductions is another factor favoring FPGAs. These decreases result from the continual improvements in fabrication process technologies and the move to smaller feature sizes, as well as device families that have been specifically designed for low cost. Actel's MX, Altera's FLEX 6000, and Xilinx's Spartan and XC5200 series are all examples of families targeting the low end of the gate array market. The manufacturers of these parts, which have densities in the +20k-gate range, claim price parity with gate arrays.
Despite the inherently greater complexity of an FPGA, the chip sizes for these parts are comparable to those of similar-capacity gate arrays. Feature size reductions have resulted in 20k-gate capacity FPGAs that are pad limited; that is, the die size is determined by the number of bonding pads rather than by the logic contained on the chip. The gate capacity at which FPGAs become pad limited will continue to increase as process technologies migrate to 0.25 µm and below.
FPGA-based PCI solutions
These improvements in process technology, and the development of new FPGA architectures, have had a marked impact on device speed. There are now a number of FPGA/core solutions that meet the full-burst requirement of 132 Mbytes/sec at 33 MHz. However, these solutions can include performance subtleties that are not in complete compliance with the PCI specifications. Solutions for full-burst 66-MHz PCI interfaces are on the way.
All FPGA suppliers, with the exception of Motorola Semiconductor, are offering, or will soon offer, PCI designs (cores) that work with their devices. PCI has become a very important application for the FPGA vendors because of customer demand and also because the level of performance attained is recognized as a good measure of device speed.
In every instance, the cores must be specifically tailored by the company to match its devices' architectures in order to meet PCI performance requirements. The cores are provided as either netlists or VHDL or Verilog code. The netlists and code both contain structure and/or timing constraints that instantiate macros or restrict the placement of the different parts of the core.
Table 3 is a list of the PCI cores and devices offered by the FPGA suppliers. The list contains the latest offerings and also includes cores that were scheduled to be available before this publication date. Also included in the table are the prices, performance parameters, and restrictions for the cores. Only the master/slave cores are listed. All of the vendors offer a slave-only version at about one-half the price of the combined master/slave core.
As shown in Table 3, some of the vendors include the DMA controller and first-in-first-out (FIFO) registers as part of the core, while others supply these as separate macros in their libraries. The device feature size is listed since it is an important determinant of speed. In addition, devices with feature sizes greater than 0.35 µm operate at 5V and those with feature sizes of 0.35 µm or less require 3.3V or lower operation. For a more general overview of FPGA devices on the market, see Table 4, FPGA Vendors, on our Web site at www.csdmag.com/arcsup.html.
Since most of the devices meeting the full-burst requirements are implemented in 0.35 µm processes, it is important that they include the signal clamping diodes required by the PCI specification for 3.3V signal operation. Both 5V and 3.3V PCI signaling systems require clamp diodes to ground at the I/Os of the FPGA. An additional clamp diode to 3.3V is required by the 3.3V PCI specification. Most of the vendors have made the insertion of the second clamp diode a programmable option. Xilinx, however, has released a new series, the XC4000XLT, with the clamp diode hardwired in place.
Actel
Actel introduced its first PCI core 18 months ago. At that time, the company found that every customer wanted something different, and it was necessary to redesign the core to make it more modular to support different back-end widths, synchronous and asynchronous SRAM, and data recovery circuitry for handling
interrupts.
Actel's current core can utilize its own ACT3, MX, and the recently introduced SX series of parts. The 3.3V SX family is the fastest being produced by Actel, which is working on a 66-MHz core for these devices. Its first goal is a full-burst 66-MHz, 32-bit design. It will then try to extend this to a width of 64 bits.
Most parts from these families do not have any on-chip memory, making it necessary to implement FIFOs with external devices. The exceptions are the two largest members of the MX family, which include embedded memory and will be able to accommodate on-chip FIFOs. A follow-on to the SX family will also include embedded memory.
Altera
Altera provides its PCI MegaCore as an encrypted netlist. Like others, it also provides a development board and a MegaCore test bench. The company claims that it has run some 5 billion test patterns through its PCI board to look for bugs; needless to say, it found some.
Its pci_a MegaCore contains both the DMA controller and buffer registers. One wait state is required at 33 MHz when the core is implemented in its 0.5 µm FLEX 10K devices. Zero wait state, 33 MHz performance can be achieved when the core is used with the faster 0.35 µm FLEX 10K 30/50A devices. In addition, these parts are fast enough that the noncompliant pin-sharing technique used in the older FLEX 10K devices could be eliminated. Altera's low-cost FLEX 6000 family, which does not have on-chip memory, provides full-burst performance with the MegaCore.
PCI cores are also available for Altera devices from several of its intellectual property partners, including Eureka Technology and PLD Applications (France); the latter has recently announced a core for 64-bit-wide PCI targets.
Altera is planning to release its second PCI MegaCore, pci_b, in June 1998. The core will be more flexible than its predecessor, will allow the user to customize the DMA controller, and will include four or five different types of FIFOs.
Tests on the recently released FLEX 10K30A indicate that the part can run at 66 MHz, but it will not meet the setup and clock-to-out times required for full-burst performance at this clock rate. The company now feels relatively hopeful that it will meet the full 66 MHz requirements in its 0.25 µm FLEX 10KE series. These parts are due out during the second half of this year and include an improved memory block that will be able to implement dual-port FIFOs.
Figure 3
Atmel
Atmel is developing a PCI master and target core for its recently introduced AT40K family. It expects to have the core in beta test by the end of May. Atmel will allow its customers to migrate the FPGA core to an ASIC with no additional charges. The core will include a fully integrated DMA controller with address counter, byte counter, control and status, and interrupt status registers. The AT40K contains embedded memory that will allow the designer to implement on-chip FIFOs.
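To make the role of those registers concrete, here is a minimal behavioral sketch in Python. It is not Atmel's design: the class, field names, and the 32-bit data-phase size are all assumptions chosen for illustration, and it models only how an address counter, a byte counter, a status flag, and an interrupt flag typically interact during a transfer.

```python
from dataclasses import dataclass


@dataclass
class DmaRegisters:
    """Hypothetical DMA register set; names and behavior are illustrative only."""
    address: int = 0        # address counter: next bus address to transfer
    byte_count: int = 0     # byte counter: bytes remaining in the transfer
    running: bool = False   # control/status: transfer in progress
    irq_done: bool = False  # interrupt status: set when the count reaches zero

    def start(self, address, byte_count):
        """Load the counters and mark the transfer as running."""
        self.address = address
        self.byte_count = byte_count
        self.running = byte_count > 0
        self.irq_done = False

    def data_phase(self, bus_bytes=4):
        """Advance one data phase (4 bytes on an assumed 32-bit bus)."""
        if not self.running:
            return
        moved = min(bus_bytes, self.byte_count)
        self.address += moved
        self.byte_count -= moved
        if self.byte_count == 0:
            self.running = False
            self.irq_done = True


dma = DmaRegisters()
dma.start(address=0x1000, byte_count=16)
phases = 0
while dma.running:
    dma.data_phase()
    phases += 1
print(phases, hex(dma.address), dma.irq_done)  # 4 0x1010 True
```

A 16-byte transfer on a 32-bit bus completes in four data phases, at which point the byte counter reaches zero and the interrupt status flag is raised.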
The master core, with the DMA controller, will fit into an AT40K20 device. Atmel is relatively confident that the design will run at 33 MHz with zero wait states. This would be quite an accomplishment with a 0.6 µm part. Other SRAM-based FPGA producers could not reach this level of performance until they migrated to 0.35 µm processes. Atmel itself plans to begin producing 0.35 µm versions of the AT40K family later this year.
DynaChip
DynaChip is a newcomer to the FPGA business. It introduced its first product, a very high-speed BiCMOS FPGA,
about 1 year ago. The company is now launching a line of 0.35 µm CMOS devices, its DL 6000 family, and is positioning itself to serve the high-performance end of the FPGA applications range.
The company has set itself the difficult task of developing a PCI core that can run at 66 MHz, with no wait states, in its 0.35 µm devices. The DL 6000 series was developed to meet the 66 MHz PCI requirements, and preliminary analyses have shown that this is possible. This device family, which is now being
sampled, contains embedded memory and should be able to implement on-chip FIFOs.
DynaChip anticipates that it will be able to release its core by mid-year. The company, however, will have to use the on-chip phase-locked loop (PLL) to meet the timing requirements. In the past, this would have violated the PCI specifications. A recent variance to the specification allowing PLL use states that for clock frequencies between 33 MHz and 66 MHz, the clock frequency may not change except in conjunction
with a PCI reset.
GateField
GateField is also a relatively recent entrant to the FPGA business. It is producing a series of FPGAs that are unique in the industry, in that they are both nonvolatile and reprogrammable. The company developed its own PCI interface design under a license from Eureka Technology, an independent core provider. An interesting aspect of GateField's agreement with Eureka is that customers who purchase GateField's master or target PCI cores can receive an upgrade to use the core in an ASIC for about $5,000.
These cores can run at 33 MHz in GateField's GF250 and GF260 families of devices. The GF260 contains embedded memory, which can be used to implement the FIFOs. FIFO and DMA controller designs are provided separately from the core. The current parts are noncompliant since they can't simultaneously meet the 7-ns setup time and the 11-ns clock-to-out time. The company expects to resolve this problem as it migrates to finer geometries.
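The two limits named above are parts of a single clock-period budget, which is why a device has to meet both at once. A minimal Python sketch of the check follows. The 7 ns and 11 ns limits come from the text; the 30 ns period, 10 ns bus propagation, and 2 ns clock skew are the commonly quoted 33 MHz budget values and are assumptions here, as are the device figures in the usage lines, which are made up for illustration.

```python
def meets_pci_timing(setup_ns, clock_to_out_ns,
                     max_setup_ns=7.0, max_clock_to_out_ns=11.0):
    """True only if both limits are met at the same time."""
    return setup_ns <= max_setup_ns and clock_to_out_ns <= max_clock_to_out_ns


def budget_slack_ns(period_ns=30.0, clock_to_out_ns=11.0, setup_ns=7.0,
                    prop_ns=10.0, skew_ns=2.0):
    """Slack left in one clock period; prop_ns and skew_ns are assumed values."""
    return period_ns - (clock_to_out_ns + prop_ns + skew_ns + setup_ns)


print(budget_slack_ns())             # 0.0: the budget has no spare margin
print(meets_pci_timing(6.5, 12.0))   # False: setup passes, clock-to-out fails
print(meets_pci_timing(6.5, 10.5))   # True: both limits met
```

With no slack in the budget, a part that trades a faster setup time for a slower clock-to-out time (or the reverse) still fails, which matches the situation described for the current devices.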
GateField has licensed its architecture to Siemens, which is developing a combined FPGA/microprocessor using a 0.25 µm Flash memory process. GateField has the rights to use this process to produce its FPGAs. The finer geometries should substantially increase the speeds of the company's parts.
Lucent
Lucent Technologies has been developing and supplying PCI interface solutions for FPGAs; it claims to have been providing a 33-MHz core for the past 2 years. The company's core can now support both 33-MHz and 50-MHz full-burst performance when using its 3.3V ORCA 2T/3T parts. The current core design includes both the DMA controller and the FIFOs. The user can modify the 16 x 32 FIFO by changing the Verilog code.
Lucent became the first to introduce a 66-MHz FPGA-based PCI solution when it announced the OR3TP12 in May 1998. This device contains a 33-MHz/66-MHz, 32-bit/64-bit PCI interface that is embedded on one chip with an ORCA OR3T55 FPGA. The PCI function
displaces four of the eighteen rows in the FPGA.
The embedded core contains two independent controllers for the master and target. The device is capable of full-burst PCI transfers in either direction, on either the master or target interfaces. The dual 32-bit data paths into the FPGA permit bidirectional transfers of up to 264 Mbytes/sec or unidirectional transfers of 528 Mbytes/sec. The data paths can be configured to be either two 32-bit buses that are multiplexed between the master and target or four independent 16-bit buses.
QuickLogic
QuickLogic is the only company offering a PCI solution as a no-charge reference design. Its current design requires one wait state. The company is readying a new design that can run at 33 MHz with no wait states in its pASIC3025-3 device. It is also possible that the zero wait state requirement will be met in one of its older devices, the -2 speed grade of the pASIC2 family.
A DMA controller, which can be removed, is included in the current core design.
External FIFOs are required as the current devices do not have on-chip memory. A version of the pASIC3 family is planned with embedded memory.
The company is working on a 66-MHz solution that is scheduled to be released during the third quarter of this year. It would not comment on the nature of the solution, which could be either a core or an embedded PCI function similar to the Lucent Technologies approach.
Vantis
Vantis, an AMD subsidiary, only recently announced its entry into the
FPGA part of the PLD market. The company has a long history as a successful supplier of programmable logic and is now shipping both simple and complex PLDs.
The new FPGA family, the VF1, is being produced using a 0.25-µm process, but contains an on-chip voltage converter for 3.3V operation. The VF1 family consists of four devices that cover the 12k to 36k density range and contain embedded memory. The company is confident that it will be able to deliver, in the second half of this year, a PCI core
that will run at 66 MHz with no wait states.
Xilinx
Xilinx is now offering its second-generation LogiCORE PCI32 4K V2.0. This core is rated to run at 33 MHz with no wait states when using the company's 3.3V XC4000XLT family. Xilinx claims that the core/FPGA combination is fully PCI compliant. The XLT family does not contain embedded memory, but on-chip memory can be implemented from the look-up tables (LUTs) used for logic generation in the device.
Neither the DMA controller function nor FIFOs are provided with the core. However, customizable DMA and FIFO designs are provided as macros. It is possible to configure dual-port FIFOs from the LUTs.
In May of this year, Xilinx announced a PCI solution that will run with its low-cost Spartan family of 5V and 3.3V devices. The core design is modified to fit this family, but the new design will be included as part of V2.0, at no added cost. Xilinx is providing a development board as part of its Spartan-based PCI solution. The board is being produced
by Virtual Computer Corp. and incorporates 256k of SRAM, which can hold the data for four different
FPGA logic configurations.
Xilinx plans to reach the full-burst 66-MHz, 64-bit goal with its new Virtex family. First shipments of these 2.5V parts, which contain both embedded and LUT-derived memory, should occur before the end of the first half of 1998. PCI cores for the Virtex family will follow later in the year.
The race to 66 MHz
High throughput is the reason behind the popularity of the PCI interface. While the data transfer rate can be doubled by going from a 32-bit bus to a 64-bit bus, this is usually a cumbersome solution. Most designers would prefer to increase the clock rate to 66 MHz to double the data rate. Some, with extreme bandwidth requirements, will want both the 66-MHz clock and the 64-bit bus.
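The doubling argument can be checked with a few lines of arithmetic. The Python sketch below computes the peak burst rate for each combination of clock rate and bus width, assuming one transfer per clock and the nominal 33 MHz and 66 MHz figures used in this article; the function name is illustrative.

```python
def peak_mbytes_per_sec(clock_mhz, bus_bits):
    """Peak burst rate assuming one data transfer on every clock."""
    return clock_mhz * bus_bits // 8


for clock_mhz in (33, 66):
    for bus_bits in (32, 64):
        rate = peak_mbytes_per_sec(clock_mhz, bus_bits)
        print(f"{clock_mhz} MHz, {bus_bits}-bit: {rate} Mbytes/sec")
# 33 MHz, 32-bit: 132 Mbytes/sec
# 33 MHz, 64-bit: 264 Mbytes/sec
# 66 MHz, 32-bit: 264 Mbytes/sec
# 66 MHz, 64-bit: 528 Mbytes/sec
```

Doubling either the width or the clock gives the same peak rate; the choice between them comes down to pin count and board routing on one side and timing difficulty on the other.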
It is important to understand that the circuitry following the PCI core can seriously degrade the overall transfer rate of the design. Overloading the signal lines with high-fanout
requirements in the user part of the design is but one way to lose performance.
The FPGA suppliers are engaged in a very competitive race to serve the PCI market and to win bragging rights about the prowess of their devices. It is interesting to note that the FPGA companies with PCI design experience expect that reaching the full-burst 66-MHz specification will be a difficult exercise, even in 0.25 µm devices. Those with less PCI design and customer experience see an easier road ahead.
Lucent has decided to use an embedded PCI core to reach full-burst 66-MHz performance, since it considers this the most economical and widely applicable solution. The company claims that it could have developed a PCI core that would run at 66 MHz in its OR3T parts. At present, QuickLogic is the only other FPGA supplier that may opt to use the embedded approach.
Altera, Vantis, and Xilinx are waiting on their 0.25 µm families, with new and/or improved architectures, to attempt the full-burst 66-MHz requirement. The industry will really be surprised if DynaChip is successful in delivering a fully compliant 66-MHz, zero-wait-state design with its first attempt, especially since it is trying this in a 3.3V, 0.35 µm FPGA. Atmel, too, may have a similar surprise in store for us.
Murray Disman is the editor and publisher of Programmable Logic News & Views, a monthly newsletter that has been following progress and activities in the FPGA/PLD and associated EDA tool industries since 1992. He has been working as an analyst and market research consultant to the electronics industry for over 20 years, specializing in a number of different semiconductor areas. Disman received his MS and PhD in electrical engineering from Stanford University. He can be reached at mdisman@ix.netcom.com.