As high-speed interfaces begin to proliferate today's networking equipment designs, designers can no longer afford to use their static input buffering techniques. These static allocation schemes provide latency, overrun, and under run problems, making them an impractical option in high-speed networking designs.
To make high-speed interfaces hum, a more dynamic approach is needed for input buffering. That approach can be obtained through a proposed dynamic allocation scheme.
This is the second-installment in our look at a new dynamic allocation scheme for high-speed interface designs. In Part 1, we looked at traditional static-allocation methods and the problems they provide. Now, in Part 2, we'll further the discussion by proposing a buffer-allocation architecture that can change the input buffering landscape.
Basics on the New Method
As described in Part 1, static buffering techniques involve the allocation of fixed sized memory buffers to all the ports on the ingress interface. These fixed-size buffers have to account for worst-case storage requirement on each port due to overrun and under run. As the number of ports increases this approach becomes impractical due to huge memory requirements.
The proposed dynamic allocation scheme, on the other hand, uses a common memory pool with no static partitioning among ports. Buffer space is allocated dynamically from a common free buffer pool.
The proposed buffer architecture also uses single bit per port for credit-based status indication. The port occupancy for status bit transition is programmable for each port. Credit-based flow control is used to ensure adequate utilization of transfer bandwidth between status updates for a given port.
Status information for the active ports is framed as explained in Part 1 and is sent to the source on the status channel interface. The logic 0 level of the status bit indicates to the source the ability of the corresponding port in the sink FIFO to take transfer credit number of transfers. The logic 1 level indicates that the source can send transfers to the corresponding port only using any pending credits until a new status update is received. For any given port, the credit corresponds to the most recently received FIFO status. Thus, the credits are not cumulative and supercede previously granted credit for the given port. For every selection of a port for data transfer to the sink, the source should decrement the accumulated credit for that port.
As mentioned at the end of Part 1, the buffer space requirement for any port on the sink FIFO includes three components: data equivalent of transfer_credit, the under run threshold, and the overrun threshold. Figure 7 shows the organization of the buffer for any active port when under run and overrun thresholds are considered in addition to the port's allotted payload transfer credit. The figure shows the worst-case requirements, for a single active port occupying the entire interface bandwidth. The indicated thresholds are only for flow control and there are no fixed buffer allocations.

Figure 7: Buffer organizations for the dynamic allocation scheme.
Unlike static-allocation schemes, the overrun threshold is not statically allocated to all the ports. Any requirement for buffer space to absorb an overrun on a port is dynamically allocated from a common memory pool on a need basis. Through mathematical analysis, we would evaluate the amount of overrun that a port can suffer under different conditions. This would allow us to arrive at the upper bound for the common memory pool under worst-case condition and the lower bound for the memory pool under ideal or practical.
Overrun Calculations
Intuitively, it can be seen that a port would overrun if the source repeatedly sends data to the port because a status feedback is stale or has not been updated to reflect the latest writes to that port. Stale status information received by the source grants it spurious credit, which the sink port might not be able to accommodate.
The following argument assumes that the source is repeatedly selecting a port on the datapath and that the source gets continuous status update for this port on the status channel. In this scenario, the port would be overrun only if its selection interval at the source is less than the total feedback response latency. As the selection interval of a port increases with respect to the response latency, the amount of excess data sent by the source for the port decreases proportionally. Therefore, if an overrun occurs on a port, the amount of overrun is directly proportional to the total feedback response latency and inversely proportional to the selection interval of that port (see Figure 5 from Part 1).
In the latency time interval, L, a port with selection interval Si will be selected L/Si times.
If L is less than Si, then overrun cannot occur on port i.For L less than or equal to Si, overrun on port i is given by:
Where,
- Oi is amount of overrun on port i.
- L is the total feedback latency.
- Ls is the status channel latency.
- Ld is the datapath latency.
- Si is the selection interval for port i
.
In this equation, Oi is in the units of buffer count; L is in the units of buffer count; Si is in the units of number of clocks; and proportionality constant has the dimension of clocks.
Aggregate rate of all the ports is normalized to:
Where, N is the number of ports.
The proportionality constant is taken as one. Hence, aggregate buffer requirement to accommodate overrun on all ports is given by:

Using equation (3), we get:
Therefore, in an ideal situation, when all the ports are complying with their rates, the size of the memory pool required to handle overrun is equal to the total feedback response latency L.
For the analysis of non-ideal situation, we introduce two parameters specifying the rate violations on the write and read interface of the sink buffer. Parameter xi represents the factor by which write rate for port i violates its programmed rate. Similarly, yi represents the factor by which read rate for port i violates its programmed rate. Using these parameters, equation (4) changes to:
This equation is valid for ports on which write rate exceeds the read rate. For ports on which xi is less than or equal to yi, there is no possibility of overrun and hence the overrun component reduces to zero. Therefore, total memory required for overrun handling on all ports is given by:
As given in equation (3) the overall bus throughput is normalized to one. Irrespective of the factor by which a port violates its rate, the total bus throughput cannot be exceeded. Therefore for any port i the following condition will apply:
Substituting (8) in (7) we have:
Thus under worst-case condition all ports can require feedback latency threshold amount of space to absorb latency.
Dealing with Under Run
Under run occurs on a port if read out from that port exceeds its programmed rate. To prevent the port FIFO at sink from becoming completely empty and remaining so for a long time, an advanced indication of empty has to be sent to the source. This advanced threshold determines the transition point of the status signal.
When the FIFO becomes empty, the information reaches source after a delay equivalent of status channel latency. The source updates its credit and schedules a transfer to the port that received a status update indicating low occupancy (empty). This information comprises another latency component equivalent to the scheduler latency.
Data from the source is written into the sink FIFO after the datapath latency and FIFO status becomes non-empty at this point. The FIFO should have enough data in it to sustain readout during this entire feedback process. Thus, the threshold for preventing under run is equivalent to the total feedback response latency. This is indicated in Figure 7. This explanation is true if read is happening continuously from a single port. In a multi-port scenario, the amount of under run on a port is dependent on its read rate.
Transfer Credit Calculations
The status update interval for port i is directly proportional to the interval of selection of the port for data transfer, as shown in:
We define a ratio of the effective status update interval to the effective data transfer duration on data path as:
This ratio indicates the number of status updates on the status channel for each unit data transfer period on the datapath. Applying definition of Ai to equation (10) we have:
Selection interval for port i on the data path can be given by:
From equations (12) and (13), we can obtain the number of selections for port i on the datapath for each status update of that port on the status channel.
Thus, for each of its status update, port i gets selected Ci times for data transfer over the data path. Thus, Ci represents the transfer credit to be allocated to port i and it is independent of the port rate.
Total Buffer Requirement
From the above analysis, it is clear that the worst-case buffer space requirement for a single port is given by the equation:
Where, Boi is the overrun buffer space, Bui is the under run buffer space, Ci is the transfer credit buffers space. The extra buffer is the minimum occupancy of the FIFO for initiating reads. Using results from previous sections, equation (15) becomes:
Therefore, total buffer space requirement becomes:
The above result is the theoretical maximum required for the static allocation scheme.
For dynamic allocation, designers need to use the following equation:
Simulation results have shown that for the proposed implementation of the sink buffer, worst-case occupancy of the buffer never approaches the theoretical maximum of static allocation scheme. The proposed buffer architecture was modeled and was subjected to numerous tests under varying traffic conditions. Tests have shown that in addition to the analyzed parameters, occupancy pattern also depends on the selection order of ports on the read and write interface.
Results of some of the typical tests are shown below. The tests were performed for 64 active ports, with a total feedback latency of seven buffers, under run threshold of seven buffers on all ports and payload transfer credit of six buffers on all ports.
In test 1, all ports are subjected to maximum possible rate violation on write interface and read interface of all ports is compliant to their programmed rate. Figure 8 shows buffer occupancy pattern over 30,000 selection samples. Maximum occupancy is 635 buffers whereas the theoretical maximum from equation (18) is 1344 buffers.

Figure 8: Buffer occupancy pattern for test 1.
In test 2, 33% of total ports on write interface are compliant to their rates while other ports are subjected to maximum write rate violation. On the read interface, all ports are compliant to their rates. Figure 9 shows the buffer occupancy pattern for this test.

Figure 9: Buffer occupancy pattern for test 2
Tests have also shown that if the aggregate read rate from the buffers is greater than or equal to the aggregate write rate then the total buffer occupancy is equal to the latency in the start of read operation from the buffer. This is unlike static allocation scheme where each port will need buffer space equivalent to this latency.
In static-allocation scheme occupancy of a port is determined only by its write and read rate and is independent of the other ports. This implies that since the maximum occupancy of a port cannot exceed its programmed size, it can suffer overrun even though some other ports might be having empty buffer space.
In the dynamic-allocation scheme, since all the ports share a common memory pool, the buffer space available for a port is influenced by read rate of other ports also. Aggregate buffer occupancy is a function of the aggregate read and write rates. Elastic port size boundaries allow each port to be filled up to its theoretical maximum due to bursty writes to it, while its extra buffer requirement are served by buffers being returned through reads on other ports. All ports cannot suffer overrun simultaneously, since they cannot violate their write rates simultaneously. However in a situation where read out from all ports is completely stalled, each port might be pushed to its maximum occupancy and overall buffer requirement approaches theoretical maximum.
Another advantage offered by proposed dynamic allocation scheme is the ability to dynamically re-provision port rates without affecting active traffic on the interface. Ports can be added, removed, or resized on the fly without re-initializing the system.
Implementing the Dynamic-Allocation Architecture
There are numerous options for the implementation of dynamic-allocation architecture. One of the most popular approaches is through the use of memory based linked lists. This technique has been dealt with extensively in many literatures.3,4,6
A linked-list implementation would mainly comprise of the following components: a payload memory, a buffer linked list memory, and a linked list index memory. The payload or data memory is partitioned in granularity of minimum sized buffers. Each buffer will be addressed by a unique index (the address of that location). The size of the payload memory is dictated by the cumulative buffer requirement of all ports including the transfer credit, overrun, and under run buffers.
For each incoming buffer to the FIFO, a free buffer from the payload memory will be allocated and data will be written into it. After the data has been read out of the FIFO, the location reserved for that buffer will be de-allocated and returned to the free buffer pool. To keep track of all the occupied and reserved buffer locations, a memory based link list of buffer indexes is required. This link list will contain all the free buffers linked to each other, the head and tail of this linked list will be maintained in the free buffer head and tail registers. In addition, a linked list of all the reserved (occupied) buffers is maintained in the same memory.
Editor's Note: To view Part 1 of this article, click here.
References
- OIF Electrical interfaces white paper.
- Implementation Agreement OIF-SPI4-02.0
- Y. Tamir and G. L. Frasier, "Dynamically-Allocated Multi-Queue Buffers for VLSI Communication Switches," in IEEE Transactions on Computer, 41(6): 725-737, June 1996.
- G. L Frasier and Y. Tamir, "The Design and implementation of a multi-queue buffer for VLSI communication switches," in Proc. Int.Conf. Computer. Deisgn, Cambridge, MA, Oct. 1989, pp.466-471.
- R. Sivaram, C. B. Stunkel, and D. K. Panda, "HIPIQS: A High-Performance Switch Architecture using Input Queuing," in Proceedings of the IPPS/SPDP '98, Oregon, FL, Mar. 1998, pp. 134-143.
- N. Ni, M. Pirvu, and L. Bhuyan, "Circular Buffered Switch Design with Wormhole Routing and Virtual Channels", in Computer Design: VLSI in Computers and Processors, 1998. ICCD '98. Proc.Int. Conf. Comput. Design, 1998, pp. 466 -473.
- Santosh Krishnan, Abhijit K. Choudhury, Fabio M. Chiussi, "Dynamic Partitioning for Shared-Memory Management," IEEE INFOCOM '99, April 99.
- G. Kornaros, C. Kozyrakis, P. Vatsolaki, M. Katevenis: "Pipelined Multi-Queue Management in a VLSI ATM Switch Chip with Credit-Based Flow-Control," Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI), Ann, MI, September 1997 (p.127-144).
About the Authors
Joji Philip is a senior design engineer at Paxonet Communications (India) Ltd. Joji can be reached at joji@pune.paxonet.com.
Sailesh Kumar is a senior design engineer at Paxonet Communications (India) Ltd. Sailesh can be reached at sailesh@pune.paxonet.com.
Sunil Shukla is a senior design engineer at Paxonet Communications (India) Ltd. Sunil can be reached at sunil@pune.paxonet.com.
Raja Venkatesh is a senior design engineer at Paxonet Communications (India) Ltd. Raja can be reached at raja@pune.paxonet.com.