Network processors change the way networking products are designed by moving what was previously a hardware-intensive solution into a more software-intensive architecture. In such an architecture, a core processor manages complex global tasks, while multiple low-level processors (microengines) perform the packet-processing operations.
While network processors offer enormous advantages in flexibility, they also present some programming challenges. One is distributing the control and data-handling responsibilities across a core processor and multiple microengines in a manner that provides high performance yet accommodates feature-rich applications. Another is creating the infrastructure that enables the processors to communicate with each other.
In addressing those challenges for its network processors, Intel developed a layered programming approach that breaks networking applications into modular building blocks called active computing elements (ACEs). That programming framework takes into account the unique complexities of a network processor based on a distributed-workload model.
There are two types of ACEs: the conventional one, which runs solely on the core processor, and the microACE, which uses the microengines to accelerate packet-processing functions. Typically, the microengines are used to speed any routine functions that must occur in the fast path, while the core processor is used in the slow path (any processing that occurs infrequently or is complex, such as exception packet handling).
See related chart
Conventional ACEs are composed of a classification portion and an action portion. The Network Classification Language was developed for packet classification with a syntax that makes packet description intuitive and compact. After classification occurs, the ACE can define the actions to be carried out. The Action Services Library eases the writing of the action portion of the ACE. The library includes application services, such as an application programming interface for manipulating TCP/IP packets.
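The split between a classification portion and an action portion can be illustrated with a minimal sketch in plain C. It is not written in the Network Classification Language and does not use the Action Services Library; every name in it (`classify`, `ace_process`, `struct ace_state`, the `PKT_*` classes) is hypothetical, and it assumes untagged Ethernet frames carrying IPv4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet classes; these names are illustrative, not NCL. */
enum pkt_class { PKT_OTHER = 0, PKT_IPV4_TCP, PKT_IPV4_UDP, PKT_CLASS_COUNT };

/* Classification portion: decide what kind of packet this is.
 * Assumes an untagged Ethernet frame (14-byte header) carrying IPv4. */
enum pkt_class classify(const uint8_t *frame, size_t len)
{
    if (len < 14 + 20)                           /* Ethernet + minimal IPv4 header */
        return PKT_OTHER;
    if (frame[12] != 0x08 || frame[13] != 0x00)  /* EtherType 0x0800 = IPv4 */
        return PKT_OTHER;
    if ((frame[14] >> 4) != 4)                   /* IP version field */
        return PKT_OTHER;
    switch (frame[14 + 9]) {                     /* IPv4 protocol field */
    case 6:  return PKT_IPV4_TCP;
    case 17: return PKT_IPV4_UDP;
    default: return PKT_OTHER;
    }
}

/* State kept by this hypothetical ACE. */
struct ace_state {
    unsigned long seen[PKT_CLASS_COUNT];  /* packets accepted per class */
    unsigned long dropped;                /* packets the ACE did not recognize */
};

/* Action portion: one handler per class, selected after classification. */
typedef void (*action_fn)(struct ace_state *st);

static void act_tcp(struct ace_state *st)  { st->seen[PKT_IPV4_TCP]++; }
static void act_udp(struct ace_state *st)  { st->seen[PKT_IPV4_UDP]++; }
static void act_drop(struct ace_state *st) { st->dropped++; }

static const action_fn action_for[PKT_CLASS_COUNT] = {
    [PKT_OTHER]    = act_drop,
    [PKT_IPV4_TCP] = act_tcp,
    [PKT_IPV4_UDP] = act_udp,
};

/* Classify first, then run the action bound to the resulting class. */
enum pkt_class ace_process(struct ace_state *st, const uint8_t *frame, size_t len)
{
    assert(st != NULL && frame != NULL);
    enum pkt_class c = classify(frame, len);
    action_for[c](st);
    return c;
}
```

Keeping the classifier free of side effects and placing all state changes in the action handlers mirrors the two-part structure described above: the description of a packet can change without touching what is done with it.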
The microACE takes advantage of the microengines to perform fast-path packet processing. A microACE contains two component code blocks. One code block, called the microblock, runs on a microengine. The other, called the core component, runs on the core processor.
Conventional ACEs and microACEs can be bound into a packet-processing pipeline that performs a series of operations on a packet, specifically, receiving it on a hardware interface, manipulating it in whatever way the application requires and, finally, transmitting it.
Some of the ACEs are provided as ready-made building blocks; some are written by the application developer. The application developer must bind the ACEs into a specific packet-processing pipeline. The bindings need to be flexible enough so that different kinds of modules (such as Layer 3 Internet Protocol forwarding, Layer 2 Ethernet bridging or network address translation) can be interfaced, but the bindings should not introduce significant overhead or performance penalties.
The ACE programming framework meets both requirements by defining packet destinations, called targets, within an ACE. A target is bound to the next ACE in the pipeline. When an ACE sends a packet to a target, the packet is queued in an efficient ring-buffer structure and is dequeued by the bound ACE. If there are several possible destinations, the ACE can simply define several targets and bind them to different ACEs. When an ACE is reused in another application, its code does not need to change; its targets are simply bound to different ACEs during initialization.
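The target mechanism can be sketched in a few lines of plain C. This is an illustration under stated assumptions, not the framework's actual API: `struct ace`, `ace_bind`, `ace_send`, `ring_put` and `ring_get` are hypothetical names, the ring holds a fixed eight entries, and a single producer and consumer are assumed, so no locking is shown.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE       8u  /* power of two, so wrapping is a simple mask */
#define ACE_MAX_TARGETS 2u

struct packet { int id; };  /* stand-in for a real packet handle */

/* Fixed-size ring buffer of packet pointers. */
struct ring {
    struct packet *slot[RING_SIZE];
    unsigned head;  /* count of packets dequeued so far */
    unsigned tail;  /* count of packets enqueued so far */
};

/* Hypothetical ACE: an input ring plus targets bound at initialization. */
struct ace {
    const char *name;
    struct ring inbox;
    struct ace *target[ACE_MAX_TARGETS];
};

/* Enqueue a packet; returns -1 if the ring is full. */
int ring_put(struct ring *r, struct packet *p)
{
    if (r->tail - r->head == RING_SIZE)
        return -1;
    r->slot[r->tail & (RING_SIZE - 1)] = p;
    r->tail++;
    return 0;
}

/* Dequeue the oldest packet, or NULL if the ring is empty. */
struct packet *ring_get(struct ring *r)
{
    if (r->head == r->tail)
        return NULL;
    struct packet *p = r->slot[r->head & (RING_SIZE - 1)];
    r->head++;
    return p;
}

/* Bind target t of one ACE to the next ACE in the pipeline. */
void ace_bind(struct ace *from, unsigned t, struct ace *to)
{
    assert(t < ACE_MAX_TARGETS);
    from->target[t] = to;
}

/* Send a packet to a target: it lands in the bound ACE's input ring.
 * Returns -1 if the target is unbound or the ring is full. */
int ace_send(struct ace *from, unsigned t, struct packet *p)
{
    if (t >= ACE_MAX_TARGETS || from->target[t] == NULL)
        return -1;
    return ring_put(&from->target[t]->inbox, p);
}
```

In this sketch the sending ACE refers only to a target index. Binding target 0 to a forwarding ACE in one application and to a bridging ACE in another requires a different `ace_bind` call at initialization and no change to the sender's code.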
Higher-level help
Microengines are key to maintaining wire-speed packet processing. But the entire packet-processing task cannot be relegated solely to the microengines. At times, the higher-level StrongARM core processor is required to assist in complex operations, such as processing exception packets or packets that require a complete protocol stack. Those packets are forwarded to the core component, a block of code that runs on the StrongARM core processor. The microblock and its core component communicate through a layer of software called the resource manager, which runs on the core processor. The resource manager provides a well-defined API, making it easier to write code for each of the components.
One microblock encompasses a unit of packet-processing functionality. Multiple microblocks are chained together to form a larger packet-processing module called a microblock group. The microblocks are bound together by a dispatch loop, which controls the flow of packet processing from one microblock to the next. The dispatch loop and microblocks in a group are all compiled into a single microcode image that runs on one microengine. The same image can run on more than one microengine to take advantage of parallel processing. Packet metadata, such as the specific packet location in SDRAM, is cached on the microengine in a set of registers that is accessible to each microblock in the group. Those registers are used for communication between the different microblocks and the dispatch loop, leading to better performance.
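The control flow of a dispatch loop can be sketched as follows. Real microblocks are compiled to microcode for the microengines; the version below is ordinary C meant only to show the structure. The block names, the `struct pkt_meta` fields and the 34-byte minimum length are all assumptions made for illustration, and the shared structure merely stands in for the microengine registers that hold packet metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Identifiers for the microblocks in this hypothetical group.
 * Values at or above BLK_DONE end the dispatch loop. */
enum { BLK_VALIDATE = 0, BLK_FORWARD, BLK_TX, BLK_DONE, BLK_EXCEPTION };

/* Packet metadata shared by every microblock in the group. */
struct pkt_meta {
    uint32_t sdram_addr;  /* where the packet body sits in SDRAM */
    uint16_t length;      /* packet length in bytes */
    uint8_t  ttl;         /* cached IP time-to-live */
    uint8_t  out_port;    /* chosen by the forwarding block */
    int      next;        /* which microblock runs next */
};

/* Each microblock does one unit of work and names its successor. */
static void mb_validate(struct pkt_meta *m)
{
    /* Assumed minimum: Ethernet header plus minimal IPv4 header. */
    m->next = (m->length >= 34) ? BLK_FORWARD : BLK_EXCEPTION;
}

static void mb_forward(struct pkt_meta *m)
{
    if (m->ttl <= 1) {            /* expiring packet: leave it to the slow path */
        m->next = BLK_EXCEPTION;
        return;
    }
    m->ttl--;
    m->out_port = (uint8_t)(m->sdram_addr & 1u);  /* placeholder for a route lookup */
    m->next = BLK_TX;
}

static void mb_tx(struct pkt_meta *m)
{
    m->next = BLK_DONE;           /* a real block would queue the packet for transmit */
}

static void (*const microblock[])(struct pkt_meta *) = {
    mb_validate, mb_forward, mb_tx,
};

/* Dispatch loop: run microblocks until one finishes the packet or raises
 * an exception. BLK_EXCEPTION means the packet goes to the core component. */
int dispatch(struct pkt_meta *m)
{
    assert(m != NULL);
    m->next = BLK_VALIDATE;
    while (m->next < BLK_DONE)
        microblock[m->next](m);
    return m->next;
}
```

A caller would treat a return value of `BLK_EXCEPTION` as the point where the packet leaves the fast path and is handed to the core component. Because every microblock reads and writes the same `struct pkt_meta`, no block has to go back to packet memory to learn what an earlier block decided.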
Over time, more and more standard modules will be available. The modular approach reduces design complexity, enhances maintainability and makes the addition of features more straightforward. Modularity also results in a great deal of flexibility. Designers can make use of building blocks already provided by Intel or by third parties and can write their own modules to customize the design and create their own intellectual property.