Around the world, networking equipment manufacturers and their customers are assessing the current state of the telecommunications industry, and they have reached the same conclusion: basic telecom, cable, and Internet access have become commodities, and the industry's future viability requires a shift to a revenue model built on content-based billing.
Content-based billing will allow carriers, cable operators, and Internet service providers to bill both for basic access and for the varying types of content and content-based services delivered over their networks. It will let them generate more revenue from existing bandwidth. Most importantly, it will pave the way for a move from revenue models built on legacy systems and bulk bandwidth to models built on differentiated, content-based services for individual market segments.
To achieve these objectives and begin to bill customers based on the content they are receiving, service providers need a network that enables intelligent application management, facilitates the delivery of content-based services and drives content-based billing. That network must be able to properly authorize, authenticate and account for customized services by identifying individual packets and classifying them based on individual customer usage and quality-of-service (QoS) requirements. This can only be achieved in a network equipped with a classification processor that performs layer 7 content inspection and classification.
Enabling Intelligent Packet Management
Classification and content inspection processing enable intelligent packet management. Both refer to the identification and classification of individual protocol data units (PDUs) in the traffic moving through a network, and they are the most crucial functions in the network processing architectures being designed for the next generation of networking equipment.
Programmed and executed properly, classification and content inspection enable the delivery of enhanced, next generation services like converged voice and data, streaming video, point-and-click calling, and secure virtual private networks, with differentiated QoS levels. But classification and content inspection processing is the most difficult network processing function to program, especially when the programming involves:
- Complex protocol encapsulations;
- Variable size headers; and/or
- Regular expression matching.
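To see why variable-size headers complicate classification, consider finding where the TCP payload starts in an Ethernet/IPv4/TCP frame: each header's length is read from the header itself, so every offset depends on the one before it. The Python sketch below assumes IPv4 over Ethernet II with at most one 802.1Q VLAN tag and ignores checksums and fragmentation; `build_frame` is a hypothetical test helper, not part of any real API.

```python
import struct

ETHERTYPE_VLAN = 0x8100
ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6


def tcp_payload_offset(frame: bytes) -> int:
    """Return the offset of the TCP payload in an Ethernet/IPv4/TCP frame."""
    offset = 12                                   # skip destination and source MACs
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    offset += 2
    if ethertype == ETHERTYPE_VLAN:               # optional 802.1Q tag: 4 extra bytes
        (ethertype,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not IPv4")
    if frame[offset + 9] != IPPROTO_TCP:
        raise ValueError("not TCP")
    ihl = frame[offset] & 0x0F                    # IPv4 header length in 32-bit words
    offset += ihl * 4
    data_offset = frame[offset + 12] >> 4         # TCP header length in 32-bit words
    return offset + data_offset * 4


def build_frame(ip_options=b"", tcp_options=b"", payload=b"GET / HTTP/1.1\r\n", vlan=False):
    """Hypothetical helper: build a minimal frame with optional IP/TCP options."""
    eth = b"\x00" * 12
    if vlan:
        eth += struct.pack("!HH", ETHERTYPE_VLAN, 0)
    eth += struct.pack("!H", ETHERTYPE_IPV4)
    ip = bytearray(20) + ip_options
    ip[0] = 0x40 | (len(ip) // 4)                 # version 4 plus header length
    ip[9] = IPPROTO_TCP
    tcp = bytearray(20) + tcp_options
    tcp[12] = (len(tcp) // 4) << 4                # data offset in the high nibble
    return eth + bytes(ip) + bytes(tcp) + payload
```

With no options the payload starts at byte 54; adding a VLAN tag, 4 bytes of IP options, and 12 bytes of TCP options moves it to byte 74, even though the payload itself is unchanged.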
Fortunately, innovative programming models are now available that can minimize the time spent on programming complex classification. Based on the combination of traditional procedural methods with new, non-procedural models involving fourth generation languages (4GL), the new programming models can reduce time-to-market and increase time-in-market of networking equipment, while at the same time ensuring forward-compatibility of the classification solution.
The Role of Classification
Classification is the first function that must be performed when a PDU enters the system, and it drives the rest of the network processing functions: it tells them what needs to be done with the PDU, depending on the class to which the PDU belongs. Therefore, the granularity of classification and the complexity of the classification criteria that the network processing solution can handle determine how intelligent and powerful the networking device will be.
There are three types of classification that can be performed by networking devices:
- Single field classification, which looks at a single field in the protocol header, is the most rudimentary and well-understood type of classification. It is typically performed in media access control (MAC) bridging and IP routing.
- Multi-field classification is more complicated because the lookup key is more complex (longer and composite), the set of values is richer, and the process of forming the key is more complicated.
- Content inspection classification refers to the ability to look deep into packet content in order to make classification decisions based on long character string values (e.g., hundreds of characters). For example, content inspection can be used to locate the URL and the cookie in an HTTP header. This is particularly useful for networking equipment deployed in front of large, very busy Web server farms, such as server load balancers and URL switches.
To perform content inspection, a classifier must parse five layers of protocol headers and then perform regular expression matching. This type of classification is best performed by a dedicated classification processor, which can look for patterns of hundreds of characters anywhere in the packet.
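The three types can be contrasted in a short Python sketch. The tables, rule names, and the `Packet` record are hypothetical, and real equipment would use hardware lookup structures rather than linear scans; the point is only how the key grows from one field, to a composite 5-tuple with wildcards, to a pattern searched inside the payload.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    dst_mac: str
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int
    payload: bytes


# Single-field: the key is one header field (the destination MAC address).
MAC_TABLE = {"00:11:22:33:44:55": "port-1"}


def classify_single(pkt: Packet) -> Optional[str]:
    return MAC_TABLE.get(pkt.dst_mac)


# Multi-field: a composite 5-tuple key; None acts as a wildcard.
FIVE_TUPLE_RULES = [
    ((None, "10.0.0.80", 6, None, 80), "web-traffic"),
    ((None, None, 17, None, 53), "dns"),
]


def classify_multi(pkt: Packet) -> Optional[str]:
    key = (pkt.src_ip, pkt.dst_ip, pkt.proto, pkt.src_port, pkt.dst_port)
    for rule, label in FIVE_TUPLE_RULES:
        if all(r is None or r == k for r, k in zip(rule, key)):
            return label
    return None


# Content inspection: a regular expression searched inside the payload.
CONTENT_RULES = [(re.compile(rb"^GET /video/[^ ]*\.mpg "), "streaming-video")]


def classify_content(pkt: Packet) -> Optional[str]:
    for regex, label in CONTENT_RULES:
        if regex.search(pkt.payload):
            return label
    return None
```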
The Mechanics of Content Inspection
HTTP is an application-level, request/response protocol for distributed hypermedia information systems. It supports an open-ended set of methods and headers that indicate the purpose of a request. The URL identifies the resource that is the subject of a request and of the subsequent interchange of data or metadata. A token, or "cookie", provides a mechanism for maintaining a stateful HTTP session between a client and a Web switch or origin server.
How does the server keep track of a particular customer? Through the cookie. A URL addresses a particular Web page or other information on the network, and many customers may be requesting the same URL at any given moment. To distinguish among them, the server issues each customer a cookie, which is stored on the client machine.
Each subsequent HTTP request from that customer may then carry this cookie, and a load balancer can use the URL and cookie values contained in the HTTP header to direct the request to an appropriate server.
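A load balancer's decision can be sketched as follows, assuming a hypothetical server table keyed by URL prefix. Requests that carry a cookie are hashed onto a server so that the same customer keeps reaching the same server; requests without one are spread round-robin.

```python
import hashlib
import itertools
from typing import Optional

# Hypothetical server pools, keyed by URL prefix.
POOLS = {
    "/images/": ["img-1", "img-2"],
    "/": ["web-1", "web-2", "web-3"],
}
_ROUND_ROBIN = {prefix: itertools.cycle(servers) for prefix, servers in POOLS.items()}


def pick_server(url: str, cookie_value: Optional[str] = None) -> str:
    # The longest matching URL prefix selects the pool ("/" catches everything else).
    prefix = max((p for p in POOLS if url.startswith(p)), key=len)
    servers = POOLS[prefix]
    if cookie_value is None:
        return next(_ROUND_ROBIN[prefix])         # new customer: spread the load
    # Known customer: a stable hash keeps the same cookie on the same server.
    digest = hashlib.sha256(cookie_value.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]
```

SHA-256 is used instead of Python's built-in `hash()`, which is randomized per process for strings. Hashing modulo the pool size also remaps customers whenever a pool changes; a switch that must preserve sessions across such changes would keep an explicit cookie-to-server table instead.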
URLs take the form of an arbitrary text string such as http://www.companyxyz.com/index.html.
Cookies usually take the form of name = value, where name is an arbitrary string and the value is a string representing a unique customer identification. Both the URL field and the cookie field can have an arbitrary offset from the beginning of the packet. Both fields have arbitrary length.
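Because both fields have arbitrary offsets and lengths, a classifier cannot read them at fixed positions; it has to search for them. A minimal Python sketch using regular expressions over the TCP payload, assuming the entire HTTP header arrives in one segment and that only the first `name=value` pair of the `Cookie` header matters:

```python
import re
from typing import Dict, Tuple

# The URL sits in the request line; the cookie may be on any later header line.
REQUEST_LINE = re.compile(rb"^[A-Z]+ (?P<url>\S+) HTTP/\d\.\d\r\n")
COOKIE = re.compile(
    rb"^Cookie: *(?P<name>[^=;\r\n]+)=(?P<value>[^;\r\n]*)",
    re.MULTILINE | re.IGNORECASE,   # header names are case-insensitive
)


def locate_fields(payload: bytes) -> Dict[str, Tuple[int, int, bytes]]:
    """Map each field found to (offset, length, value) within the payload."""
    fields = {}
    m = REQUEST_LINE.match(payload)
    if m:
        fields["url"] = (m.start("url"), len(m.group("url")), m.group("url"))
    m = COOKIE.search(payload)
    if m:
        for part in ("name", "value"):
            fields["cookie_" + part] = (m.start(part), len(m.group(part)), m.group(part))
    return fields
```

A longer URL or an extra header line shifts the cookie to a different offset, and the search still finds it.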
HTTP is an application-layer protocol based on a request/response paradigm. A client establishes a connection with a Web switch or server and sends a request in the form of a request method, URL, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possibly body content.
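The request structure just described (request line, then header lines, then an optional body) can be split apart with a few lines of Python. This is a simplified sketch: it assumes the full request is already in one buffer, decodes the header as Latin-1, lower-cases header names, and lets a repeated header overwrite an earlier one.

```python
from typing import Dict, NamedTuple


class HttpRequest(NamedTuple):
    method: str
    url: str
    version: str
    headers: Dict[str, str]
    body: bytes


def parse_request(raw: bytes) -> HttpRequest:
    head, _, body = raw.partition(b"\r\n\r\n")     # a blank line ends the header
    lines = head.decode("latin-1").split("\r\n")
    method, url, version = lines[0].split(" ", 2)  # request line
    headers = {}
    for line in lines[1:]:                         # "Name: value" header lines
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return HttpRequest(method, url, version, headers, body)
```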
The server responds with a status line including the message's protocol version and a success or error code followed by a MIME-like message containing server information, entity meta-information, and possible body content. The response may also contain a cookie.
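The response side can be handled the same way. In the sketch below, the status line is split into version, code, and reason phrase, and any cookie the server sets is read from `Set-Cookie` headers with the standard library's `http.cookies.SimpleCookie`; the sample response used with it is made up.

```python
from http.cookies import SimpleCookie
from typing import Dict, Tuple


def parse_response_head(raw: bytes) -> Tuple[str, int, str, Dict[str, str]]:
    """Return (version, status code, reason phrase, cookies set by the server)."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    status_line, *header_lines = head.split("\r\n")
    parts = status_line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    cookies: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "set-cookie":
            jar = SimpleCookie()
            jar.load(value.strip())  # attributes such as Path are parsed, not returned
            cookies.update({key: morsel.value for key, morsel in jar.items()})
    return version, code, reason, cookies
```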
Most HTTP communication is initiated by a user agent and consists of a request to be applied to a resource on some origin server or web switch. For example:
Code Listing 1
A content inspection classification processor applies predefined, programmed filters and policies to the received HTTP stream. It separates the HTTP stream into fields and then applies a set of rules to those fields.
The following example illustrates how the HTTP input stream depicted in the previous section is broken into manageable fields to be operated upon by a classification processor:
Code Listing 2
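Once the stream has been split into fields, applying rules to them amounts to testing each rule's condition against the named field. The Python sketch below assumes fields arrive as a dictionary of strings; the rules and action names (`bill-premium`, `high-priority`, and so on) are hypothetical examples of the content-based services discussed earlier.

```python
import re
from typing import Dict, List

# Hypothetical policy: (field name, regular expression, action).
RULES = [
    ("url", re.compile(r"^/premium/"), "bill-premium"),
    ("cookie", re.compile(r"(?:^|;\s*)tier=gold(?:;|$)"), "high-priority"),
    ("method", re.compile(r"^(?:POST|PUT)$"), "inspect-upload"),
]


def apply_rules(fields: Dict[str, str], default: str = "best-effort") -> List[str]:
    """Return the action of every rule whose field matches, else the default."""
    actions = [action for name, regex, action in RULES
               if name in fields and regex.search(fields[name])]
    return actions or [default]
```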
Programming Classification Engines
Non-procedural languages, whether newer ones such as XML or well-known ones such as SQL, can significantly reduce the complexity of classification programming while remaining formal enough to be easily translated or interpreted into machine instructions.
Languages can be used to describe data structures or patterns that need to be recognized in bit streams or PDUs, enabling the user to describe protocol headers as patterns. For example, a pattern is represented as a list of fields and each field is identified by its name and specific type in order to allow conditions to be imposed upon it. Thus, an Ethernet header can be described as:
Code Listing 3
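The idea of a header pattern as a list of named, typed fields, each optionally carrying a condition, can be mimicked in Python. This is not the classification language itself: `Field`, `match`, and the use of `struct` format codes as field types are assumptions made for illustration.

```python
import struct
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Field:
    name: str
    fmt: str                                       # struct format code = field type
    condition: Optional[Callable[[Any], bool]] = None

    @property
    def size(self) -> int:
        return struct.calcsize("!" + self.fmt)


# Ethernet II header: destination MAC, source MAC, EtherType.
ETHERNET = [Field("dst", "6s"), Field("src", "6s"), Field("ethertype", "H")]

# The same pattern with a condition imposed on one field: IPv4 payload only.
ETHERNET_IPV4 = ETHERNET[:2] + [Field("ethertype", "H", lambda v: v == 0x0800)]


def match(pattern: List[Field], data: bytes, offset: int = 0) -> Optional[Dict[str, Any]]:
    """Decode the fields in order; return them only if every condition holds."""
    values = {}
    for f in pattern:
        if offset + f.size > len(data):
            return None
        (value,) = struct.unpack_from("!" + f.fmt, data, offset)
        if f.condition is not None and not f.condition(value):
            return None
        values[f.name] = value
        offset += f.size
    return values
```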
The language also makes it possible to describe an arbitrary string as part of a pattern:
Code Listing 4
Parametric patterns can be defined for fields whose values are not known ahead of time. This feature enables the specification of run-time parameters such as IP addresses, TCP/UDP ports, and URL and cookie values. For example, the previous pattern could be defined as:
Code Listing 5
In this case, the following pattern would be equivalent to the original one:
HelloString
ParametricString ("Hello, World")
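The equivalence can be modeled in Python by treating a parametric pattern as a function that yields an ordinary pattern once its parameter is bound. The names mirror the example above, but the `StringPattern` class and its match-anywhere semantics are assumptions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringPattern:
    """A pattern that matches any payload containing a fixed string."""
    text: bytes

    def matches(self, payload: bytes) -> bool:
        return self.text in payload


def ParametricString(value: str) -> StringPattern:
    """Parametric form: the string is a parameter bound at instantiation."""
    return StringPattern(value.encode("ascii"))


# The fixed pattern and the instantiated parametric pattern are equivalent.
HelloString = StringPattern(b"Hello, World")
assert ParametricString("Hello, World") == HelloString
```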
The following example shows how an HTTP request header can be coded:
Code Listing 6
This example describes the generic parametric pattern for an HTTP request. This pattern is used to match packets that contain specific values in place of parameters. One way to create a pattern with specific values is to substitute parameters at compile time:
Code Listing 7
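A rough Python analogue of a parametric HTTP request pattern and of compile-time substitution follows. The template, its three parameters (`url`, `cookie_name`, `cookie_value`), and `compile_pattern` are hypothetical; unbound parameters fall back to generic sub-patterns, and bound ones are inserted as escaped literals before the regular expression is compiled.

```python
import re

# Hypothetical parametric pattern for an HTTP GET request carrying a cookie.
# {url}, {cookie_name} and {cookie_value} are the parameters.
HTTP_REQUEST_TEMPLATE = (
    r"^GET (?P<url>{url}) HTTP/1\.[01]\r\n"
    r"(?:[^\r\n]*\r\n)*?"            # skip any other header lines
    r"Cookie: (?:[^\r\n]*; )?"       # skip earlier cookies on the same line
    r"(?P<cookie_name>{cookie_name})=(?P<cookie_value>{cookie_value})(?:;|\r\n)"
)

# An unbound parameter matches any value of the right shape.
WILDCARDS = {
    "url": r"\S+",
    "cookie_name": r"[^=;\s]+",
    "cookie_value": r"[^;\s]*",
}


def compile_pattern(**params: str) -> "re.Pattern[bytes]":
    """Substitute the given parameters as literals; leave the rest generic."""
    unknown = set(params) - set(WILDCARDS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    values = {name: re.escape(params[name]) if name in params else wildcard
              for name, wildcard in WILDCARDS.items()}
    return re.compile(HTTP_REQUEST_TEMPLATE.format(**values).encode("ascii"))


generic = compile_pattern()
specific = compile_pattern(url="/index.html", cookie_name="id", cookie_value="A1B2C3")
```

The generic pattern matches any such request and captures the parameter values; the specific one matches only requests for `/index.html` that carry `id=A1B2C3`.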
In most cases, however, parameters will be substituted at run-time, while the application is running. This is done through the run-time API, once the application has gathered the parameters from the management plane.
Compilation
Patterns need to be compiled ahead of time in order to minimize run-time processing. All patterns for a cookie-switching application are known while the application is being developed, so compilation never needs to happen at run-time. The compiler creates a binary object that we call a filter. In this case, since the pattern contains parameters for the URL, the cookie name, and the cookie value, the result of compilation will be a set of "parametric filters".
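One way to keep run-time work low, sketched in Python under simplifying assumptions, is to compile the pattern once with its parameters left as named capture groups; a filter instance then only records the values to compare against, so binding parameters never triggers recompilation. `ParametricFilter` and `FilterInstance` are hypothetical names. Because the comparison uses captured values, only the last cookie on a multi-cookie line is checked, which a production filter would handle more carefully.

```python
import re
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ParametricFilter:
    """Compiled once, ahead of time; parameters stay open as named groups."""
    regex: "re.Pattern[bytes]"

    @classmethod
    def compile(cls, source: bytes) -> "ParametricFilter":
        return cls(re.compile(source))

    def instance(self, **params: bytes) -> "FilterInstance":
        unknown = set(params) - set(self.regex.groupindex)
        if unknown:
            raise KeyError(f"unknown parameters: {sorted(unknown)}")
        return FilterInstance(self, dict(params))


@dataclass
class FilterInstance:
    parametric: ParametricFilter       # shared, already-compiled pattern
    params: Dict[str, bytes]           # values bound at run-time

    def matches(self, payload: bytes) -> bool:
        m = self.parametric.regex.search(payload)
        return m is not None and all(m.group(k) == v for k, v in self.params.items())


# Compiled at development time: URL, cookie name and cookie value are parameters.
COOKIE_SWITCH = ParametricFilter.compile(
    rb"^GET (?P<url>\S+) HTTP/1\.[01]\r\n(?:[^\r\n]*\r\n)*?"
    rb"Cookie: (?:[^\r\n]*; )?(?P<cookie_name>[^=;\s]+)=(?P<cookie_value>[^;\s]*)"
)
```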
To control the classification processor at run-time, the application needs a run-time library of functions, which it accesses through an API. The API uses the library of filters to build a classifier in three steps:
- Installation of a parametric filter from the library;
- Creation of a filter instance by substituting parameter values into the parametric filter; and
- Creation of a classifier by assigning actions or tags to the filter instances and combining a number of such filter/action pairs.
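The three steps map naturally onto a small API. The Python sketch below is hypothetical (the filter library, method names, and first-match semantics are assumptions, not a vendor API), and it runs the classifier in software rather than loading it into co-processor memory.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical library of parametric filters produced ahead of time.
FILTER_LIBRARY: Dict[str, bytes] = {
    "url_switch": rb"^[A-Z]+ (?P<url>\S+) HTTP/",
    "cookie_switch": rb"Cookie: (?:[^\r\n]*; )?(?P<cookie_name>[^=;\s]+)=(?P<cookie_value>[^;\s]*)",
}

FilterInstance = Tuple[str, Dict[str, bytes]]     # (filter name, bound parameters)


class Classifier:
    def __init__(self) -> None:
        self._installed: Dict[str, "re.Pattern[bytes]"] = {}
        self._rules: List[Tuple[FilterInstance, str]] = []

    def install(self, name: str) -> None:
        """Step 1: install a parametric filter from the library."""
        self._installed[name] = re.compile(FILTER_LIBRARY[name])

    def instantiate(self, name: str, **params: bytes) -> FilterInstance:
        """Step 2: substitute parameter values into an installed filter."""
        unknown = set(params) - set(self._installed[name].groupindex)
        if unknown:
            raise KeyError(f"unknown parameters: {sorted(unknown)}")
        return (name, params)

    def add(self, instance: FilterInstance, action: str) -> None:
        """Step 3: pair a filter instance with an action or tag."""
        self._rules.append((instance, action))

    def classify(self, payload: bytes) -> Optional[str]:
        """Return the action of the first filter instance that matches."""
        for (name, params), action in self._rules:
            m = self._installed[name].search(payload)
            if m and all(m.group(k) == v for k, v in params.items()):
                return action
        return None


clf = Classifier()
clf.install("cookie_switch")
clf.install("url_switch")
clf.add(clf.instantiate("cookie_switch", cookie_name=b"id", cookie_value=b"A1B2C3"), "server-2")
clf.add(clf.instantiate("url_switch", url=b"/index.html"), "server-1")
```

Rule order matters here: a returning customer with the cookie is steered to `server-2` before the URL-only rule is consulted.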
Once the classifier is loaded into the co-processor's pattern memory, it can be executed.
Wrap Up
It is clear that a classification processor programmed to perform content inspection is the key element in any next generation network processing architecture. A processor programmed to perform content inspection enables intelligent application management, facilitates the delivery of content-based services and drives content-based billing. This processor makes it possible to build networks that will be able to properly authorize, authenticate, and account for customized services by identifying individual packets and classifying them based on individual customer usage and QoS requirements.
About the Author
Feliks Welfeld is the founder and CTO at Solidum. Feliks earned a Master's Degree in Electronic Engineering from Warsaw University of Technology. He can be reached at feliks@solidum.com.