Reed-Solomon Codec Design in Programmable Logic
With the increasing demand for the transmission of digital data over a wide
variety of mediums, the need for error-control coding will increase
significantly as well. Reed-Solomon coding provides a robust error control
method for many common types of data transfer mediums,
particularly those
that are one-way or noisy and sure to produce errors, such as digital video
broadcast.
By Martin Langhammer & Wayne Turner
With the drastically increasing need for the transmission of digital data,
error-control coding (ECC) has become necessary for maintaining data
integrity during transmission. This article presents an overview of ECC
theory, including a discussion of the various types of commonly used ECCs.
Reed-Solomon (RS) coding theory will be discussed in some
detail here,
including definitions of the various parameters involved and an analysis of
RS error detection and correction capability. The various types of
transmissions that use RS coding will also be covered, along with the
possible options available for implementation of these functions.
Finally, well analyze several typical RS codes implemented in a
programmable logic device (PLD). The conclusion will include a discussion
of the advantages of a programmable logic RS implementation over other
available methods.
Analog vs. digital data
The Information Age has resulted in the movement of vast
amounts of digital data over a variety of mediums. Whether the medium is
copper, optical fiber, or air, the importance of getting digital data to
its destination quickly is second only to getting the data to its
destination without error. For example, if noise is injected into the data
stream of an analog transmission, such as a radio or analog telephone
signal, a degradation in signal
quality will result. The degradation level
will depend upon the amount of noise induced. As long as the transmission
is understood on the receiving end, then a certain amount of error in the
transmission can be tolerated. In this previously mentioned example of
voice communications, a great deal of noise can be introduced before the
message becomes unintelligible on the receiving end.
However, errors
usually cant be tolerated in the transmission of digital data. As an
example, take the transmission of a
binary executable file. If any errors
are introduced in the data and not corrected, the received data might be
useless and need to be retransmitted. It is this increasing need for
error-free data that drives the technology of ECC. Put simply, ECC is the
process of adding redundant data to a message to be transmitted that can
then be used at the receiving end to detect and possibly correct errors in
the transmission. Since this redundant data adds overhead to the
transmission, the type of coding must be
chosen based upon how much error
the system is expected to see, and whether the capability to request
retransmission of data is available.
Types of ECC
There are two basic ECC classifications: automatic repeat request (ARQ) and
forward error correction (FEC). ARQ is a detection-only type of coding,
where errors in a transmission can be detected by the receiver but not
corrected. The receiver must ask for any data received and request that
detected errors be retransmitted. Since these
retransmissions will steal
valuable bandwidth, ARQ codes are generally used for clean
transmission mediums (those with a lower probability of error). One of the
most common examples is simple parity checking, which is often used to
detect data errors in RAM. Another example is a cyclic redundancy check
(CRC), which is used to detect errors in a transmission over Ethernet. If
errors are detected, the message will be retransmitted. Since Ethernet is
primarily transmitted over wire, the chance for errors is
less than for
some other mediums. Bandwidth overhead for data retransmitting can be
tolerated without a significant impact on overall system throughput.
Since a noisy medium stands a fair chance of introducing error into a given
transmission, the use of ARQ methods means constant retransmission of data,
reducing system throughput to unacceptable levels. In these cases, FEC, as
the name implies, allows not only detection of errors at the receiving end
but correction of errors as well. This reduces the need
for data
retransmission, which is then only required when the number of errors is
greater than the number that can be corrected by the FEC method. FEC is
also used for one-way communications, where the opportunity for the
receiver to tell the sender to repeat the message is not available.
Examples of these one-way paths include satellite transmissions and
magnetic tape storage mediums.
RS coding is a type of FEC that is being widely used because of its
relatively large error correction capability when
weighed against its
minimal added overhead. RS codes are also easily scaled up or down in error
correction capability to match the error rates expected in a given system.
The ability to implement RS codes in programmable logic is a very recent
development, but one that will be increasingly utilized as programmable
devices fall in price and increase in performance.
|
| Figure 1: RS
system level block diagram.
|
RS codes
RS coding theory first appeared in 1960 in a five-page paper in the
Journal of the Society for Industrial and Applied Mathematics.
The
paper, entitled Polynomial Codes Over Certain Finite Fields,
was written by MIT Lincoln Labs staff members Irving S. Reed and Gustave
Solomon. While the concepts were considered interesting, hardware of
sufficient speed wasnt available at the time to put these concepts to
practical use, nor was there a great need to move massive amounts of
digital data. That has obviously changed significantly in recent years,
with RS coding being widely used today in everything from CD players to
the broadcast of digital video.
RS codes are an example of a block coding technique, where the data stream
to be transmitted is broken up into blocks
and redundant data is then added
to each block. The size of these blocks and the amount of check data added
to each block is either specified for a particular application or can be
user-defined for a closed system. Within these blocks, the data is further
subdivided into a number of symbols, which are generally from 6 to 10 bits
in size. The redundant data then consists of additional symbols being added
to the end of the transmission. The system-level block diagram for a RS
codec implementation is shown in
Figure 1
.
The
original data, which is a block consisting of
N
-
R
symbols, is
run though a RS encoder and
R
check symbols are added to form a
codeword of length
N.
Since RS can be done on any message length and
can add any number of check symbols, a particular RS code will be expressed
as RS
(N,N-R)
code.
N,
as shown previously, is the total
number of symbols per codeword;
R
is the number of check symbols per
codeword and; therefore,
N-R
is the number of actual information
symbols per codeword. An example of a RS codeword is shown in
Figure 2
.
As an example, the digital video broadcast (DVB) specification uses a RS
(204,188) code, utilizing sixteen check symbols per 188 information symbols
for a total codeword length of 204 symbols. RS encoding then consists of
the generation of these check symbols from the original data. The process
is based upon finite field (also known as Galois field) arithmetic.
Finite
fields are so named because all arithmetic is closed over the field (the
result of any operation is still an element of the field). The field
elements are all values from 0 to 2
m
-1, where
m
is the number
of bits per symbol. Therefore, for a bit width of 3, the field will contain
the values (0,1,2,3,4,5,6,7), though not necessarily in that order.
A few
other definitions are necessary in order to understand which variables need
to be known to generate a particular RS code. The first is the
field
polynomial, which is used to determine the order of the elements in the
finite field. Valid field polynomials are a function of the bit width to be
operated on, and although the details of the field polynomial will not be
discussed here, the FIELD.EXE utility included in the core will give all
valid field polynomials for a given bit width. For example, the valid field
polynomials for
m
=3 are 11 and 13. Generally, the field polynomial
that must be used is determined by specification (the example
DVB
specification mentioned uses eight bit symbols {
m
=8} and a field
polynomial of 285 per specification). In dedicated RS encoders, this value
is in fact usually hard-wired into the chip, but some RS cores allow the
use of any valid field polynomial.
|
| Figure 2: RS
codeword.
|
The last item that needs to be known to generate a particular RS
implementation is the generator polynomial starting root. Again, this is a
value generally determined by specification (DVB uses a generator
polynomial starting at root zero), although any value can be used in the RS
core. This is another area where parameterized cores have an advantage over
a dedicated product, as this value is also often hard-wired into dedicated
RS solutions.
Another system-level characteristic of RS coding is whether
the implementation is systematic or nonsystematic. A systematic
implementation produces a codeword that contains the unaltered original
input data stream in the first
R
symbols of the codeword. In
contrast, in a nonsystematic implementation, the input data stream is
altered during the encoding pro- cess. Most specifications require
systematic coding, and the RS core implements systematic RS codes.
RS encoder
The simplified schematic representation for a systematic RS encoder is
shown in
Figure 3
. As shown, the input data stream
is immediately clocked back out of the function while being fed back into
the check symbol generation circuitry. The fact that the input data stream
is clocked out immediately without being altered means that the
implementation is systematic. A series of finite field adds and multiplies
results in each register containing one check symbol after the entire input
data
stream has been entered. At that point, the output select is switched
over to the check symbol registers, and the check symbols are shifted out
at the end of the original message.
From a device utilization standpoint,
the size of the encoder is most heavily affected by the number of check
symbols required for the target RS code. The total message length, as well
as the field polynomial and first root value, do not have any appreciable
effect on the device utilization or performance for a given target RS
code.
|
| Figure 3:
Systematic RS encoder schematic representation.
|
RS decoder
While RS encoding is a fairly straightforward process, the decoding process
is far more involved. As a benchmark, a
given RS decoder takes, on average,
ten times the PLD resources of the matching encoder. The mathematical
details of RS decoding are very complex, and therefore the internal details
will not be discussed here; however, a block diagram of a RS decoder is
shown in
Figure 4
.
A typical RS decode algorithm
consists of several major blocks. The first of these is the syndrome
calculation, where the incoming symbols are divided into the generator
polynomial, which is known from the
parameters of the decoder. The check
symbols, which form the remainder in the encoder section, will cause the
syndrome calculation to be zero in the case of no errors. If there are
errors, the resulting polynomial is passed to the Euclid algorithm, where
the factors of the remainder are found. The result is then evaluated for
each of the incoming symbols over many iterations, and any errors are
found and corrected. The corrected codeword is then output from the
decoder. If there are more errors in the codeword
than can be corrected by
the RS code used, then the received codeword is output with no changes and
a flag is set, stating that the error correction has failed for that
codeword.
|
| Figure 4: RS
decoder block diagram.
|
Error correction capability
As stated earlier, the goal of implementing a RS code in a given system is
to correct errors that are likely to be introduced in data during
transmission. The popularity of RS codes is due mainly to the high
correction capability they have as a function of the added data
transmission overhead. The error correction capability of a given RS code
is a function of the number of check bytes appended to a message. In
general, it may be assumed that correcting an
error requires one check
symbol to find the location of the error, and a second check symbol to
correct the error. In general then, a given RS code can correct
R
/2
symbol errors, where
R
is the number of check symbols in the given
RS code. Since RS codes are generally described as an RS
(
N,N-R
) value, the number of errors correctable by this code is
(
N
-
[N-R])
/2. This error control capability can be enhanced by
the use of erasures, a technique that helps determine the
location of an
error without using one of the check symbols. A RS implementation
supporting erasures would then be able to correct up to
R
errors.
Using the previous example of the DVB specification (which is a RS
[204,188] code), we then see that that given code can correct (204-188)/2
or eight errors per 204-symbol codeword.
One item of note is that since
RS codes work on symbols (most commonly equal to one 8-bit byte) as opposed
to individual data bits, the number of correctable errors refers to
symbol
errors. This means that a symbol with all of the bits corrupted is no
different than a symbol with only one of its bits corrupted, and the error
control capability refers to the number of corrupted symbols that can be
corrected. This is another reason why RS codes are so valuable in
transmission mediums where burst errors are common. In the DVB example
(which uses 8-bit symbols), being able to correct eight symbols means that
a burst error of 57 to 64 consecutive bits (dependent upon whether the
error starts on a symbol boundary) can be corrected by that RS code. If
these same errors are more evenly spaced within the codeword, however, it
will require many more check symbols to correct all of the errors. For this
reason, RS codes are generally combined with other coding methods such as
Viterbi, which is more suited to correcting evenly distributed
errors.
Another point to make is that the errors that are to be corrected
can exist within the check symbols themselves. These check symbols are also
sent
across the noisy medium and stand an equal chance of being corrupted
as one of the information symbols. Therefore, in the DVB example, the eight
symbol errors that can be corrected can be anywhere within the entire
codeword, including within the check symbols that were appended to the
message at the encoder.
Designing with megafunctions
The ability to incorporate large complex functions in PLDs is relatively
new. PLDs are rapidly increasing in both density and performance while
their cost has
plummeted, allowing functions such as RS codecs to be
efficiently implemented in programmable logic. This new capability,
combined with the increasing time-to-market pressure, has caused the use of
third-party megafunctions in PLDs to increase dramatically. These types of
functions have long been used in gate array designs, both for a
time-to-market advantage and to have the luxury of using a previously
verified function. Only with these recent increases in PLD density and
performance have the same
functions crossed over into PLD designs.
Needed parameters for RS codecs
Assuming the decision has been made to use megafunctions and to incorporate
them into a PLD, the exact system requirements must then be determined. In
the case of a RS codec, this would mean knowing which RS code is to be
implemented (number of total symbols per codeword, number of check symbols
per codeword), as well as values such as the field polynomial, generator
polynomial, and the number of bits per symbol. Once these
values are known
(many common implementations, such as the DVB example discussed, define all
of these values as part of the specification), the appropriate encoder or
decoder can be built. The RS codec we discussed allows these values to
merely be passed to the codec generator as parameters, and the generator
will then create the appropriate final design. Next, the design is compiled
into the target architecture and can then be evaluated for performance.
DVB implementation
As discussed earlier,
the DVB specification itself determines all of the
values that need to be passed to the core generator. Those specifications
are shown in Table 1.
There are two core generator executables: ENCRS.EXE
and DECRS.EXE, both of which run in a DOS shell. Running either one with no
arguments produces a list of needed parameters as follows:
C: >ENCRS
Encoder parameters:
n
: (message length)
r
: (number of check symbols)
m
: (field size)
field
: (field
polynomial)
genstart
: (first root of generator polynomial.)
Running ENCRS.EXE with the appropriate DVB parameters as shown:
C: >ENCRS 204 16 8 285 0
This equation generates the entire encoder design and also generates a
test vector file to allow a quick simulation of the encoder.
The decoder
generator has the same parameters, so producing a matching DVB decoder
means merely running DECRS.EXE with the same parameters, as shown:
C: >DECRS 204 16 8 285 0
Like
the encoder, this generates not only the entire decoder design
according to the parameters entered, but also generates a test vector file
for simulation.
As shown here, any RS codec can be generated very
easily, and the designer is able to specify all pertinent parameters. Once
the designs have been generated, they can be compiled into the target
device (in this case an Altera FLEX 10K device) and evaluated for
performance. The time required to generate a particular chosen RS code
implementation is
literally a few seconds, and since all RS code parameters
are user-selectable, any desired RS encoder or decoder can be generated
from these utilities.
Target devices
The RS cores are targeted at Altera PLDs, with the actual source code
written in AHDL, Alteras proprietary hardware description language.
The encoder and decoder functions differ radically in complexity, and
therefore also differ in device requirements. The encoder can be placed
into any Altera FLEX device family (FLEX 6000,
FLEX 8000, or FLEX 10K),
which are all SRAM-based devices with a look-up table as the basic building
block. The de- coder, however, uses on-board memory to store intermediate
values during the decoding process, and therefore is targeted exclusively
at the Altera FLEX 10K family.
Codec performance and device utilization
The DVB RS encoder and decoder have performance equal to and often better
than the dedicated chip solutions currently available. This is another
reason why programmable logic is
becoming a popular choice for RS
implementations, as in some cases it is the dedicated RS device that is the
performance bottleneck in the system. Performance for RS encoders and
decoders is generally expressed in millions of symbols per second (Msps),
and well be evaluating the performance of several RS encoder and
decoder functions implemented in Altera devices.
Encoder performance
RS encoding is a real-time function. As data is clocked in to the encoder,
it is immediately clocked out
on the same clock edge with no change in the
data. Internally, the data is fed back into the encoder to generate the
check symbols, and as soon as the appropriate number of information symbols
have been clocked in to the encoder (in the case of DVB this would be 188
symbols), the check symbols calculated from this data are clocked out.
Because this encoding is real-time, and also because the encoder operates
on one symbol per clock, the effective performance is measured by how fast
the encoder can be
clocked. The performance of the DVB encoder, as well as
that of the RS (255,245) encoder, are shown in Table 2. All compilations
for both the encoder and decoder were performed on a 400-MHz Pentium II
machine running Windows NT, using Alteras MAX+Plus II development
software.
Decoder performance
RS decoding is a far more complex operation than encoding. The entire
received codeword must be input to the decoder before the decoding process
can begin. Once begun, an additional latency period is
required, during
which time the codeword is analyzed. The corrected codeword is then output.
Because of this latency period, determining the performance of the decode
is not as straightforward as with the encoder. The effective throughput of
a RS decoder is a combination of the number of clock cycles required to
locate and correct errors after the codeword has been received and the
speed at which the design can be clocked. Knowing the latency and clock
speed allows the user to determine how many symbols per
second may be
processed by the decoder. In the RS core, there are two RS decoder choices:
a high-speed decoder and a low-speed decoder. The trade-off is that the
low-speed decoder is usually approximately 20% smaller in device
utilization. Note that both decoders operate at the same clock rate, but
the low-speed decoder has a longer latency period, resulting in a slower
effective symbol rate. The performance of the DVB decoder example in both
the high-speed and low-speed versions is shown in Table 3, as
well as for
a RS (255,245) code. This shows that as the number of check symbols de-
creases, the complexity of the decoder decreases, resulting in a smaller
design and an increase in performance.
Cost and performance
Improvements in both programmable logic architecture and process technology
have allowed Altera to produce new devices that are often on cost parity
with off-the-shelf dedicated devices. Some of the commonly used dedicated
solutions are from LSI Logic and Advanced Hardware
Architectures (AHA).
Many of these are available as either separate encoders and decoders, and
sometimes have both in one package.
Several years ago, programmable
devices were priced on a per gate basis at about fifty gates/dollar.
Currently, Altera devices have reached the 1,000 gates/dollar threshold and
should reach the 5,000 gates/dollar threshold by the end of the year. This
high level of integration has allowed complex functions such as the RS
codecs discussed in this article to be implemented
cost-effectively in
programmable logic. For example, the DVB encoder that uses 249 logic cells
can be implemented in less than 25% of an Altera EPF6016 device, leaving
over 1,000 logic cells available for complementary functions. This device
currently costs less than $20.00.
Regarding performance, the same
improvements that have driven the cost of programmable logic down have
resulted in greatly increased performance. Process migrations, along with
improvements in device architecture and synthesis tools, have allowed
designs that could only run at 20 MHz several years ago to run at well over
60 MHz today. This means that as even faster devices become available, the
core can be placed in these new devices to increase design performance. As
seen in the previous tables, the cores vary in performance based upon the
targeted RS code, but DVB encoders at 60 Msps and decoders at 44 Msps make
programmable logic a viable solution for all but the highest end
needs.
Even if the choice is to use programmable logic, the next
choice
is still whether to design the RS core in-house or to purchase it from a
third party. Designing from scratch can take up to 8 man-weeks for an
encoder and 26 man-weeks for a decoder.
Integration of complimentary functions
In a real-life RS coding implementation, functions that tend to reside on
either side of the RS encoder or decoder are often implemented in
programmable logic. One function that often resides after a RS encoder (and
is often required by the transmission specification) is
an interleaver. The
task of an interleaver is to scramble the symbols in several RS codewords
before transmission, effectively spreading any burst error that occurs
during transmission over several codewords. Spreading this burst error over
several codewords (where it may have occurred all within one codeword
without interleaving), increases the chance of each codeword being able to
correct all of its induced errors. The interleaver scrambles the codewords
and writes them into some type of memory prior to
transmission. This
function is easily and often implemented in programmable logic, even when a
dedicated RS codec is being used. The advantage to using a PLD is that both
the RS function and the interleaver can be easily integrated, saving board
real estate.
Another common function implemented along with a RS codec
is a Viterbi codec, which is a more accurate type of ECC when errors are
more evenly distributed in the data. While the RS effectively corrects
burst errors, the Viterbis strength is the
ability to correct
sporadic errors. Both methods are often combined (by specification) to
produce a very robust error correction scheme. Viterbi codecs can also be
implemented in programmable logic, allowing further integration of the ECC
scheme into a single device.
A further level of integration can be achieved when a particular system
must be able to support multiple RS codes. This situation is common in
satellite modem communications, where the designer may wish to target both
an IntelSat and a DVB
specification. Whereas two devices would be required
with a dedicated solution (one for each specification), the programmable
solution can merely have a version of each RS code within the same PLD, and
allow the user to multiplex between the two at any time or even run both of
them in parallel if necessary.
Robust error control
RS coding provides a robust error control method for many common types of
data transfer mediums, particularly those that are one-way or noisy and
sure to produce
errors. RS codes are already the standard for a large
number of satellite transmission specifications, as well as for compact
disc technology. The growing usefulness of RS codes is likely to
increase.
Programmable logic technology also continues to grow. Today,
for many large complex functions such as RS codes, programmable logic is
not only a viable solution, but the best solution. The huge strides that
have been made in the last five years can be expected to continue, with the
ability to incorporate even
larger and more complex functions into
programmable logic at increas- ed levels of performance. These larger,
faster devices, combined with the growing availability of intellectual
property targeted at programmable logic, will provide great benefits to
systems designers who should see higher performance at lower cost.
Martin Langhammer is a field applications engineer with Kaytronics in
Toronto, Canada. His background includes IP development for both
programmable logic and ASICs, principally DSP
functions for communications
and image processing. He received his BSEE from the University of
Toronto.
Wayne Turner is currently a field application engineer for Altera
Corporation and is based in Phoenix, AZ. His background includes
programmable logic and board design, primarily for image processing
and communications applications. He has also been involved in the
development and analysis of active-matrix LCD technology. He received
a BSEE from Arizona State University.