In order for third-generation (3G) wireless systems to take off, design engineers must once again turn to advanced error correction schemes. 3G wireless systems will no longer be tasked with only delivering voice services. Rather they will be expected to deliver video, data, audio, and voice services over the same air link. But to effectively achieve this goal, designers must first deal with errors in the radio channel. If these errors go undetected or uncorrected, performance can degrade, response time can weaken, and human intervention can increase.
Introduced back in 1993, turbo coding is seen by many in the wireless sector as the forward-error-correction (FEC) technology of choice for 3G applications. In fact, many of the International Telecommunication Union (ITU) specifications recommend using turbo coding for 3G architectures.
While offering a great deal of promise, however, turbo coding is a difficult concept to understand and implement. In this article, we'll lay out the key components required to develop a turbo coding solution. Specifically, we'll look at the turbo encoder and turbo decoder. We'll also explore the implementation issues involved with implementing the turbo coders in a system design.
The Turbo Encoder
Turbo codes, which were presented to the coding community in 19932, represent the most important breakthrough in coding since Ungerboeck's invention of Trellis codes in 1982.3. While Ungerboeck's work eventually led to coded modulation schemes, which offer near channel capacity for bandlimited channels, turbo codes offer near channel capacity for power limited channels such as wireless channels.
To better illustrate the turb-coding scheme, let's look at a specific turbo encoder/decoder implementation. The turbo encoder (Figure 1) consists of 2 recursive systematic convolutional (RSC) codes separated by an interleaver. In this turbo encoder architecture, the RSCs are concatenated in parallel fashion.

Figure 1: Block diagram of a general turbo encoder.
The constituent codes are RSCs because they combine the properties of non-systematic codes and systematic codes. In the encoder architecture displayed in Figure 1, 2 RSCs are identical. Generator matrices represent the RSCs: g1(D)=1+D+D^2+D^3+D^4 (1 1 1 1 1 ), g2(D)=1+D^4 (1 0 0 0 1).
g1 corresponds to the feedback polynomial and g2 corresponds to the output polynomial. Note that the first element in g1 corresponds to the input bit dk, so it should be always 1 otherwise it would mean that the RSC is not accepting any input. In the implementation highlighted in Figure 1, the length of the RSC is restricted to 5, since RSCs with large memory length do not improve the performance so much.
In a turbo coding architecture, one block of input bits contains N-Mem information bits and Mem tail bits. A permuter can be included in the turbo encoder architecture to take each incoming block of N bits and rearrange them in some particular fashion. For large block sizes, designers can use pseudo random interleavers to rearrange bits.
In the architecture shown in Figure 1 a pseudo random interleaver is employed. The interleaver is a permutation map. To create the pseudo random interleaver map: generate n random numbers and rearrange them in ascending order or descending order.
When bits are being passed through the RSC, the RSC changes from one state to another. Suppose Mem is the memory length of the RSC. Then at any time the RSC can be in any one of the 2em states.
In the decoding algorithm, which we will see shortly, it is assumed that at the end of encoding one block the RSC will come to the state 0, i.e. the state corresponding to zeroes in all the shift registers. Normally the RSC does not come to 0 state after one block. So we have to pass additional bits, called tail bits, to make the RSC end in state 0.
The number of tail bits to be passed is Mem. So, first we should pass the N-Mem information bits with the switch in Figure 1 in position 1 and then we should pass the Mem tail bits with the switch in position 2. After we transmit the Mem tail bits the RSC1 will come to state 0. The input to the RSC2 will be the interleaved bit stream of the input to the RSC1, which is N-Mem
information bits and Mem tail bits.
In addition to providing an interleaver, the turbo encoding architecture could also include a puncturer. The role of the puncturer is to periodically delete selected bits to reduce coding overhead. Without the puncturer 3N bits (N systematic bits, N parity bits from each RSC) would be transmitted for every N bits. In the encoder described above, a puncturer is not included.
The Turbo Decoder
The architecture of the turbo decoder is shown in Figure 2. As indicated by the presence of a feedback path, the decoder operates in an iterative manner. Each full iteration consists of 2 half iterations, one for each constituent code. The timing of the decoder is such that the decoder 1 operates during the first half iteration and decoder 2 operates during the second half iteration.

Figure 2: Block diagram of a typical tubo decoder
In the decoder architecture, there are 2 decoders corresponding to 2 RSCs. The inputs to the first decoder are the corrupted systematic bit array xk, the corrupted parity bit stream y1k from the first RSC, and the deinterleaved extrinsic information from the second decoder. The inputs to the second decoder are the corrupted interleaved systematic bit stream, the corrupted parity bit stream y2k from the second RSC, and the interleaved extrinsic information from the first decoder.
The heart of the iterative decoding procedure is the use, in each component decoder, of an algorithm that computes the a posteriori probability (APP) of the information symbols or, more generally, a reliability value for each information symbol (and/or code symbol). The sequence of reliability values generated by a decoder is passed to the other one. In this way, each decoder takes advantage of the "suggestions" of the other one.
To improve the correctness of its decisions, each decoder has to be fed with information that does not originate from itself. The concept of extrinsic information was introduced to identify the component of the general reliability value, which depends on redundant information introduced by the considered constituent code.
A natural reliability value, in the binary case, is the logarithm likelihood ratio (LLR) defined as:
where the word input refers to all the decoder inputs.
The LLR may be exactly computed using the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm, which allows one to compute the APPs as:
where "i" is +/- 1.
The BCJR algorithm is optimal to generate the sequence of APPs, but its computational complexity is large compared to that of Viterbi algorithm. To solve this problem, designers can turn to the soft-output Viterbi algorithm (SOVA) to provide reliability information. This reliability data is interpreted as an approximation of the LLRs.
For both the BCJR algorithm and SOVA, there exist essentially two methods to process the extrinsic information received by each decoder. In the first method, the extrinsic information at the input of a decoder is modeled as the output of an additive white Gaussian noise (AWGN) meta-channel. In the second method, the extrinsic information is used to update the "a priori" probabilities, which are used in the next decoding step, in the sense that a posteriori probabilities computed by a decoder become a priori probabilities for the other one.
The only difference between the BCJR and SOVA algorithms lies in a multiplicative factor used in the metric computation. In the decoder described here, we have employed a BCJR decoder, which is a MAP decoder with the second method above to process extrinsic information.
The Log-MAP algorithm
Now that we've laid out the decoder, let's look at how we can implement this iterative decoder algorithm. During implementation, each of the two decoders uses the MAP algorithm in the log domain. After every half iteration, extrinsic information is computed and it is given as input to the next decoder. The formula for computing the extrinsic information is given as:
where:
where
f(i,m) is the next state given an input i and state m
where vim is the coded bit given the input bit dk=i and the state of the RSC ISk=m. In addition, Rk=(xk,yk) is the received symbol at time k.
In order to compute the extrinsic information we need to compute alpha, beta, and delta. Alpha is the forward state metric and is calculated as:
where b(j,m) is the state going backwards in time from state m on the previous branch corresponding to input j.
Beta is the backward state metric and is calculated as:
Delta is the branch metric and is calculated as:

where χk is a constant and ζik=Pr(dk=i) by definition.
The BCJR algorithm is implemented in the log domain using the following approximation.
The correction term fc(y-x) can be implemented using the log and exp functions in C or by using a large lookup table. This is called the log MAP algorithm. Log MAP algorithm, though most complex, offers the best BER performance.
When the correction term is ignored in the computation, the algorithm is called the max-Log-MAP algorithm. The max-Log-MAP algorithm is the least complex of all the algorithms (like constant log MAP and linear log MAP).
Initially, we used the log MAP algorithm and found that the amount of computation is more and hence the decoding speed is less. Therefore, we implemented the max -Log MAP algorithm. The formulas for alpha, beta and delta using the max-Log-MAP algorithm are given below. Let:




.
Now, lets look at how we can efficiently calculate alpha, beta, and delta. For calculating the delta, we need not declare a 3D array. For a particular value of k, Dki,m will take only either of the 4 values which are given below as:

LLR (k) is the extrinsic information supplied by the previous decoder. For decoder 2, y1(k) will be replaced by y2(k) and x(k) will be replaced by the interleaved bit in the above expressions.
where:

The backward recursion is initialized with B1N+1=0 and BmN+1=-α for m not equal to 1. Then Bmk is calculated using equations (14) and (15). Next, the M values Bmk(m=1 to M) are normalized by subtracting the maximum of the M values from the original M values.
Forward Recursion
Unlike the computation of beta, alpha need not be declared as a k*m array. Only the partial path metrics for 2 stages of the trellis (k and k-1) is maintained.
The forward recursion is initialized with A11=0 and Am1=-α for m not equal to 1. Then Amk is calculated using equations 12 and 13.
As soon as we compute Amk, we can calculate extrinsic information using equations 18 and 19. So there is no need of an extra loop for extrinsic info calculation. Just as we did for the computation of Bmk, Amk will also be normalized for every k. The normalized values will be used in the computation of the extrinsic information. Unless normalization is done, we will get overflows for large block sizes.
For the second MAP decoder, an interleaved version of the corrupted systematic bits are taken as input and the corrupted parity bits from the second encoder y2(k) are used.
Let Le12[k] be the extrinsic information output of the first decoder. The permuted Le12[k] is used in place of LLR[k] for the second decoder for the next half iteration. Let Le21_d[k] be the extrinsic information output of the second decoder. Then the de-interleaved version of Le21_d[k] (say Le21[k]) is used in place of LLR[k] for the 1st decoder for the next half iteration. Note that before starting the first half iteration LLR (k) should be initialized to zero for all k.
The LLR of kth decoded bit after any iteration is:

If the LLR is positive the bit is decoded as 1. If it's negative, the bit is decoded as 0.
Wrap up
The convergence properties of the iterative decoding algorithm used in turbo-decoded systems should be of concern to design engineers. While the turbo decoder may usually converge, neither convergence to the correct maximum likelihood (ML) solution or convergence to a stable solution can be guaranteed, particularly at low SNR. It is, however, clear that good interleaver design can minimize the number of frames for which the decoder converges to an incorrect solution.
References
- C.E.Shannon, "A mathematical theory of communication", Bell System Technical Journal, 27:379-423,623-656,October 1948.
- C.Berrou, A. Glavieux, and P.Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo codes", Proceedings of the 1993 International Conference on Communications, pages 1064-1070,1993.
- H.R.Sadjadpour, et al, "Interleaver design for short block length Turbo codes".
- Steven S.Pietrobon, "Implementation and Performance of a Turbo/MAP Decoder".
- Claude Berrou, Alain Glavieux, "Near Optimum Error Correcting Coding and Decoding : Turbo Codes", IEEE Transactions on Communication, Vol.44, No.10, October 1996.
- Andrew C.Reid et al, "Convergence and Errors in Turbo-Decoding", IEEE Transactions on Communications, Vol.49, No.12, December 2001.
Rajagopalan Ramamoorthy is a DSP software engineer at Patni Computer Systems Ltd. He received his B.Tech in Electrical Engg and M.Tech in communications and signal pProcessing both from Indian Institute of Technology in Mumbai. Rajagopalan can be reached at rajagopalan.ramamoorthy@patni.com.
Nalin Pithwa is a project leader for DSP software at Patni Computer Systems Ltd.. He received his B.E. in Electrical Eng. from the University of Mumbai, and an M.S. in Applied Math from University of
Florida, Gainesville. Nalin can be reached at nalin.pithwa@patni.com.