# Applications of Discrete Mathematics

Chapter 5: Coding Theory

Author: Kenneth H. Rosen, AT&T Laboratories.

Prerequisites: The prerequisites for this chapter are the basics of logic, set theory, number theory, matrices, and probability. See Sections 1.1, 2.1, 2.2, 3.4–3.7, and 6.1 of Discrete Mathematics and Its Applications.

Introduction The usual way to represent, manipulate, and transmit information is to use bit strings, that is, sequences of zeros and ones. It is extremely difficult, and often impossible, to prevent errors when data are stored, retrieved, operated on, or transmitted. Errors may occur from noisy communication channels, electrical interference, human error, or equipment error. Similarly, errors are introduced into data stored over a long period of time on magnetic tape as the tape deteriorates.

It is particularly important to ensure reliable transmission when large computer files are rapidly transmitted or when data are sent over long distances, such as data transmitted from space probes billions of miles away. Similarly, it is often important to recover data that have degraded while stored on a tape. To guarantee reliable transmission or to recover degraded data, techniques from coding theory are used. Messages, in the form of bit strings, are encoded by translating them into longer bit strings, called codewords. A set of codewords is called a code. As we will see, we can detect errors when we use certain codes. That is, as long as not too many errors have been made, we can determine whether one or more errors have been introduced when a bit string was transmitted. Furthermore, when codes with more redundancy are used, we can correct errors. That is, as long as not too many errors were introduced in transmission, we can recover the codeword from the bit string received when this codeword was sent.

Coding theory, the study of codes, including error detecting and error correcting codes, has been studied extensively for the past forty years. It has become increasingly important with the development of new technologies for data communications and data storage. In this chapter we will study both error detecting and error correcting codes. We will introduce an important family of error correcting codes. To go beyond the coverage in this chapter and to learn about the latest applications of coding theory and the latest technical developments, the reader is referred to the references listed at the end of the chapter.

Error Detecting Codes A simple way to detect errors when a bit string is transmitted is to add a parity check bit at the end of the string. If the bit string contains an even number of 1s we put a 0 at the end of the string. If the bit string contains an odd number of 1s we put a 1 at the end of the string. In other words, we encode the message x1x2 . . . xn as x1x2 . . . xnxn+1, where the parity check bit xn+1 is given by

xn+1 = (x1 + x2 + . . . + xn) mod 2.

Adding the parity check bit guarantees that the number of 1s in the extended string must be even. It is easy to see that the codewords in this code are bit strings with an even number of 1s.

Note that when a parity check bit is added to a bit string, if a single error is made in the transmission of a codeword, the total number of 1s will be odd. Consequently, this error can be detected. However, if two errors are made, these errors cannot be detected, since the total number of 1s in the extended string with two errors will still be even. In general, any odd number of errors can be detected, while an even number of errors cannot be detected.
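The parity-check encoding and the detection test just described take only a few lines of code. The following sketch (the function names are ours, not part of the text) appends the parity bit and checks a received string:

```python
def add_parity_bit(bits: str) -> str:
    """Append x_{n+1} = (x_1 + ... + x_n) mod 2, so every codeword
    contains an even number of 1s."""
    return bits + str(bits.count("1") % 2)

def parity_check_passes(word: str) -> bool:
    """A received string can be a valid codeword only if its number
    of 1s is even; an odd count reveals an odd number of errors."""
    return word.count("1") % 2 == 0
```

For instance, the strings of Example 1 below behave as described there: `parity_check_passes("1110011")` is false (five 1s), while `parity_check_passes("10111101")` is true (six 1s).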

Example 1 Suppose that a parity check bit is added to a bit string before it is transmitted. What can you conclude if you receive the bit strings 1110011 and 10111101 as messages?

Solution: Since the string 1110011 contains an odd number of 1s, it cannot be a valid codeword (and must, therefore, contain an odd number of errors). On the other hand, the string 10111101 contains an even number of 1s. Hence it is either a valid codeword or contains an even number of errors.

Another simple way to detect errors is to repeat each bit in a message twice, as is done in the following example.

Example 2 Encode the bit string 011001 by repeating each bit twice.

Solution: Repeating each bit twice produces the codeword 001111000011.

What errors can be detected when we repeat each bit of a codeword twice? Since the codewords are those bit strings that contain pairs of matching bits, that is, where the first two bits agree, the third and fourth bits agree, and so on, we can detect errors that change no more than one bit of each pair of these matching bits. For example, when codewords have eight bits, we can detect errors in the second bit, the third bit, and the eighth bit (such as detecting that 01101110, received when the codeword 00001111 was sent, has errors). On the other hand, we cannot detect an error when both the third and fourth bits are changed (such as when 00111111 is received after the codeword 00001111 was sent).
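The repeat-twice code can be sketched the same way (again, the function names are ours). An error is detected exactly when some pair of supposedly matching bits disagrees:

```python
def encode_repeat_twice(message: str) -> str:
    """Encode a bit string by sending each bit twice."""
    return "".join(bit * 2 for bit in message)

def error_detected(word: str) -> bool:
    """True if some pair of supposedly matching bits disagrees."""
    return any(word[i] != word[i + 1] for i in range(0, len(word), 2))
```

This reproduces Example 2 (`encode_repeat_twice("011001")` gives `"001111000011"`) and the two received strings above: `01101110` is flagged, while `00111111` slips through undetected.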

So far we have discussed codes that can be used to detect errors. When errors are detected, all we can do to obtain the correct codeword is to ask for retransmission and hope that no errors will occur when this is done. However, there are more powerful codes that can not only detect but can also correct errors. We now turn our attention to these codes, called error correcting codes.

Error Correcting Codes We have seen that when redundancy is included in codewords, such as when a parity check bit is added to a bit string, we can detect transmission errors. We can do even better if we include more redundancy. We will not only be able to detect errors, but we will also be able to correct errors. More precisely, if sufficiently few errors have been made in the transmission of a codeword, we can determine which codeword was sent. This is illustrated by the following example.

Example 3 To encode a message we can use the triple repetition code. We repeat a message three times. That is, if the message is x1x2x3, we encode it as x1x2x3x4x5x6x7x8x9 where x1 = x4 = x7, x2 = x5 = x8, and x3 = x6 = x9. The valid codewords are 000000000, 001001001, 010010010, 011011011, 100100100, 101101101, 110110110, and 111111111.


We decode a bit string, which may contain errors, using the simple majority rule. For example, to determine x1, we look at x1, x4, and x7. If two or three of these bits are 1, we conclude that x1 = 1. Otherwise, if two or three of these bits are 0, we conclude that x1 = 0. In general, we look at the three bits corresponding to each bit in the original message. We decide that a bit in the message is 1 if a majority of bits in the string received in positions corresponding to this bit are 1s and we decide this bit is a 0 otherwise. Using this procedure, we correctly decode the message as long as at most one error has been made in the bits corresponding to each bit of the original message.

For example, when a triple repetition code is used, if we receive 011111010, we conclude that the message sent was 011. (For instance, we decided that the first bit of the message was 0 since the first bit is 0, the fourth bit is 1, and the seventh bit is 0, leading us to conclude that the fourth bit is wrong.)
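The majority-rule decoding just described is easy to state as a program. This sketch (function names are ours) encodes a message by repeating it three times and decodes by taking the majority of positions i, i + n, and i + 2n:

```python
def encode_triple_repetition(message: str) -> str:
    """Encode a bit string by repeating it three times."""
    return message * 3

def decode_triple_repetition(word: str) -> str:
    """Decode a received 3n-bit string by majority vote over the
    positions i, i + n, and i + 2n for each message bit i."""
    n = len(word) // 3
    bits = []
    for i in range(n):
        votes = word[i] + word[i + n] + word[i + 2 * n]
        bits.append("1" if votes.count("1") >= 2 else "0")
    return "".join(bits)
```

Decoding the received string 011111010 this way yields the message 011, as in the discussion above.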

To make the ideas introduced by the triple repetition code more precise we need to introduce some ideas about the distance between codewords and the probability of errors. We will develop several important concepts before returning to error correcting codes.

Hamming Distance There is a simple way to measure the distance between two bit strings. We look at the number of positions in which these bit strings differ. This approach was used by Richard Hamming* in his fundamental work in coding theory.

Definition 1 The Hamming distance d(x,y) between the bit strings x = x1x2 . . . xn and y = y1y2 . . . yn is the number of positions in which these strings differ, that is, the number of i (i = 1, 2, . . . , n) for which xi ≠ yi.

Note that the Hamming distance between two bit strings equals the number of changes in individual bits needed to change one of the strings into the other. We will find this observation useful later.

* Richard Hamming (1915–1998) was one of the founders of modern coding theory. He was born in Chicago and received his B.S. from the University of Chicago, his M.A. from the University of Nebraska, and his Ph.D. from the University of Illinois in 1942. He was employed by the University of Illinois from 1942 to 1944 and the University of Louisville from 1944 to 1945. From 1945 until 1946 he was on the staff of the Manhattan Project in Los Alamos. He joined Bell Telephone Laboratories in 1946, where he worked until 1976. His research included work in the areas of coding theory, numerical methods, statistics, and digital filtering. Hamming joined the faculty of the Naval Postgraduate School in 1976. Among the awards he won are the Turing Award from the ACM and the IEEE Hamming Medal (named after him).

Example 4 Find the Hamming distance between the bit strings 01110 and 11011 and the Hamming distance between the bit strings 00000 and 11111.

Solution: Since 01110 and 11011 differ in their first, third, and fifth bits, d(01110, 11011) = 3. Since 00000 and 11111 differ in all five bits, we conclude that d(00000, 11111) = 5.
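Definition 1 translates directly into code. A minimal sketch (the function name is ours):

```python
def hamming_distance(x: str, y: str) -> int:
    """Number of positions in which the equal-length bit strings differ."""
    if len(x) != len(y):
        raise ValueError("bit strings must have the same length")
    return sum(a != b for a, b in zip(x, y))
```

This confirms the distances computed in Example 4: `hamming_distance("01110", "11011")` is 3 and `hamming_distance("00000", "11111")` is 5.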

The Hamming distance satisfies all the properties of a distance function (or metric), as the following theorem demonstrates.

Theorem 1 Let d(x,y) represent the Hamming distance between the bit strings x and y of length n. Then

(i) d(x,y) ≥ 0 for all x, y;
(ii) d(x,y) = 0 if and only if x = y;
(iii) d(x,y) = d(y,x) for all x, y;
(iv) d(x,y) ≤ d(x, z) + d(z,y) for all x, y, z.

Proof: Properties (i), (ii), and (iii) follow immediately from the definition of the Hamming distance. To prove (iv), we use the fact that d(x,y) is the number of changes of bits required to change x into y. Note that for every string z of length n the number of changes needed to change x into y does not exceed the number of changes required to change x into z and then to change z into y.

How can the Hamming distance be used in decoding? In particular, suppose that when a codeword x from a code C is sent, the bit string y is received. If the transmission was error-free, then y would be the same as x. But if errors were introduced by the transmission, for instance by a noisy line, then y is not the same as x. How can we correct errors, that is, how can we recover x?

One approach would be to compute the Hamming distance between y and each of the codewords in C. Then to decode y, we take the codeword of minimum Hamming distance from y, if such a codeword is unique. If the distance between the closest codewords in C is large enough and if sufficiently few errors were made in transmission, this codeword should be x, the codeword sent. This type of decoding is called nearest neighbor decoding.

Example 5 Use nearest neighbor decoding to determine which codeword was sent from the code C = {0000, 1110, 1011} if 0110 is received.

Solution: We first find the distance between 0110 and each of the codewords. We find that

d(0000, 0110) = 2,

d(1110, 0110) = 1,

d(1011, 0110) = 3.

Since the closest codeword to 0110 is 1110, we conclude that 1110 was the codeword sent.

Will nearest neighbor decoding produce the most likely codeword that was sent from a binary string that was received? It is not hard to see that it will if each bit sent has the same probability p of being received incorrectly and p < 1/2. We call a transmission channel with this property a binary symmetric channel. Such a channel is displayed in Figure 1.

Figure 1. A binary symmetric channel.

Example 6 Suppose that when a bit is sent over a binary symmetric channel the probability it is received incorrectly is 0.01. What is the probability that the bit string 100010 is received when the bit string 000000 is sent?

Solution: Since the probability a bit is received incorrectly is 0.01, the prob- ability that a bit is received correctly is 1 − 0.01 = 0.99. For 100010 to be received, when 000000 is sent, it is necessary for the first and fifth bits to be received incorrectly and the other four bits to be received correctly. The prob- ability that this occurs is

(0.99)4(0.01)2 = 0.000096059601.
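The computation in Example 6 generalizes: for any sent and received pair, count the positions in error and multiply the corresponding probabilities. A minimal sketch (the function name is ours):

```python
def bsc_receive_probability(sent: str, received: str, p: float) -> float:
    """Probability that `received` arrives when `sent` is transmitted over
    a binary symmetric channel that flips each bit independently with
    probability p."""
    errors = sum(a != b for a, b in zip(sent, received))
    return p ** errors * (1 - p) ** (len(sent) - errors)
```

With `sent="000000"`, `received="100010"`, and `p=0.01`, this reproduces the value 0.000096059601 found above (up to floating-point rounding).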

We will now show that nearest neighbor decoding gives us the most likely codeword sent, so that it is also maximum likelihood decoding.


Theorem 2 Suppose codewords of a binary code C are transmitted using a binary symmetric channel. Then, nearest neighbor decoding of a bit string received produces the most likely codeword sent.

Proof: To prove this theorem we first need to find the probability that when a codeword of length n is sent, a bit string with k errors in specified positions is received. Since the probability each bit is received correctly is 1 − p and the probability each bit is received in error is p, it follows that the probability of k errors in specified positions is p^k (1 − p)^(n−k). Since p < 1/2 and 1 − p > 1/2, it follows that

p^i (1 − p)^(n−i) > p^j (1 − p)^(n−j)

whenever i < j. Hence, if i < j, the probability that a bit string with i specified errors is received is greater than the probability that a bit string with j specified errors is received. Since it is more likely that errors were made in fewer specified positions when a codeword was transmitted, nearest neighbor decoding produces the most likely codeword.

The Hamming distance between codewords in a binary code determines its ability to detect and/or correct errors. We need to make the following definition before introducing two key theorems relating to this ability.

Definition 2 The minimum distance of a binary code C is the smallest distance between two distinct codewords, that is,

d(C) = min{d(x,y) | x,y ∈ C, x ≠ y}.

Example 7 Find the minimum distance of the code

C = {00000, 01110, 10011, 11111}.

Solution: To compute the minimum distance of this code we will find the distance between each pair of codewords and then find the smallest such distance. We have d(00000, 01110) = 3, d(00000, 10011) = 3, d(00000, 11111) = 5, d(01110, 10011) = 4, d(01110, 11111) = 2, and d(10011, 11111) = 2. We see that the minimum distance of C is 2.
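The pairwise computation in Example 7 is mechanical, so it is worth sketching in code (the function name is ours):

```python
from itertools import combinations

def minimum_distance(code: list[str]) -> int:
    """Smallest Hamming distance between any two distinct codewords."""
    return min(
        sum(a != b for a, b in zip(x, y))
        for x, y in combinations(code, 2)
    )
```

For the code of Example 7, `minimum_distance(["00000", "01110", "10011", "11111"])` returns 2, matching the solution above.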

Example 8 Find the minimum distance of the code

C = {000000, 111111}.


Solution: Since there are only two codewords and d(000000, 111111) = 6, the minimum distance of this code is 6.

The minimum distance of a code tells us how many errors it can detect and how many errors it can correct, as the following two theorems show.

Theorem 3 A binary code C can detect up to k errors in any codeword if and only if d(C) ≥ k + 1.

Proof: Suppose that C is a binary code with d(C) ≥ k + 1. Suppose that a codeword x is transmitted and is received with k or fewer errors. Since the minimum distance between codewords is at least k + 1, the bit string received cannot be another codeword. Hence, the receiver can detect these errors.

Now suppose that C can detect up to k errors and that d(C) ≤ k. Then there are two codewords in C that differ in no more than k bits. It is then possible for k errors to be introduced when one of these codewords is transmitted so that the other codeword is received, contradicting the fact that C can detect up to k errors.

Theorem 4 A binary code C can correct up to k errors in any codeword if and only if d(C) ≥ 2k + 1.

Proof: Suppose that C is a binary code with d(C) ≥ 2k + 1. Suppose that a codeword x is transmitted and received with k or fewer errors as the bit string z, so that d(x, z) ≤ k. To see that C can correct these errors, note that if y is a codeword other than x, then d(z,y) ≥ k + 1. To see this, note that if d(z,y) ≤ k, then by the triangle inequality d(x,y) ≤ d(x, z) + d(z,y) ≤ k + k = 2k, contradicting the assumption that d(C) ≥ 2k + 1.

Conversely, suppose that C can correct up to k errors. If d(C) ≤ 2k, then there are two codewords that differ in m ≤ 2k bits. Changing ⌈m/2⌉ ≤ k of these differing bits in one of the codewords produces a bit string that is within distance k of both codewords, thus making it impossible to correct these k or fewer errors.

Example 9 Let C be the code {00000000, 11111000, 01010111, 10101111}. How many errors can C detect and how many can it correct?

Solution: Computing the distance between codewords shows that the min- imum distance of C is 5. By Theorem 3, it follows that C can detect up to 5 − 1 = 4 errors. For example, when we use C to detect errors, we can de- tect the four errors made in transmission when we receive 11110000 when the codeword 00000000 was sent.


By Theorem 4, it follows that C can correct up to ⌊(5 − 1)/2⌋ = 2 errors. For example, when we use C to correct errors, we can correct the two errors introduced in transmission when we receive 11100000 when the codeword 11111000 was sent.
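Theorems 3 and 4 reduce the question of Example 9 to computing the minimum distance. A minimal sketch combining both theorems (the function name is ours):

```python
from itertools import combinations

def detect_correct_capability(code: list[str]) -> tuple[int, int]:
    """For a code with minimum distance d, return (d - 1, (d - 1) // 2):
    by Theorems 3 and 4 these are the numbers of errors the code can
    detect and correct, respectively."""
    d = min(
        sum(a != b for a, b in zip(x, y))
        for x, y in combinations(code, 2)
    )
    return d - 1, (d - 1) // 2
```

For the code of Example 9 this returns (4, 2): the code detects up to 4 errors and corrects up to 2, as found above.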

Perfect Codes To allow error correction we want to make the minimum distance between codewords large. But doing so limits how many codewords are available. Here we will develop a bound on the number of codewords in a binary code with a given minimum distance.

Lemma 1 Suppose x is a bit string of length n and that k is a nonnegative integer not exceeding n. Then there are

C(n, 0) + C(n, 1) + · · · + C(n, k)

bit strings y of length n such that d(x,y) ≤ k (where d is the Hamming distance).

Proof: Let i be a nonnegative integer. The number of bit strings y with d(x,y) = i equals the number of ways to select the i locations where x and y differ. This can be done in C(n, i) ways. It follows that there are

C(n, 0) + C(n, 1) + · · · + C(n, k)

bit strings y such that d(x,y) ≤ k.

We can describe the statement in Lemma 1 in geometric terms. By the sphere of radius k centered at x we mean the set of all bit strings y such that d(x,y) ≤ k. Lemma 1 says that there are exactly C(n, 0) + C(n, 1) + · · · + C(n, k) bit strings in the sphere of radius k centered at x.
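The sphere size of Lemma 1 is a simple sum of binomial coefficients, sketched here (the function name is ours):

```python
from math import comb

def sphere_size(n: int, k: int) -> int:
    """Number of bit strings of length n at Hamming distance at most k
    from a fixed string x: C(n, 0) + C(n, 1) + ... + C(n, k)."""
    return sum(comb(n, i) for i in range(k + 1))
```

For example, with n = 8 the sphere of radius 1 contains 1 + 8 = 9 strings, and the sphere of radius 2 contains 1 + 8 + 28 = 37; a sphere of radius n contains all 2^n strings.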

Lemma 2 Let C be a binary code containing codewords of length n and let d(C) = 2k + 1. Then given a bit string y of length n, there is at most one codeword x such that y is in the sphere of radius k centered at x.

Proof: Suppose that y is in the sphere of radius k centered at two different codewords x1 and x2. Then d(x1,y) ≤ k and d(x2,y) ≤ k. By the triangle inequality for the Hamming distance this implies that

d(x1,x2) ≤ d(x1,y) + d(x2,y) ≤ k + k = 2k,