Chapter 5: Error-Correcting Codes

Real channels corrupt data. Error-correcting codes add structured redundancy so that receivers can not only detect but also fix errors. From Richard Hamming's 1950 discovery that a mere 3 parity bits can protect 4 data bits, to Reed-Solomon codes guarding spacecraft telemetry, to the modern algebraic machinery underpinning every QR code and hard drive: this chapter builds the theory from first principles.

Hamming Distance & Sphere Packing

The Hamming distance \( d(x, y) \) between two binary strings is the number of positions in which they differβ€”equivalently, the weight of their XOR:

\( d(x, y) = \mathrm{wt}(x \oplus y) = \sum_{i=1}^n |x_i - y_i| \)
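This is easy to check computationally; a minimal sketch, representing the strings as Python integers so that XOR and a popcount do the work:

```python
# Hamming distance as the weight of the XOR, a minimal sketch.
# Binary strings are represented as Python ints.
def hamming_distance(x: int, y: int) -> int:
    return bin(x ^ y).count("1")

print(hamming_distance(0b1011, 0b1110))  # XOR = 0101, weight 2 -> 2
```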

Minimum Distance & Error Correction

A code \( \mathcal{C} \) with minimum distance \( d_{\min} \):

  • Detects up to \( d_{\min} - 1 \) errors
  • Corrects up to \( t = \lfloor (d_{\min} - 1)/2 \rfloor \) errors
  • Corrects \( t \) errors and detects \( t+1 \) simultaneously if \( d_{\min} \geq 2t + 2 \)

Nearest-neighbour (minimum distance) decoding: choose codeword \( \hat{c} = \arg\min_{c \in \mathcal{C}} d(r, c) \).
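Nearest-neighbour decoding can be sketched by brute force; the exhaustive search below is only viable for toy codes, since a real codebook has \( 2^k \) entries:

```python
# Brute-force nearest-neighbour (minimum-distance) decoding, a sketch.
def dist(a, b):
    return sum(x != y for x, y in zip(a, b))

def nearest_codeword(r, codebook):
    return min(codebook, key=lambda c: dist(r, c))

# [3,1] repetition code: d_min = 3, so t = 1 error is correctable.
codebook = [(0, 0, 0), (1, 1, 1)]
print(nearest_codeword((1, 0, 1), codebook))  # -> (1, 1, 1)
```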

The Sphere-Packing Bound (Hamming Bound)

A binary \( [n, k, d] \) code with \( 2^k \) codewords, each surrounded by a non-overlapping Hamming sphere of radius \( t \), satisfies:

\( 2^k \sum_{i=0}^{t} \binom{n}{i} \leq 2^n \)

Codes achieving equality are called perfect codes. Hamming codes are perfect: \( 2^{n-k} = n+1 \) (e.g., \( 2^3 = 8 = 7+1 \) for Hamming(7,4)).
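The bound is easy to verify numerically; a small sketch checking equality for Hamming(7,4), and for Hamming(15,11) as a second perfect code:

```python
from math import comb

# Sphere-packing bound: 2^k * sum_{i<=t} C(n,i) <= 2^n,
# with equality for perfect codes (t = 1 for Hamming codes).
def sphere_packing_lhs(n, k, t):
    return 2**k * sum(comb(n, i) for i in range(t + 1))

print(sphere_packing_lhs(7, 4, 1), 2**7)     # 16 * (1 + 7) = 128 = 2^7
print(sphere_packing_lhs(15, 11, 1), 2**15)  # 2048 * 16 = 32768 = 2^15
```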

Figure: sphere packing in Hamming space. Two codewords \( c_1, c_2 \) surrounded by radius-1 Hamming spheres; since \( d(c_1, c_2) \geq 2t+1 = 3 \) when \( d_{\min} \geq 3 \), the spheres do not overlap, and a received word \( r \) with \( d(r, c_1) = 1 \) decodes unambiguously to \( c_1 \).

Linear Codes: Generator & Parity-Check Matrices

A linear code \( [n, k, d] \) is a \( k \)-dimensional subspace of \( \mathbb{F}_2^n \). Its \( 2^k \) codewords are exactly the set \( \{mG : m \in \mathbb{F}_2^k\} \), where \( G \) is a generator matrix.

Generator Matrix \( G \)

A \( k \times n \) matrix whose rows form a basis for \( \mathcal{C} \). Encoding a message \( m \in \mathbb{F}_2^k \):

\( c = mG \pmod{2} \)

In systematic form \( G = [I_k \mid P] \), the first \( k \) bits of \( c \) are the message bits; the last \( n-k \) are parity bits.
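A minimal sketch of systematic encoding over \( \mathbb{F}_2 \), using the parity part \( P \) of the Hamming(7,4) generator given later in this chapter:

```python
# Systematic encoding c = mG (mod 2) with G = [I_k | P], a small sketch.
# P is the parity part of the Hamming(7,4) generator matrix.
P = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]

def encode(m):
    parity = [sum(m[i] * P[i][j] for i in range(4)) % 2 for j in range(3)]
    return list(m) + parity  # first k bits = message, last n-k = parity

print(encode([1, 0, 1, 1]))  # -> [1, 0, 1, 1, 0, 1, 0]
```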

Parity-Check Matrix \( H \)

An \( (n-k) \times n \) matrix satisfying \( GH^T = 0 \). For any codeword \( c \in \mathcal{C} \):

\( Hc^T = 0 \pmod{2} \)

The syndrome \( s = Hr^T \) of a received word \( r \) identifies the error pattern: it depends only on the error, not on which codeword was sent. If \( s = 0 \), no detectable error occurred; otherwise, for a single-bit error, \( s \) equals the column of \( H \) at the corrupted position.
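A minimal illustration with the \( [3,1] \) repetition code, where \( G = [1\ 1\ 1] \); the particular \( H \) below is one valid choice for this example:

```python
# Syndrome s = H r^T (mod 2) for the [3,1] repetition code, a sketch.
# This H satisfies GH^T = 0 for G = [1 1 1]; nonzero s flags an error.
H = [(1, 1, 0), (0, 1, 1)]

def syndrome(r):
    return tuple(sum(h * b for h, b in zip(row, r)) % 2 for row in H)

print(syndrome((1, 1, 1)))  # codeword      -> (0, 0)
print(syndrome((1, 0, 1)))  # bit 2 flipped -> (1, 1), column 2 of H
```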

Hamming(7,4): The Classic Example

Parameters: \( n=7, k=4, d_{\min}=3 \), rate \( R = 4/7 \approx 0.571 \)

G = [1 0 0 0 | 1 1 0
     0 1 0 0 | 1 0 1
     0 0 1 0 | 0 1 1
     0 0 0 1 | 1 1 1]

Syndrome decoding: compute \( s = Hr^T \), look up in table to find error position.

H = [1 1 0 1 | 1 0 0
     1 0 1 1 | 0 1 0
     0 1 1 1 | 0 0 1]

The columns of \( H \) are the seven nonzero binary vectors of length 3, i.e., the binary representations of 1–7 (here in systematic order). The syndrome \( s \) equals the column of \( H \) at the erroneous position, so a table lookup pinpoints the flipped bit; with the columns arranged in natural order 1–7, \( s \) literally spells out the binary address of the error. An elegant construction.

Reed-Solomon Codes

Reed-Solomon (RS) codes (1960) operate over finite fields \( \mathbb{F}_q \) and are Maximum Distance Separable (MDS): they achieve the Singleton bound \( d_{\min} = n - k + 1 \), the maximum possible minimum distance for given \( n \) and \( k \).

Construction

Choose distinct evaluation points \( \alpha_1, \ldots, \alpha_n \in \mathbb{F}_q \). A message \( (m_0, \ldots, m_{k-1}) \) defines a polynomial:

\( f(x) = m_0 + m_1 x + \cdots + m_{k-1} x^{k-1} \)

The codeword is the evaluations: \( c = (f(\alpha_1), \ldots, f(\alpha_n)) \). Since a nonzero polynomial of degree \( k-1 \) has at most \( k-1 \)roots, any two distinct codewords differ in at least \( n - k + 1 \) positions.
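A toy evaluation-style encoder over the prime field \( \mathbb{F}_{11} \); the tiny parameters \( n=5, k=2 \) are an illustrative choice, not a practical code:

```python
# Toy Reed-Solomon sketch over F_11 (q must be at least n).
# n = 5, k = 2, so any two distinct codewords differ in >= n-k+1 = 4 places.
q, n, k = 11, 5, 2
alphas = [1, 2, 3, 4, 5]  # distinct evaluation points in F_11

def rs_encode(m):
    # evaluate f(x) = m0 + m1*x at each alpha, reduced mod q
    return [sum(mj * a**j for j, mj in enumerate(m)) % q for a in alphas]

c1, c2 = rs_encode([3, 7]), rs_encode([3, 8])
print(sum(x != y for x, y in zip(c1, c2)))  # -> 5, comfortably >= 4
```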

Applications

  • CDs/DVDs/Blu-ray: cross-interleaved Reed-Solomon coding corrects burst errors from scratches
  • QR codes: Up to 30% of data can be restored via RS error correction
  • Deep space: Voyager, Cassini use RS for telemetry over billions of km
  • RAID storage: RS parity enables recovery from multiple disk failures
  • DSL/5G: RS as outer code in concatenated schemes

The Gilbert-Varshamov Bound

A lower bound on achievable code rates: for \( (d-1)/n \leq 1/2 \), there exist binary \( [n, k, d] \) codes satisfying:

\( \frac{k}{n} \geq 1 - H_2\!\left(\frac{d-1}{n}\right) \)

where \( H_2(\delta) = -\delta \log_2 \delta - (1-\delta)\log_2(1-\delta) \) is the binary entropy function. The GV bound is existential: random codes achieve it, but constructing explicit codes meeting this bound for all parameters remains an open problem. LDPC codes (Chapter 6) provably achieve it under certain conditions.
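Numerically, the guaranteed rate is easy to evaluate; a quick sketch with arbitrary example parameters \( n=100, d=11 \):

```python
from math import log2

# GV rate guarantee: codes of rate >= 1 - H2((d-1)/n) are known to exist.
def H2(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

n, d = 100, 11
rate = 1 - H2((d - 1) / n)
print(round(rate, 3))  # -> 0.531
```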

Python: Hamming(7,4) Encoder & Decoder

Implement Hamming(7,4) from scratch using its generator and parity-check matrices. Encode messages, inject single-bit errors, perform syndrome decoding to correct them, and measure BER before and after correction across a range of channel flip probabilities.
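The full script runs on the interactive page and is not reproduced here; a condensed standalone sketch of its core (matrices match the Hamming(7,4) section above; helper names are my own):

```python
import random

# Condensed Hamming(7,4) sketch: encode, flip one bit, syndrome-decode.
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

def encode(m):
    return [sum(m[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

def syndrome(r):
    return tuple(sum(H[i][j] * r[j] for j in range(7)) % 2 for i in range(3))

# syndrome -> error position: each column of H is a distinct nonzero vector
SYNDROME_TO_POS = {tuple(H[i][j] for i in range(3)): j for j in range(7)}

def decode(r):
    s = syndrome(r)
    if s != (0, 0, 0):            # assume a single-bit error
        r = list(r)
        r[SYNDROME_TO_POS[s]] ^= 1
    return r[:4]                  # systematic: message is the first 4 bits

random.seed(0)
m = [random.randint(0, 1) for _ in range(4)]
c = encode(m)
pos = random.randrange(7)
r = list(c)
r[pos] ^= 1                       # inject one bit flip
assert decode(r) == m             # any single error is corrected
print("corrected error at position", pos)
```

Extending this to a BER measurement is a loop over many random messages and a range of channel flip probabilities, counting bit disagreements before and after decoding.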
