Chapter 5: Error-Correcting Codes
Real channels corrupt data. Error-correcting codes add structured redundancy so that receivers can not only detect but also fix errors. From Richard Hamming's 1950 discovery that a mere 3 parity bits can protect 4 data bits, to Reed-Solomon codes guarding spacecraft telemetry, to the modern algebraic machinery underpinning every QR code and hard drive, this chapter builds the theory from first principles.
Hamming Distance & Sphere Packing
The Hamming distance \( d(x, y) \) between two binary strings is the number of positions in which they differ; equivalently, it is the weight of their XOR:
\( d(x, y) = \mathrm{wt}(x \oplus y) = \sum_{i=1}^n |x_i - y_i| \)
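The XOR formulation gives a one-line implementation. A minimal sketch in Python (the function name is illustrative):

```python
def hamming_distance(x: int, y: int) -> int:
    """Number of bit positions where x and y differ: wt(x XOR y)."""
    return bin(x ^ y).count("1")

# d(1011, 1001) = 1: the strings differ only in the third position
assert hamming_distance(0b1011, 0b1001) == 1
# d(0000, 1111) = 4: all four positions differ
assert hamming_distance(0b0000, 0b1111) == 4
```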
Minimum Distance & Error Correction
A code \( \mathcal{C} \) with minimum distance \( d_{\min} \):
- Detects up to \( d_{\min} - 1 \) errors
- Corrects up to \( t = \lfloor (d_{\min} - 1)/2 \rfloor \) errors
- Corrects \( t \) errors and detects \( t+1 \) simultaneously if \( d_{\min} \geq 2t + 2 \)
Nearest-neighbour (minimum distance) decoding: choose codeword \( \hat{c} = \arg\min_{c \in \mathcal{C}} d(r, c) \).
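Nearest-neighbour decoding can be sketched by brute force over the codebook (practical decoders exploit structure instead, but this is the definition verbatim):

```python
def nearest_codeword(r, codebook):
    """Minimum-distance decoding: return the codeword closest to r."""
    dist = lambda a, b: sum(x != y for x, y in zip(a, b))
    return min(codebook, key=lambda c: dist(r, c))

# [3,1,3] repetition code: d_min = 3, so t = 1 error is correctable
codebook = [(0, 0, 0), (1, 1, 1)]
assert nearest_codeword((0, 1, 0), codebook) == (0, 0, 0)  # one flip corrected
assert nearest_codeword((1, 1, 0), codebook) == (1, 1, 1)
```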
The Sphere-Packing Bound (Hamming Bound)
A binary \( [n, k, d] \) code with \( 2^k \) codewords, each surrounded by a non-overlapping Hamming sphere of radius \( t \), satisfies:
\( 2^k \sum_{i=0}^{t} \binom{n}{i} \leq 2^n \)
Codes achieving equality are called perfect codes. Hamming codes are perfect: \( 2^{n-k} = n+1 \) (e.g., \( 2^3 = 8 = 7+1 \) for Hamming(7,4)).
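The bound is easy to check numerically. A short sketch verifying that Hamming(7,4) meets it with equality:

```python
from math import comb

def sphere_packing(n, k, t):
    """Return (2^k * volume of radius-t sphere, 2^n); LHS <= RHS always,
    with equality exactly for perfect codes."""
    volume = sum(comb(n, i) for i in range(t + 1))
    return 2**k * volume, 2**n

# Hamming(7,4) with t = 1: 16 * (1 + 7) = 128 = 2^7, a perfect code
lhs, rhs = sphere_packing(7, 4, 1)
assert lhs == rhs == 128
```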
Figure: sphere packing in Hamming space. Two codewords with radius-1 spheres, which do not overlap when \( d_{\min} \geq 3 \).
Linear Codes: Generator & Parity-Check Matrices
A linear code \( [n, k, d] \) is a \( k \)-dimensional subspace of \( \mathbb{F}_2^n \). The \( 2^k \) codewords are exactly the set \( \{mG : m \in \mathbb{F}_2^k\} \), the row space of the generator matrix.
Generator Matrix \( G \)
A \( k \times n \) matrix whose rows form a basis for \( \mathcal{C} \). Encoding a message \( m \in \mathbb{F}_2^k \):
\( c = mG \pmod{2} \)
In systematic form \( G = [I_k \mid P] \), the first \( k \) bits of \( c \) are the message bits; the last \( n-k \) are parity bits.
Parity-Check Matrix \( H \)
An \( (n-k) \times n \) matrix satisfying \( GH^T = 0 \). For any codeword \( c \in \mathcal{C} \):
\( Hc^T = 0 \pmod{2} \)
The syndrome \( s = Hr^T \) of a received word \( r \) identifies the error pattern. If \( s = 0 \), no error is detected; otherwise \( s \) matches the pattern of a correctable error, and for a single-bit error in a Hamming code it pinpoints the corrupted bit.
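These identities can be verified on the smallest systematic code, the [3,1,3] repetition code (a pure-Python sketch; helper names are illustrative):

```python
# [3,1,3] repetition code: G = [I_1 | P] with P = [1 1], H = [P^T | I_2]
G = [[1, 1, 1]]
H = [[1, 1, 0],
     [1, 0, 1]]

def vec_mat(v, M):
    """Row vector times matrix over GF(2): encoding c = mG."""
    return tuple(sum(vi * row[j] for vi, row in zip(v, M)) % 2
                 for j in range(len(M[0])))

def mat_vec(M, v):
    """Matrix times column vector over GF(2): syndrome s = Hr^T."""
    return tuple(sum(a * b for a, b in zip(row, v)) % 2 for row in M)

c = vec_mat((1,), G)            # encode m = (1) -> c = (1, 1, 1)
assert mat_vec(H, c) == (0, 0)  # every codeword has zero syndrome

r = (1, 0, 1)                   # flip bit 2 of c
s = mat_vec(H, r)
assert s == (1, 0)              # nonzero syndrome: matches column 2 of H
```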
Hamming(7,4): The Classic Example
Parameters: \( n=7, k=4, d_{\min}=3 \), rate \( R = 4/7 \approx 0.571 \)
G = [1 0 0 0 | 1 1 0
     0 1 0 0 | 1 0 1
     0 0 1 0 | 0 1 1
     0 0 0 1 | 1 1 1]

H = [1 1 0 1 | 1 0 0
     1 0 1 1 | 0 1 0
     0 1 1 1 | 0 0 1]

Syndrome decoding: compute \( s = Hr^T \) and look the syndrome up in a table to find the error position.
The columns of \( H \) are the seven nonzero binary vectors of length 3, a permutation of the binary representations of 1 through 7. The syndrome \( s \) equals the column of \( H \) at the error position, so a table lookup pinpoints the flipped bit; in the classic non-systematic arrangement, where column \( j \) is the binary representation of \( j \), the syndrome is literally the binary address of the erroneous bit. An elegant construction.
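Syndrome decoding with these matrices can be sketched directly in Python (the table maps each syndrome to the column of \( H \) it matches):

```python
# Hamming(7,4) in systematic form, the G and H given in the text
G = [[1,0,0,0,1,1,0],
     [0,1,0,0,1,0,1],
     [0,0,1,0,0,1,1],
     [0,0,0,1,1,1,1]]
H = [[1,1,0,1,1,0,0],
     [1,0,1,1,0,1,0],
     [0,1,1,1,0,0,1]]

def encode(m):
    """c = mG over GF(2)."""
    return tuple(sum(mi * row[j] for mi, row in zip(m, G)) % 2
                 for j in range(7))

def syndrome(r):
    """s = Hr^T over GF(2)."""
    return tuple(sum(h * ri for h, ri in zip(row, r)) % 2 for row in H)

# Each nonzero syndrome is a column of H => its bit position
table = {tuple(H[i][j] for i in range(3)): j for j in range(7)}

def decode(r):
    """Correct at most one flipped bit, then strip parity (systematic)."""
    s = syndrome(r)
    if any(s):
        r = list(r)
        r[table[s]] ^= 1
    return tuple(r[:4])

m = (1, 0, 1, 1)
c = encode(m)
r = list(c); r[5] ^= 1          # inject a single-bit error
assert decode(tuple(r)) == m    # the error is corrected
```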
Reed-Solomon Codes
Reed-Solomon (RS) codes (1960) operate over finite fields \( \mathbb{F}_q \) and are Maximum Distance Separable (MDS): they achieve the Singleton bound \( d_{\min} = n - k + 1 \), the maximum possible minimum distance for given \( n \) and \( k \).
Construction
Choose distinct evaluation points \( \alpha_1, \ldots, \alpha_n \in \mathbb{F}_q \). A message \( (m_0, \ldots, m_{k-1}) \) defines a polynomial:
\( f(x) = m_0 + m_1 x + \cdots + m_{k-1} x^{k-1} \)
The codeword is the vector of evaluations: \( c = (f(\alpha_1), \ldots, f(\alpha_n)) \). Since a nonzero polynomial of degree at most \( k-1 \) has at most \( k-1 \) roots, any two distinct codewords agree in at most \( k-1 \) positions and hence differ in at least \( n - k + 1 \).
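The evaluation-map construction can be tried over a small prime field, say \( \mathbb{F}_7 \) (a toy illustration; real RS implementations work over \( \mathbb{F}_{2^8} \) and are far more careful about field arithmetic):

```python
q = 7                        # toy prime field GF(7)
alphas = [1, 2, 3, 4, 5, 6]  # n = 6 distinct evaluation points

def rs_encode(msg):
    """Evaluate f(x) = m0 + m1*x + ... + m_{k-1}*x^{k-1} at each alpha
    using Horner's rule, all arithmetic mod q."""
    def f(x):
        acc = 0
        for coeff in reversed(msg):
            acc = (acc * x + coeff) % q
        return acc
    return [f(a) for a in alphas]

# k = 3 => d_min = n - k + 1 = 4: distinct messages differ in >= 4 places
c1 = rs_encode([1, 2, 3])
c2 = rs_encode([1, 2, 4])
dist = sum(a != b for a, b in zip(c1, c2))
assert dist >= 6 - 3 + 1
```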
Applications
- CDs/DVDs/Blu-ray: cross-interleaved RS codes (CIRC uses shortened RS(32,28) and RS(28,24)) correct burst errors from scratches
- QR codes: Up to 30% of data can be restored via RS error correction
- Deep space: Voyager, Cassini use RS for telemetry over billions of km
- RAID storage: RS parity enables recovery from multiple disk failures
- DSL/5G: RS as outer code in concatenated schemes
The Gilbert-Varshamov Bound
A lower bound on the maximum code size: there exist binary \( [n, k, d] \) codes satisfying:
\( \frac{k}{n} \geq 1 - H_2\!\left(\frac{d-1}{n}\right) \)
where \( H_2(\delta) = -\delta \log_2 \delta - (1-\delta)\log_2(1-\delta) \) is the binary entropy function. The GV bound is existential: random codes achieve it, but constructing explicit codes meeting this bound for all parameters remains an open problem. LDPC codes (Chapter 6) provably achieve it under certain conditions.
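Evaluating the bound numerically is straightforward; a sketch (the parameter values are illustrative):

```python
from math import log2

def H2(delta):
    """Binary entropy function H_2(delta)."""
    if delta in (0, 1):
        return 0.0
    return -delta * log2(delta) - (1 - delta) * log2(1 - delta)

# For n = 100, d = 20 the GV bound guarantees a code of rate
# at least 1 - H2(19/100) ~ 0.3
n, d = 100, 20
gv_rate = 1 - H2((d - 1) / n)
print(f"GV-guaranteed rate: {gv_rate:.3f}")
```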
Python: Hamming(7,4) Encoder & Decoder
Implement Hamming(7,4) from scratch using its generator and parity-check matrices. Encode messages, inject single-bit errors, perform syndrome decoding to correct them, and measure BER before and after correction across a range of channel flip probabilities.
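One possible shape of this exercise, sketched end to end (matrix layout, helper names, and simulation parameters are one choice among many, not a reference solution):

```python
import random

# Hamming(7,4) matrices in systematic form, as given earlier in the chapter
G = [[1,0,0,0,1,1,0],
     [0,1,0,0,1,0,1],
     [0,0,1,0,0,1,1],
     [0,0,0,1,1,1,1]]
H = [[1,1,0,1,1,0,0],
     [1,0,1,1,0,1,0],
     [0,1,1,1,0,0,1]]
# Map each nonzero syndrome (a column of H) to its bit position
SYNDROME_TABLE = {tuple(H[i][j] for i in range(3)): j for j in range(7)}

def encode(m):
    """c = mG over GF(2)."""
    return [sum(mi * row[j] for mi, row in zip(m, G)) % 2 for j in range(7)]

def correct(r):
    """Syndrome-decode: fix at most one flipped bit."""
    s = tuple(sum(h * ri for h, ri in zip(row, r)) % 2 for row in H)
    if any(s):
        r = r[:]
        r[SYNDROME_TABLE[s]] ^= 1
    return r

def ber(p, trials=2000, seed=0):
    """Channel BER vs post-decoding message BER over a BSC(p)."""
    rng = random.Random(seed)
    raw_errs = msg_errs = 0
    for _ in range(trials):
        m = [rng.randint(0, 1) for _ in range(4)]
        c = encode(m)
        r = [bit ^ (rng.random() < p) for bit in c]   # BSC: flip w.p. p
        raw_errs += sum(a != b for a, b in zip(r, c))
        msg_errs += sum(a != b for a, b in zip(correct(r)[:4], m))
    return raw_errs / (7 * trials), msg_errs / (4 * trials)

for p in (0.01, 0.05, 0.10):
    before, after = ber(p)
    print(f"p={p:.2f}  channel BER={before:.4f}  decoded BER={after:.4f}")
```

At small \( p \), most blocks suffer at most one flip, so the decoded BER drops well below the channel BER; as \( p \) grows, multi-bit errors defeat single-error correction and the gain shrinks.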