Information Theory

Part I: Foundations

The three chapters of Part I build the entire edifice of classical information theory from a single 1948 paper.

Shannon's 1948 Paper

In July 1948, Claude Shannon published “A Mathematical Theory of Communication” in the Bell System Technical Journal. In 55 pages Shannon accomplished something remarkable: he gave a precise, quantitative answer to the question “what is information?” and then derived the fundamental limits governing every communication system and every compression algorithm — limits that stand to this day.

Shannon's key insight was to separate the meaning of a message from its statistical structure. Information, in Shannon's framework, is purely a matter of probability: a message that occurs with probability \(p\) carries \(-\log_2 p\) bits of information, regardless of what it says. From this simple definition he derived two theorems of extraordinary depth: the source coding theorem (compression cannot beat entropy) and the noisy channel coding theorem (reliable communication is possible at any rate below channel capacity).
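To make the definition concrete, here is a minimal Python sketch of self-information (the function name `information_content` is my own label, not Shannon's notation):

```python
import math

def information_content(p: float) -> float:
    """Self-information -log2(p), in bits, of an event with probability p."""
    return -math.log2(p)

# A fair coin flip carries exactly 1 bit; rarer events carry more.
print(information_content(0.5))    # -> 1.0
print(information_content(0.125))  # -> 3.0 (one face of a fair 8-sided die)
```

Note the pattern: halving the probability adds exactly one bit.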

Part I develops these three pillars in order — entropy, source coding, channel capacity — building the mathematical scaffolding you need to appreciate every later development in the course.

Core Equations of Part I

Shannon Entropy

\( H(X) = -\sum_{i} p_i \log_2 p_i \)
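Evaluated directly, the formula is a one-line sum; a minimal Python sketch (the helper name `entropy` is mine):

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum_i p_i log2 p_i, in bits (0 log 0 := 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy([0.25] * 4))   # uniform over 4 symbols: 2.0 bits
print(entropy([0.9, 0.1]))   # biased coin: about 0.469 bits
```

The `if p > 0` guard implements the usual convention 0 log 0 = 0.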

Source Coding Bound

\( \bar{\ell} \geq H(X) \)
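Here \(\bar{\ell}\) is the expected codeword length of a uniquely decodable code. One way to watch the bound being met (an illustrative sketch; the Huffman construction is standard, but the helper names are mine): for a dyadic distribution, an optimal prefix code achieves \(\bar{\ell} = H(X)\) exactly.

```python
import heapq
import math

def entropy(probs):
    """Shannon entropy in bits (0 log 0 := 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Codeword lengths of an optimal (Huffman) binary prefix code."""
    # Heap items: (probability, unique tiebreak id, symbol indices merged so far).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:  # each merge adds one bit to every symbol inside
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next_id, s1 + s2))
        next_id += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]            # dyadic: all powers of 1/2
lengths = huffman_lengths(probs)              # -> [1, 2, 3, 3]
avg = sum(p * l for p, l in zip(probs, lengths))
print(entropy(probs), avg)                    # both 1.75 bits
```

For non-dyadic distributions Huffman coding still satisfies H(X) <= avg < H(X) + 1.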

Channel Capacity

\( C = \max_{p(x)} I(X;Y) \)
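For a concrete instance: the binary symmetric channel with crossover probability eps (covered in Chapter 3) admits the closed form C = 1 - H2(eps), where H2 is the binary entropy function. A minimal Python sketch (the function names are mine):

```python
import math

def binary_entropy(p):
    """H2(p) = -p log2 p - (1-p) log2 (1-p), with H2(0) = H2(1) = 0."""
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(eps):
    """Capacity of the binary symmetric channel: C = 1 - H2(eps)."""
    return 1.0 - binary_entropy(eps)

print(bsc_capacity(0.0))   # noiseless channel: 1 bit per use
print(bsc_capacity(0.11))  # about 0.5 bits per use
print(bsc_capacity(0.5))   # coin-flip noise: capacity 0
```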

Chapters

Chapter 1: Shannon Entropy

Information content of events, the entropy formula H(X) = −∑ pᵢ log₂ pᵢ, its axiomatic derivation, properties, joint and conditional entropy, and the chain rule.

  • Information content I(x) = −log₂ p(x)
  • Entropy from axioms
  • Properties: non-negativity, max, concavity
  • Joint & conditional entropy
  • Chain rule H(X,Y) = H(X) + H(Y|X)
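The chain rule in the last bullet can be checked numerically on any joint distribution; here is a small Python sketch with a made-up p(x, y) (the distribution and variable names are mine, purely illustrative):

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities (0 log 0 := 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A made-up joint distribution p(x, y) over X, Y in {0, 1}.
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

px = {x: sum(p for (a, _), p in pxy.items() if a == x) for x in (0, 1)}
H_XY = H(pxy.values())
H_X = H(px.values())
# H(Y|X) = sum_x p(x) * H(Y | X = x), with p(y|x) = p(x, y) / p(x).
H_Y_given_X = sum(
    px[x] * H([pxy[(x, y)] / px[x] for y in (0, 1)]) for x in (0, 1)
)
print(abs(H_XY - (H_X + H_Y_given_X)) < 1e-12)  # chain rule holds: True
```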

Chapter 2: Source Coding Theorem

Shannon's fundamental theorem on lossless compression: no code can compress below H(X) bits/symbol, yet this bound is asymptotically achievable.

  • Kraft inequality
  • Shannon's source coding theorem
  • Typical sequences & AEP
  • Proof sketch
  • Preview of rate-distortion theory
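The first bullet, the Kraft inequality, is easy to state in code: a binary prefix code with codeword lengths l_1, ..., l_n exists if and only if the sum of 2^(-l_i) is at most 1. A minimal sketch (the function name is mine):

```python
def kraft_sum(lengths):
    """Kraft sum of 2^(-l) over the given codeword lengths."""
    return sum(2.0 ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0: lengths of a complete prefix code
print(kraft_sum([1, 1, 2]))     # 1.25 > 1: no prefix code has these lengths
```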

Chapter 3: Channel Capacity

Mutual information, the capacity of noisy channels (BSC, BEC), and the noisy channel coding theorem: reliable communication is possible at any rate below capacity.

  • Mutual information I(X;Y)
  • Channel capacity C = max I(X;Y)
  • Binary symmetric channel
  • Binary erasure channel
  • Shannon's noisy channel coding theorem
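The definition C = max over p(x) of I(X;Y) can also be explored numerically. Below is a sketch for the binary erasure channel with erasure probability eps (the names and the grid search are mine; the known answer, C = 1 - eps, is attained by the uniform input):

```python
import math

def H(probs):
    """Entropy in bits (0 log 0 := 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def bec_mutual_info(q, eps):
    """I(X;Y) for a binary erasure channel with P(X=1) = q, erasure prob eps."""
    # Joint distribution p(x, y), where y = 'e' marks an erasure.
    joint = {
        (0, 0): (1 - q) * (1 - eps), (0, 'e'): (1 - q) * eps,
        (1, 1): q * (1 - eps),       (1, 'e'): q * eps,
    }
    py = {}
    for (_, y), p in joint.items():
        py[y] = py.get(y, 0.0) + p
    # Identity I(X;Y) = H(X) + H(Y) - H(X,Y).
    return H([q, 1 - q]) + H(py.values()) - H(joint.values())

eps = 0.3
best = max(bec_mutual_info(q / 1000, eps) for q in range(1, 1000))
print(best)  # close to 1 - eps = 0.7, attained near q = 0.5
```

A grid search is crude but enough here, since I(X;Y) is concave in the input distribution.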

Prerequisites

Part I requires only probability theory at the undergraduate level. You should be comfortable with:

  • Discrete probability distributions, expectation, and variance
  • Basic combinatorics and the binomial theorem
  • Logarithms (natural and base-2) and their algebraic properties
  • The definition of a limit and simple asymptotic notation