Information Theory

Part I: Foundations

The three chapters of Part I build the entire edifice of classical information theory from a single 1948 paper.

Shannon's 1948 Paper

In July 1948, Claude Shannon published “A Mathematical Theory of Communication” in the Bell System Technical Journal. In 55 pages Shannon accomplished something remarkable: he gave a precise, quantitative answer to the question “what is information?” and then derived the fundamental limits governing every communication system and every compression algorithm — limits that stand to this day.

Shannon's key insight was to separate the meaning of a message from its statistical structure. Information, in Shannon's framework, is purely a matter of probability: a message that occurs with probability \(p\) carries \(-\log_2 p\) bits of information, regardless of what it says. From this simple definition he derived two theorems of extraordinary depth: the source coding theorem (compression cannot beat entropy) and the noisy channel coding theorem (reliable communication is possible at any rate below channel capacity).
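To make the definition concrete, here is a minimal Python sketch of self-information (the function name `information_content` is my own label, not Shannon's notation):

```python
import math

def information_content(p: float) -> float:
    """Self-information -log2(p), in bits, of an event with probability p."""
    return -math.log2(p)

# A fair coin flip carries exactly 1 bit; rarer events carry more.
print(information_content(0.5))    # -> 1.0
print(information_content(0.125))  # -> 3.0 (one face of a fair 8-sided die)
```

Note the pattern: halving the probability adds exactly one bit.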

Part I develops these three pillars in order — entropy, source coding, channel capacity — building the mathematical scaffolding you need to appreciate every later development in the course.

Core Equations of Part I

Shannon Entropy

\( H(X) = -\sum_{i} p_i \log_2 p_i \)
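Evaluated directly, the formula is a one-line sum; a minimal Python sketch (the helper name `entropy` is mine):

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum_i p_i log2 p_i, in bits (0 log 0 := 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy([0.25] * 4))   # uniform over 4 symbols: 2.0 bits
print(entropy([0.9, 0.1]))   # biased coin: about 0.469 bits
```

The `if p > 0` guard implements the usual convention 0 log 0 = 0.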

Source Coding Bound

\( \bar{\ell} \geq H(X) \)
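Here \(\bar{\ell}\) is the expected codeword length of a uniquely decodable code. One way to watch the bound being met (an illustrative sketch; the Huffman construction is standard, but the helper names are mine): for a dyadic distribution, an optimal prefix code achieves \(\bar{\ell} = H(X)\) exactly.

```python
import heapq
import math

def entropy(probs):
    """Shannon entropy in bits (0 log 0 := 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Codeword lengths of an optimal (Huffman) binary prefix code."""
    # Heap items: (probability, unique tiebreak id, symbol indices merged so far).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    next_id = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:  # each merge adds one bit to every symbol inside
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next_id, s1 + s2))
        next_id += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]            # dyadic: all powers of 1/2
lengths = huffman_lengths(probs)              # -> [1, 2, 3, 3]
avg = sum(p * l for p, l in zip(probs, lengths))
print(entropy(probs), avg)                    # both 1.75 bits
```

For non-dyadic distributions Huffman coding still satisfies H(X) <= avg < H(X) + 1.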

Channel Capacity

\( C = \max_{p(x)} I(X;Y) \)
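For a concrete instance: the binary symmetric channel with crossover probability eps (covered in Chapter 3) admits the closed form C = 1 - H2(eps), where H2 is the binary entropy function. A minimal Python sketch (the function names are mine):

```python
import math

def binary_entropy(p):
    """H2(p) = -p log2 p - (1-p) log2 (1-p), with H2(0) = H2(1) = 0."""
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(eps):
    """Capacity of the binary symmetric channel: C = 1 - H2(eps)."""
    return 1.0 - binary_entropy(eps)

print(bsc_capacity(0.0))   # noiseless channel: 1 bit per use
print(bsc_capacity(0.11))  # about 0.5 bits per use
print(bsc_capacity(0.5))   # coin-flip noise: capacity 0
```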

Chapters

Chapter 1: Shannon Entropy

Information content of events, the entropy formula H(X) = −∑ pᵢ log₂ pᵢ, its axiomatic derivation, properties, joint and conditional entropy, and the chain rule.

  • Information content I(x) = −log₂ p(x)
  • Entropy from axioms
  • Properties: non-negativity, max, concavity
  • Joint & conditional entropy
  • Chain rule H(X,Y) = H(X) + H(Y|X)
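The chain rule in the last bullet can be checked numerically on any joint distribution; here is a small Python sketch with a made-up p(x, y) (the distribution and variable names are mine, purely illustrative):

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities (0 log 0 := 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A made-up joint distribution p(x, y) over X, Y in {0, 1}.
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

px = {x: sum(p for (a, _), p in pxy.items() if a == x) for x in (0, 1)}
H_XY = H(pxy.values())
H_X = H(px.values())
# H(Y|X) = sum_x p(x) * H(Y | X = x), with p(y|x) = p(x, y) / p(x).
H_Y_given_X = sum(
    px[x] * H([pxy[(x, y)] / px[x] for y in (0, 1)]) for x in (0, 1)
)
print(abs(H_XY - (H_X + H_Y_given_X)) < 1e-12)  # chain rule holds: True
```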

Chapter 2: Source Coding Theorem

Shannon's fundamental theorem on lossless compression: no code can compress below H(X) bits/symbol, yet this bound is asymptotically achievable.

  • Kraft inequality
  • Shannon's source coding theorem
  • Typical sequences & AEP
  • Proof sketch
  • Preview of rate-distortion theory
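The first bullet, the Kraft inequality, is easy to state in code: a binary prefix code with codeword lengths l_1, ..., l_n exists if and only if the sum of 2^(-l_i) is at most 1. A minimal sketch (the function name is mine):

```python
def kraft_sum(lengths):
    """Kraft sum of 2^(-l) over the given codeword lengths."""
    return sum(2.0 ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # 1.0: lengths of a complete prefix code
print(kraft_sum([1, 1, 2]))     # 1.25 > 1: no prefix code has these lengths
```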

Chapter 3: Channel Capacity

Mutual information, the capacity of noisy channels (BSC, BEC), and the noisy channel coding theorem: reliable communication is possible at any rate below capacity.

  • Mutual information I(X;Y)
  • Channel capacity C = max I(X;Y)
  • Binary symmetric channel
  • Binary erasure channel
  • Shannon's noisy channel coding theorem
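The definition C = max over p(x) of I(X;Y) can also be explored numerically. Below is a sketch for the binary erasure channel with erasure probability eps (the names and the grid search are mine; the known answer, C = 1 - eps, is attained by the uniform input):

```python
import math

def H(probs):
    """Entropy in bits (0 log 0 := 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def bec_mutual_info(q, eps):
    """I(X;Y) for a binary erasure channel with P(X=1) = q, erasure prob eps."""
    # Joint distribution p(x, y), where y = 'e' marks an erasure.
    joint = {
        (0, 0): (1 - q) * (1 - eps), (0, 'e'): (1 - q) * eps,
        (1, 1): q * (1 - eps),       (1, 'e'): q * eps,
    }
    py = {}
    for (_, y), p in joint.items():
        py[y] = py.get(y, 0.0) + p
    # Identity I(X;Y) = H(X) + H(Y) - H(X,Y).
    return H([q, 1 - q]) + H(py.values()) - H(joint.values())

eps = 0.3
best = max(bec_mutual_info(q / 1000, eps) for q in range(1, 1000))
print(best)  # close to 1 - eps = 0.7, attained near q = 0.5
```

A grid search is crude but enough here, since I(X;Y) is concave in the input distribution.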

Prerequisites

Part I requires only probability theory at the undergraduate level. You should be comfortable with:

  • Discrete probability distributions, expectation, and variance
  • Basic combinatorics and the binomial theorem
  • Logarithms (natural and base-2) and their algebraic properties
  • The definition of a limit and simple asymptotic notation