Kolmogorov Complexity
The information content of an individual string: the length of its shortest description. Uncomputable, yet the foundation of algorithmic information theory.
10.1 Definition
Fix a universal Turing machine \(U\). The Kolmogorov complexity (also called algorithmic or descriptional complexity) of a binary string \(x\) is:
\[ K(x) = \min\bigl\{|p| : U(p) = x\bigr\} \]
length of shortest binary program that makes \(U\) output \(x\)
Intuitively: how many bits are needed to describe \(x\) completely? A string like \(\underbrace{00\cdots0}_{1000}\) can be described by a short program ("print 1000 zeros"), so \(K(x) \ll |x|\). A genuinely random string has no shorter description than itself, so \(K(x) \approx |x|\).
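The "print 1000 zeros" idea can be made concrete by treating a Python source expression as the description, a toy illustration rather than a real universal machine:

```python
# The string of 1000 zeros has a description far shorter than itself.
x = "0" * 1000

# A Python expression that evaluates to x serves as the "program";
# its source length stands in for the program length.
description = '"0" * 1000'
assert eval(description) == x

print(len(description), len(x))  # -> 10 1000: the description is 100x shorter
```

The fixed Python interpreter plays the role of the reference machine \(U\); its size is the additive constant.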
Prefix-free version: One often uses the prefix-free Kolmogorov complexity \(K^*(x)\), where only self-delimiting programs are allowed (analogous to prefix-free codes). This satisfies the Kraft inequality \(\sum_x 2^{-K^*(x)} \le 1\), making it closer to a probability measure.
10.2 Invariance Theorem
Does \(K(x)\) depend on the choice of reference machine \(U\)? The Invariance Theorem says it does not, up to an additive constant:
\[ K_U(x) \le K_V(x) + c_{U,V} \]
where \(c_{U,V}\) is a constant depending only on the two machines, not on \(x\)
Proof: Given a program \(p_V\) that computes \(x\) on \(V\), we can prepend a fixed interpreter \(I_{V\to U}\) that simulates \(V\) on \(U\): the combined program \(I_{V\to U}\,p_V\) runs on \(U\) and has length \(|I_{V\to U}| + |p_V|\), where \(|I_{V\to U}|\) is the fixed constant \(c_{U,V}\).
The invariance theorem justifies speaking of the Kolmogorov complexity: it is well-defined up to a fixed additive constant. For long strings, this constant is negligible.
10.3 Uncomputability
Theorem: There is no algorithm that computes \(K(x)\) for all \(x\).
Proof (Berry paradox variant): Suppose \(K\) were computable. For each \(n\), consider the string:

"the lexicographically first string \(x\) with \(K(x) > n\)"

Since \(K\) is assumed computable, this description can be turned into a program of length \(O(\log n)\): encode \(n\), then search strings in lexicographic order until one with \(K(x) > n\) is found. But by definition the string that program outputs has \(K > n\), while the program itself has length \(O(\log n) < n\) for large enough \(n\), a contradiction. Hence \(K\) is uncomputable.
Practically: we can approximate \(K(x)\) from above with compression algorithms (zlib, bz2, lzma). These give upper bounds: if a compressor produces \(k\) bits of output, then \(K(x) \le k + c\), where \(c\) accounts for the fixed size of the decompressor.
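This upper-bound estimate is easy to compute with the standard-library compressors; a minimal sketch, taking the best of the three:

```python
import bz2
import lzma
import os
import zlib

def k_upper_bound(data: bytes) -> int:
    """Smallest compressed size (in bytes) over three compressors:
    an upper bound on K(data), up to the fixed decompressor size."""
    return min(len(zlib.compress(data, 9)),
               len(bz2.compress(data, 9)),
               len(lzma.compress(data)))

print(k_upper_bound(b"0" * 1000))       # tiny: highly compressible
print(k_upper_bound(os.urandom(1000)))  # near (or above) 1000: incompressible
```

Note the asymmetry: a small compressed size proves \(K(x)\) is small, but a large one proves nothing, since no compressor finds all regularities.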
10.4 Relation to Shannon Entropy
For a random variable \(X\) drawn i.i.d. with distribution \(P\):
\[ \mathbb{E}[K(X)] = H(X) + O(1) \]
More precisely, for a sequence of \(n\) i.i.d. samples:
\[ \frac{1}{n}\mathbb{E}[K(X^n)] \xrightarrow{n\to\infty} H(X) \]
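This convergence can be observed empirically, with zlib standing in as a (loose) upper bound on the optimal description length; a sketch, assuming i.i.d. Bernoulli(\(p\)) bits:

```python
import math
import random
import zlib

def H(p: float) -> float:
    """Binary Shannon entropy in bits per symbol."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rate(p: float, n: int = 200_000, seed: int = 0) -> float:
    """zlib output size, in bits per symbol, for n i.i.d. Bernoulli(p) bits."""
    rng = random.Random(seed)
    s = "".join("1" if rng.random() < p else "0" for _ in range(n))
    return 8 * len(zlib.compress(s.encode(), 9)) / n

for p in (0.5, 0.1, 0.01):
    print(f"p={p}: H={H(p):.3f}  zlib rate={rate(p):.3f} bits/symbol")
```

The compressed rate tracks \(H(p)\) in the right order (more bias, fewer bits) but stays above it, as any real compressor must.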
Shannon Entropy
- Defined for probability distributions
- Average over many outcomes
- Computable
- Distribution-dependent

Kolmogorov Complexity
- Defined for individual strings
- Complexity of a single object
- Uncomputable (only approximable)
- Distribution-free / objective
10.5 Random Strings
A string \(x\) of length \(n\) is called algorithmically random (or Kolmogorov random) if \(K(x) \ge n - O(1)\): there is no significantly shorter description.
Key facts:
- At least \(2^n - 2^{n-c+1}\) strings of length \(n\) have \(K(x) \ge n - c\): almost all strings are random.
- Random strings have no exploitable pattern: they pass all effective statistical tests. For infinite sequences, Martin-Löf randomness coincides with having incompressible prefixes (Schnorr's theorem, using prefix-free complexity).
- No fixed formal system can prove "\(K(x) > c\)" for more than finitely many specific strings \(x\), yet almost all strings are random. This is analogous to Gödel incompleteness.
Python: Estimating Complexity via Compression
Using zlib, bz2, and lzma as upper-bound estimators for \(K(x)\), the experiment compares eight string types (zeros, alternating bits, the Fibonacci word, powers of 2, English text, source code, random bits, random DNA), each of length 1000: compressed sizes, compression ratios, the growth of the estimate with length for random versus structured strings, and the distribution of the estimate over 500 samples of each type.
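A minimal text-only sketch of the estimator, covering a subset of the string types described above (plots omitted):

```python
import bz2
import lzma
import random
import zlib

def k_est(data: bytes) -> int:
    """Upper-bound estimate of K: best compressed size of three compressors."""
    return min(len(zlib.compress(data, 9)),
               len(bz2.compress(data, 9)),
               len(lzma.compress(data)))

def fibonacci_word(n: int) -> str:
    """First n symbols of the Fibonacci word: s_{k+1} = s_k s_{k-1}."""
    a, b = "0", "01"
    while len(b) < n:
        a, b = b, b + a
    return b[:n]

N = 1000
rng = random.Random(0)
samples = {
    "zeros":       "0" * N,
    "alternating": "01" * (N // 2),
    "fibonacci":   fibonacci_word(N),
    "random bits": "".join(rng.choice("01") for _ in range(N)),
    "random DNA":  "".join(rng.choice("ACGT") for _ in range(N)),
}
for name, s in samples.items():
    print(f"{name:12s} len={len(s):4d}  K_est={k_est(s.encode()):4d} bytes")
```

Structured strings (zeros, alternating, Fibonacci) should compress to a small fraction of their length, while the random strings should stay near the entropy of their alphabet, roughly 1 bit per symbol for bits and 2 for DNA.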