Part III β€Ί Chapter 7

Differential Entropy

Extending Shannon entropy to continuous random variables β€” with the surprising twist that the entropy can be negative, and that the Gaussian is uniquely entropy-maximizing.

7.1 Definition

For a continuous random variable \(X\) with probability density function \(f(x)\), the differential entropy is:

\[ h(X) = -\int_{-\infty}^{\infty} f(x)\log f(x)\,dx \]

The integral is taken over the support of \(f\). Like discrete entropy, it measures β€œspread” or β€œuncertainty” of a distribution β€” but with a critical difference: differential entropy can be negative, and its value depends on the units of measurement.

The non-negativity of discrete entropy relied on \(0 \le p_i \le 1\) implying \(-\log p_i \ge 0\). For a continuous pdf, \(f(x)\) can exceed 1, making \(-\log f(x)\) negative, so \(h(X)\) can be negative.
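A quick numerical check makes this concrete. The sketch below (the interval \([0, 1/2]\) is an illustrative choice) evaluates \(-\int f\log f\,dx\) for a uniform density whose value exceeds 1, and recovers a negative entropy:

```python
import numpy as np
from scipy.integrate import quad

# Uniform on [a, b] has h(X) = log(b - a); for b - a < 1 this is negative.
a, b = 0.0, 0.5
f = 1.0 / (b - a)   # density value, here 2 > 1, so -log f(x) < 0

# Numerical evaluation of -integral f(x) log f(x) dx over the support
h_numeric, _ = quad(lambda x: -f * np.log(f), a, b)
h_exact = np.log(b - a)   # closed form: log(1/2), about -0.693 nats

print(f"numeric h = {h_numeric:.4f}, exact h = {h_exact:.4f}")
```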

7.2 Closed-Form Results

| Distribution | Parameters | Differential entropy \(h(X)\) |
|---|---|---|
| Uniform | \([a,b]\) | \(\log(b-a)\) |
| Gaussian | \(\mathcal{N}(\mu,\sigma^2)\) | \(\frac{1}{2}\log(2\pi e\,\sigma^2)\) |
| Exponential | \(\text{Exp}(\lambda)\) | \(1 - \log\lambda\) |
| Laplace | \(\text{Lap}(\mu,b)\) | \(1 + \log(2b)\) |
| Multivariate Gaussian | \(\mathcal{N}(\boldsymbol\mu,\Sigma)\) | \(\frac{n}{2}\log(2\pi e) + \frac{1}{2}\log\det\Sigma\) |
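The closed forms in the table can be cross-checked against SciPy, whose continuous distributions expose an `entropy()` method returning differential entropy in nats (parameter values below are arbitrary; note SciPy parametrizes the exponential by scale \(1/\lambda\)):

```python
import numpy as np
from scipy import stats

sigma, lam, b_lap = 1.5, 2.0, 0.7   # Gaussian std, exponential rate, Laplace scale
a, b = 1.0, 4.0                     # uniform support

checks = {
    "Uniform":     (stats.uniform(loc=a, scale=b - a).entropy(), np.log(b - a)),
    "Gaussian":    (stats.norm(scale=sigma).entropy(),
                    0.5 * np.log(2 * np.pi * np.e * sigma**2)),
    "Exponential": (stats.expon(scale=1 / lam).entropy(), 1 - np.log(lam)),
    "Laplace":     (stats.laplace(scale=b_lap).entropy(), 1 + np.log(2 * b_lap)),
}
for name, (numeric, formula) in checks.items():
    print(f"{name:12s} scipy = {numeric:.4f}, table formula = {formula:.4f}")
```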

7.3 Maximum Entropy Distributions

Subject to a variance constraint \(\operatorname{Var}(X)=\sigma^2\), the Gaussian maximizes differential entropy:

\[ h(X) \;\le\; \tfrac{1}{2}\log(2\pi e\,\sigma^2) \]

with equality iff \(X\) is Gaussian with variance \(\sigma^2\).

Proof sketch (relative entropy): For any \(f\) with variance \(\sigma^2\), let \(g=\mathcal{N}(0,\sigma^2)\). By non-negativity of KL divergence:

\[ 0 \;\le\; D_{\mathrm{KL}}(f\|g) = -h(X) - \int f\log g\,dx \]

Since \(\log g(x) = -\frac{x^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\), the integral \(\int f\log g\,dx\) depends only on the second moment of \(f\), which equals \(\sigma^2\) by assumption. Hence \(\int f\log g\,dx = \int g\log g\,dx = -h(g)\), and the inequality becomes \(0 \le -h(X) + h(g)\), i.e. \(h(X) \le h(g)\).

This result is why Gaussian noise is the worst case for channel capacity problems β€” it is the β€œmost uncertain” noise for a given power level.
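The gap can be seen directly from the closed forms in the table: matching the variance of a Laplace and a uniform distribution to a Gaussian's, the Gaussian's entropy comes out largest (a sketch; \(\sigma = 1\) is an arbitrary choice):

```python
import numpy as np

sigma = 1.0   # common standard deviation for all three distributions

# Entropies (in nats) of three zero-mean distributions with variance sigma^2,
# using the closed forms from the table above:
h_gauss   = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
h_laplace = 1 + np.log(2 * (sigma / np.sqrt(2)))   # Laplace: Var = 2b^2, so b = sigma/sqrt(2)
h_uniform = np.log(sigma * np.sqrt(12))            # Uniform: Var = width^2/12, so width = sigma*sqrt(12)

print(f"Gaussian {h_gauss:.4f} > Laplace {h_laplace:.4f} > Uniform {h_uniform:.4f}")
```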

Other Maximum Entropy Results

  • β€£Support [a, b] constraint: Uniform distribution maximizes \(h(X)\)
  • β€£Mean constraint \(E[X]=\mu\), support \([0,\infty)\): Exponential distribution
  • β€£No constraint (only normalization): \(h(X)\to\infty\) β€” no maximum exists

7.4 Properties of Differential Entropy

Scaling

\[ h(aX) = h(X) + \log|a| \]

Stretching a distribution (\(|a|>1\)) adds \(\log|a|\) to the entropy; compressing (\(|a|<1\)) subtracts it.

Translation Invariance

\[ h(X + c) = h(X) \]

Shifting does not change the shape of the distribution.
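Both properties are easy to verify numerically for a Gaussian, since \(aX + c \sim \mathcal{N}(c, a^2\sigma^2)\) (parameter values below are arbitrary):

```python
import numpy as np
from scipy import stats

sigma, a, c = 1.3, 2.5, 7.0   # arbitrary std, scale factor, and shift

h_X  = stats.norm(loc=0, scale=sigma).entropy()    # h(X) for X ~ N(0, sigma^2)
h_aX = stats.norm(scale=abs(a) * sigma).entropy()  # aX ~ N(0, a^2 sigma^2)
h_Xc = stats.norm(loc=c, scale=sigma).entropy()    # X + c ~ N(c, sigma^2)

print(f"h(aX) - h(X) = {h_aX - h_X:.4f}  (log|a| = {np.log(abs(a)):.4f})")
print(f"h(X + c) - h(X) = {h_Xc - h_X:.4f}")
```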

Chain Rule

\[ h(X,Y) = h(X) + h(Y|X) \]

Same form as the discrete chain rule; conditioning reduces entropy: \(h(Y|X) \le h(Y)\).

Mutual Information

\[ I(X;Y) = h(X) - h(X|Y) \ge 0 \]

Always non-negative, even though the individual differential entropies can be negative.

Relation to Discrete Entropy

Quantizing \(X\) to bins of width \(\Delta\) gives a discrete variable \(X^\Delta\) with:

\[ H(X^\Delta) \approx h(X) - \log\Delta \]

As \(\Delta\to 0\) (finer quantization), \(H(X^\Delta)\to\infty\). The divergence rate is \(-\log\Delta\) and the β€œremainder” is \(h(X)\). This is why \(h(X)\) can be negative but \(H\) is always non-negative.
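The approximation \(H(X^\Delta) \approx h(X) - \log\Delta\) can be checked by quantizing a Gaussian into bins of width \(\Delta\) and computing the discrete entropy of the bin probabilities (the values \(\sigma = 1\), \(\Delta = 0.01\) are illustrative):

```python
import numpy as np
from scipy import stats

sigma, delta = 1.0, 0.01   # Gaussian std and quantization bin width

# Bin probabilities p_i = P(X in bin i) via differences of the Gaussian CDF;
# truncating at +/- 8 sigma leaves negligible tail mass
edges = np.arange(-8 * sigma, 8 * sigma + delta, delta)
p = np.diff(stats.norm.cdf(edges, scale=sigma))
p = p[p > 0]

H_discrete = -np.sum(p * np.log(p))                  # H(X^Delta), in nats
h_diff = 0.5 * np.log(2 * np.pi * np.e * sigma**2)   # h(X) closed form

print(f"H(X^Delta) = {H_discrete:.4f}, h(X) - log(delta) = {h_diff - np.log(delta):.4f}")
```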

Python: Computing & Visualizing Differential Entropy

Compare analytical formulas against numerical estimates from histograms. Six panels: entropy vs parameter, PDFs, bar comparison, Gaussian maximality, negativity region, and the discrete-to-continuous relation.
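The original script is not reproduced here; the following is a minimal sketch of its core step, a histogram-based entropy estimate, with the plotting panels omitted. It uses the discrete-to-continuous relation above: the binned entropy plus \(\log\Delta\) recovers \(h(X)\) as the bins shrink.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy_hist(samples, bins=100):
    """Estimate differential entropy (nats) from samples via a histogram."""
    counts, edges = np.histogram(samples, bins=bins)
    widths = np.diff(edges)            # equal bin widths Delta
    p = counts / counts.sum()          # bin probabilities
    mask = p > 0
    # h(X) ~ H(X^Delta) + log(Delta) = -sum p log(p / Delta)
    return -np.sum(p[mask] * np.log(p[mask])) + np.sum(p[mask] * np.log(widths[mask]))

sigma = 2.0
x = rng.normal(0, sigma, 200_000)
h_hat = entropy_hist(x)
h_true = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
print(f"histogram estimate = {h_hat:.3f}, analytic = {h_true:.3f}")
```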
