Chapter 6: The DFT & FFT Algorithms
The Discrete Fourier Transform bridges continuous-domain theory with practical computation, while the Fast Fourier Transform makes it computationally feasible. This chapter covers the DFT definition, the Cooley-Tukey radix-2 algorithm, spectral leakage, windowing, zero-padding, and power spectral density estimation.
6.1 The Discrete Fourier Transform
Given a finite-length discrete-time signal $x[n]$ of length $N$, the Discrete Fourier Transform (DFT) maps it to a set of $N$ frequency-domain coefficients. Unlike the DTFT, which produces a continuous frequency spectrum, the DFT samples the frequency axis at $N$ equally spaced points.
Definition — The DFT and IDFT
The **DFT** of a length-$N$ sequence $x[n]$ is:
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}, \quad k = 0, 1, \ldots, N-1$$
The **Inverse DFT (IDFT)** recovers the time-domain signal:
$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{j2\pi kn/N}, \quad n = 0, 1, \ldots, N-1$$
The Twiddle Factor
The complex exponential kernel appears so frequently that it is given a special symbol, the twiddle factor:
$$W_N = e^{-j2\pi/N}$$
In this notation the DFT becomes $X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}$. The twiddle factor satisfies important symmetry properties:
- Periodicity: $W_N^{k+N} = W_N^k$
- Symmetry: $W_N^{k+N/2} = -W_N^k$
- Recursion: $W_N^{2} = W_{N/2}$
Computational Complexity
Direct evaluation of the DFT requires $N$ multiplications and $N-1$ additions for each of the $N$ output bins, yielding an overall complexity of $\mathcal{O}(N^2)$. For a signal with $N = 10^6$ samples, this means $10^{12}$ complex multiplications — utterly impractical without the FFT.
Example — DFT of a Simple Sequence
Let $x = [1, 2, 3, 4]$ so $N = 4$ and $W_4 = e^{-j\pi/2} = -j$. Then:
- $X[0] = 1 + 2 + 3 + 4 = 10$
- $X[1] = 1 + 2(-j) + 3(-1) + 4(j) = -2 + 2j$
- $X[2] = 1 + 2(-1) + 3(1) + 4(-1) = -2$
- $X[3] = 1 + 2(j) + 3(-1) + 4(-j) = -2 - 2j$
Note: $X[3] = X[1]^*$ since $x[n]$ is real — the DFT of a real signal is conjugate-symmetric.
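The arithmetic above is easy to check numerically; a minimal sketch using `numpy.fft.fft`, which computes the same DFT sum defined earlier:

```python
import numpy as np

# DFT of x = [1, 2, 3, 4]: numpy.fft.fft evaluates
# X[k] = sum_n x[n] exp(-j 2 pi k n / N), matching the definition above
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.fft.fft(x)
print(np.round(X, 6))  # [10.+0.j -2.+2.j -2.+0.j -2.-2.j]
```

The conjugate symmetry $X[3] = X[1]^*$ is visible directly in the output.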
The DFT implicitly treats $x[n]$ as one period of a periodic sequence with period $N$. This "periodic extension" assumption is the root cause of many practical issues such as spectral leakage and circular (rather than linear) convolution.
6.2 The Cooley-Tukey FFT
The Fast Fourier Transform (FFT) is not a different transform — it is an efficient algorithm for computing the DFT. The most widely used variant is the Cooley-Tukey radix-2 decimation-in-time (DIT) algorithm, published by Cooley and Tukey in 1965 (though Gauss knew a version of it in 1805).
Divide and Conquer
Assume $N$ is a power of 2. Split $x[n]$ into even-indexed and odd-indexed subsequences:
$$X[k] = \underbrace{\sum_{m=0}^{N/2-1} x[2m]\, W_{N/2}^{mk}}_{E[k]} \;+\; W_N^k \underbrace{\sum_{m=0}^{N/2-1} x[2m+1]\, W_{N/2}^{mk}}_{O[k]}$$
Here $E[k]$ is the $N/2$-point DFT of the even samples and $O[k]$ is the $N/2$-point DFT of the odd samples. Using the symmetry $W_N^{k+N/2} = -W_N^k$, we obtain:
Theorem — Radix-2 DIT Butterfly
For $k = 0, 1, \ldots, N/2 - 1$:
$$X[k] = E[k] + W_N^k \cdot O[k]$$
$$X[k + N/2] = E[k] - W_N^k \cdot O[k]$$
Each **butterfly** requires one complex multiplication and two complex additions.
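The two butterfly equations translate directly into a recursive implementation. The following is a teaching sketch, not a production FFT (real libraries use an iterative, in-place formulation):

```python
import numpy as np

def fft_recursive(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return np.asarray(x, dtype=complex)
    E = fft_recursive(x[0::2])                       # N/2-point DFT of even samples
    O = fft_recursive(x[1::2])                       # N/2-point DFT of odd samples
    W = np.exp(-2j * np.pi * np.arange(N // 2) / N)  # twiddle factors W_N^k
    return np.concatenate([E + W * O,                # X[k],       k = 0..N/2-1
                           E - W * O])               # X[k + N/2]

x = np.arange(8, dtype=float)
print(np.allclose(fft_recursive(x), np.fft.fft(x)))  # True
```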
Complexity Analysis
The recursion splits the $N$-point DFT into two $N/2$-point DFTs plus $N/2$ butterfly operations. With $\log_2 N$ stages of splitting:
$$T(N) = 2\,T(N/2) + \mathcal{O}(N) \;\;\Longrightarrow\;\; T(N) = \mathcal{O}(N \log N)$$
For $N = 10^6$, this is roughly $2 \times 10^7$ operations instead of $10^{12}$ — a speedup of $50{,}000\times$.
Bit-Reversal Permutation
The DIT algorithm requires the input to be rearranged in bit-reversed order. For $N = 8$, the index mapping is:
| Original index | Binary | Reversed | New index |
|---|---|---|---|
| 0 | 000 | 000 | 0 |
| 1 | 001 | 100 | 4 |
| 2 | 010 | 010 | 2 |
| 3 | 011 | 110 | 6 |
| 4 | 100 | 001 | 1 |
| 5 | 101 | 101 | 5 |
| 6 | 110 | 011 | 3 |
| 7 | 111 | 111 | 7 |
In practice, most FFT libraries (NumPy, FFTW) handle the bit-reversal internally. There also exist **decimation-in-frequency (DIF)** algorithms that reverse the output instead of the input, and mixed-radix algorithms for non-power-of-2 sizes.
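The table's mapping can be generated programmatically; a small sketch (the helper name `bit_reverse_indices` is ours, not a library function):

```python
def bit_reverse_indices(N):
    """Indices 0..N-1 in bit-reversed order (N must be a power of 2)."""
    bits = N.bit_length() - 1
    # Write each index as a fixed-width binary string, reverse it, reparse
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(N)]

print(bit_reverse_indices(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

The printed permutation matches the "New index" column of the table above.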
DFT vs FFT Speed Comparison
Empirically compare the O(N^2) naive DFT against NumPy's O(N log N) FFT. Watch the speedup grow as N increases.
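In the spirit of this comparison, a minimal timing sketch (exact timings depend on hardware; the naive DFT here is the textbook $\mathcal{O}(N^2)$ matrix product):

```python
import time
import numpy as np

def naive_dft(x):
    """Direct O(N^2) DFT via the full matrix of twiddle factors."""
    N = len(x)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)  # W[k, n] = W_N^{kn}
    return W @ x

rng = np.random.default_rng(0)
x = rng.standard_normal(2048)

t0 = time.perf_counter(); X_naive = naive_dft(x);  t_naive = time.perf_counter() - t0
t0 = time.perf_counter(); X_fast = np.fft.fft(x);  t_fft = time.perf_counter() - t0
print(f"naive: {t_naive:.4f} s, fft: {t_fft:.6f} s, "
      f"results match: {np.allclose(X_naive, X_fast)}")
```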
6.3 DFT Properties
Many properties of the DTFT carry over to the DFT, but with the critical distinction that all operations are circular (modulo $N$) rather than linear.
Theorem — Linearity
If $x_1[n] \xleftrightarrow{\text{DFT}} X_1[k]$ and $x_2[n] \xleftrightarrow{\text{DFT}} X_2[k]$, then:
$$\alpha\, x_1[n] + \beta\, x_2[n] \;\xleftrightarrow{\text{DFT}}\; \alpha\, X_1[k] + \beta\, X_2[k]$$
Theorem — Circular Shift
A circular time shift by $m$ samples corresponds to a linear phase in frequency:
$$x[(n - m) \bmod N] \;\xleftrightarrow{\text{DFT}}\; W_N^{mk}\, X[k]$$
Similarly, a circular frequency shift gives: $W_N^{-ln}\, x[n] \;\xleftrightarrow{\text{DFT}}\; X[(k - l) \bmod N]$.
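Both shift theorems are easy to verify numerically. A quick sketch of the time-shift case, using `np.roll` for the circular shift:

```python
import numpy as np

N, m = 8, 3
x = np.arange(N, dtype=float)
k = np.arange(N)

# np.roll(x, m)[n] == x[(n - m) mod N], i.e. a circular shift by m samples
lhs = np.fft.fft(np.roll(x, m))
rhs = np.exp(-2j * np.pi * m * k / N) * np.fft.fft(x)  # W_N^{mk} X[k]
print(np.allclose(lhs, rhs))  # True
```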
Theorem — Circular Convolution
The DFT of a pointwise product in one domain is circular convolution in the other:
$$x_1[n] \circledast x_2[n] = \sum_{m=0}^{N-1} x_1[m]\, x_2[(n-m) \bmod N] \;\xleftrightarrow{\text{DFT}}\; X_1[k] \cdot X_2[k]$$
To compute **linear** convolution via the DFT, zero-pad both sequences to length $N \geq N_1 + N_2 - 1$ before transforming.
Theorem — Parseval's Theorem for the DFT
The total energy in the time domain equals the total energy in the frequency domain (up to $1/N$):
$$\sum_{n=0}^{N-1} |x[n]|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X[k]|^2$$
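A one-line numerical check of the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
X = np.fft.fft(x)

# Time-domain energy equals frequency-domain energy divided by N
ok = np.allclose(np.sum(np.abs(x) ** 2), np.sum(np.abs(X) ** 2) / len(x))
print(ok)  # True
```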
Example — Linear Convolution via DFT
Given $h = [1, 1, 1]$ (length $N_1 = 3$) and $x = [1, 2, 3, 4]$ (length $N_2 = 4$): linear convolution has length $N_1 + N_2 - 1 = 6$.
- Zero-pad both to length $N = 8$ (next power of 2 $\geq 6$)
- Compute $H = \text{FFT}(h_{\text{padded}})$ and $X = \text{FFT}(x_{\text{padded}})$
- $Y = H \odot X$ (element-wise product)
- $y = \text{IFFT}(Y) = [1, 3, 6, 9, 7, 4, 0, 0]$ — trim to the first 6 values: $[1, 3, 6, 9, 7, 4]$
This fast-convolution technique, extended to long signals by the **overlap-add** and **overlap-save** block methods, is the basis of fast filtering in every modern DSP system, from audio plugins to radar signal processors. The cost of two FFTs + pointwise multiply + one IFFT is $\mathcal{O}(N \log N)$, far better than $\mathcal{O}(N^2)$ direct convolution for large $N$.
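The four steps of the worked example map onto a short sketch (the helper name `fft_convolve` is ours, not a library function):

```python
import numpy as np

def fft_convolve(h, x):
    """Linear convolution via zero-padded FFTs (single-block fast convolution)."""
    L = len(h) + len(x) - 1                    # linear-convolution output length
    N = 1 << (L - 1).bit_length()              # next power of 2 >= L
    Y = np.fft.fft(h, N) * np.fft.fft(x, N)    # fft(a, N) zero-pads a to length N
    return np.real(np.fft.ifft(Y))[:L]         # real inputs: drop tiny imag parts

print(np.round(fft_convolve([1, 1, 1], [1, 2, 3, 4]), 6))  # [1. 3. 6. 9. 7. 4.]
```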
6.4 Spectral Leakage & Windowing
Why Does Leakage Happen?
Computing the DFT of a finite-length signal is equivalent to multiplying an infinite signal by a rectangular window, then taking the DFT. In the frequency domain, this multiplication becomes convolution with the Dirichlet kernel (the DFT of the rectangular window):
$$W_{\text{rect}}(\omega) = e^{-j\omega(N-1)/2} \cdot \frac{\sin(N\omega/2)}{\sin(\omega/2)}$$
The main lobe has width $4\pi/N$, and the sidelobes decay slowly at only $\sim 1/k$. When a signal's frequency does not land exactly on a DFT bin, energy "leaks" into all bins through these sidelobes.
Window Functions
Applying a tapered window before the DFT trades a wider main lobe (reduced frequency resolution) for much lower sidelobes (less leakage). The five most common windows:
Definition — Common Window Functions
- Hann (Hanning): $w[n] = 0.5 - 0.5\cos\!\left(\frac{2\pi n}{N-1}\right)$. Sidelobe level: $-31.5$ dB; main lobe width: $8\pi/N$.
- Hamming: $w[n] = 0.54 - 0.46\cos\!\left(\frac{2\pi n}{N-1}\right)$. Sidelobe level: $-42.7$ dB; main lobe width: $8\pi/N$.
- Blackman: $w[n] = 0.42 - 0.5\cos\!\left(\frac{2\pi n}{N-1}\right) + 0.08\cos\!\left(\frac{4\pi n}{N-1}\right)$. Sidelobe level: $-58.1$ dB; main lobe width: $12\pi/N$.
- Bartlett (triangular): $w[n] = 1 - \left|\frac{2n - (N-1)}{N-1}\right|$. Sidelobe level: $-26.5$ dB; main lobe width: $8\pi/N$.
- Nuttall (4-term): $w[n] = a_0 - a_1\cos\!\left(\frac{2\pi n}{N-1}\right) + a_2\cos\!\left(\frac{4\pi n}{N-1}\right) - a_3\cos\!\left(\frac{6\pi n}{N-1}\right)$. Sidelobe level: $-93.3$ dB; excellent for dynamic range.
Resolution vs. Leakage Trade-off
There is a fundamental trade-off: a narrower main lobe gives better frequency resolution but higher sidelobes (more leakage), and vice versa. The rectangular window has the narrowest main lobe ($4\pi/N$) but the worst sidelobes ($-13$ dB). The Nuttall window has sidelobes at $-93$ dB but a main lobe 4x wider.
**Rule of thumb:** Use Hann for general-purpose spectral analysis. Use Blackman or Nuttall when you need to detect weak signals near strong ones (high dynamic range). Use rectangular only when the signal is guaranteed to be exactly periodic in the window.
Spectral Leakage and Windowing
Visualize how different window functions suppress spectral leakage for a sinusoid whose frequency doesn't land on a DFT bin.
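Along the lines of this demo, a sketch comparing leakage floors for a deliberately off-bin tone (the 20-bin exclusion zone around the peak is an arbitrary choice for this illustration):

```python
import numpy as np

fs, N = 1000.0, 256
n = np.arange(N)
f0 = 52.6                                   # off-bin: bin spacing fs/N ~ 3.9 Hz
x = np.sin(2 * np.pi * f0 * n / fs)

floors = {}
for name, w in [("rect", np.ones(N)), ("hann", np.hanning(N))]:
    X = np.fft.rfft(x * w)
    mag_db = 20 * np.log10(np.abs(X) / np.abs(X).max() + 1e-12)
    peak = int(np.argmax(mag_db))
    # Worst leakage level well away from the main lobe
    far = np.concatenate([mag_db[:max(peak - 20, 0)], mag_db[peak + 20:]])
    floors[name] = far.max()
    print(f"{name}: far-sidelobe floor ~ {floors[name]:.1f} dB")
```

The Hann window's floor sits tens of dB below the rectangular window's, at the cost of a main lobe twice as wide.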
6.5 Zero-Padding
Zero-padding means appending zeros to a signal before computing the DFT. If $x[n]$ has $N$ samples and we compute an $M$-point DFT with $M > N$:
$$X_M[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/M}, \quad k = 0, \ldots, M-1$$
This evaluates the DTFT $X(e^{j\omega})$ at $M$ equally spaced frequencies instead of $N$. The result is a finer sampling of the same underlying continuous spectrum — the spectrum looks smoother, but no new information is added.
Theorem — Zero-Padding Interpolation
Zero-padding from $N$ to $M = \alpha N$ points interpolates the DTFT at $M$ points:
$$X_M[k] = X\!\left(e^{j2\pi k/M}\right), \quad k = 0, \ldots, M-1$$
The **frequency resolution** is still $\Delta f = f_s / N$ (determined by the original data length), NOT $f_s / M$.
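A direct numerical check of this theorem: the $M$-point FFT of the zero-padded signal equals the DTFT of the original $N$ samples evaluated at $\omega_k = 2\pi k/M$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 16, 64
x = rng.standard_normal(N)

X_pad = np.fft.fft(x, M)      # fft(x, M) zero-pads x from N to M samples
k, n = np.arange(M), np.arange(N)
# DTFT of the original N samples, sampled at M equally spaced frequencies
dtft = np.array([np.sum(x * np.exp(-2j * np.pi * kk * n / M)) for kk in k])
print(np.allclose(X_pad, dtft))  # True
```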
Example — When Zero-Padding Helps
Zero-padding is genuinely useful for:
- Making the DFT length a power of 2 for efficient FFT computation
- Producing smoother-looking spectral plots for visualization
- Enabling fast linear convolution (pad to $N_1 + N_2 - 1$)
- Improving peak location estimation by interpolating between original bins
**Common misconception:** Zero-padding does NOT improve spectral resolution. Two sinusoids separated by less than $\Delta f = f_s/N$ will not be resolved regardless of how much padding is applied. Only collecting more actual data (increasing $N$) improves resolution.
Zero-Padding: Interpolation, Not Resolution
Demonstrate that zero-padding smooths the spectrum but cannot resolve two closely-spaced tones. Only more data improves true resolution.
6.6 Welch Periodogram
The raw periodogram $\hat{S}(\omega) = \frac{1}{N}|X(\omega)|^2$ is the simplest power spectral density (PSD) estimator, but it is inconsistent: its variance does not decrease as $N \to \infty$. Welch's method (1967) fixes this by averaging periodograms of overlapping windowed segments.
Definition — Welch's Method
- Divide the signal of length $N$ into $K$ overlapping segments of length $L$, with overlap $D$ samples.
- Apply a window function $w[n]$ to each segment $x_i[n]$.
- Compute the periodogram of each windowed segment: $P_i[k] = \frac{1}{L\,U}|X_i[k]|^2$, where $U = \frac{1}{L}\sum_{n=0}^{L-1} |w[n]|^2$ is the window normalization.
- Average the $K$ periodograms: $\hat{S}_{\text{Welch}}[k] = \frac{1}{K}\sum_{i=1}^{K} P_i[k]$.
Variance Reduction
For $K$ independent segments, the variance of the Welch estimator is reduced by a factor of approximately $1/K$. With 50% overlap and a Hann window, the effective number of independent segments is about $K_{\text{eff}} \approx 0.89\,K$ due to partial correlation between overlapping segments.
Theorem — Bias-Variance Trade-off in PSD Estimation
Shorter segments $\Rightarrow$ more averaging $\Rightarrow$ lower variance, but:
$$\Delta f = \frac{f_s}{L}$$
so shorter segments give **worse frequency resolution**. This is a fundamental trade-off: you cannot simultaneously have fine frequency resolution and low estimation variance.
Example — Typical Welch Parameters
For a signal sampled at $f_s = 44100$ Hz with $N = 441000$ samples (10 seconds):
- Segment length $L = 4096$ → $\Delta f = 10.8$ Hz
- 50% overlap: $D = 2048$
- Number of segments: $K = \lfloor(N - L)/(L - D)\rfloor + 1 = 214$
- Variance reduction: $\sim 214\times$ compared to raw periodogram
Welch PSD Estimation
Compare the noisy raw periodogram against the smooth Welch PSD estimate for a signal with two sinusoids buried in noise.
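The four steps of the definition map onto a short NumPy implementation. This is a sketch (in practice `scipy.signal.welch` does the same job with more options); the extra $1/f_s$ factor scales the result to a density per Hz, matching the PSD normalization in Section 6.7:

```python
import numpy as np

def welch_psd(x, fs, L=1024, overlap=0.5, window=np.hanning):
    """Welch PSD estimate: segment, window, periodogram, average."""
    w = window(L)
    U = np.mean(w ** 2)                    # window power normalization
    step = int(L * (1 - overlap))
    segs = [x[i:i + L] for i in range(0, len(x) - L + 1, step)]
    # Periodogram of each windowed segment, then average across segments
    P = np.mean([np.abs(np.fft.rfft(s * w)) ** 2 for s in segs], axis=0)
    return np.fft.rfftfreq(L, d=1 / fs), P / (fs * L * U)

# Tone at 100 Hz in unit-variance white noise, 10 s at fs = 1 kHz
fs = 1000.0
rng = np.random.default_rng(0)
n = np.arange(int(10 * fs))
x = np.sin(2 * np.pi * 100 * n / fs) + rng.standard_normal(n.size)
f, P = welch_psd(x, fs)
print(f[np.argmax(P)])  # peak near 100 Hz
```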
6.7 Practical Tips
Choosing N
The DFT length $N$ determines the frequency resolution:
$$\Delta f = \frac{f_s}{N}$$
To resolve two tones separated by $\Delta f_{\min}$ Hz, you need at least $N \geq f_s / \Delta f_{\min}$ samples. With a window applied, the required $N$ increases by the window's main-lobe-width factor (e.g., 2x for Hann).
Padding to Power of 2
The radix-2 FFT requires $N = 2^m$. If your data length is not a power of 2, zero-pad to the next power of 2. Most FFT libraries also support mixed-radix transforms for composite sizes, but powers of 2 remain the fastest:
$$N_{\text{fft}} = 2^{\lceil \log_2 N \rceil}$$
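A bit-twiddling sketch of this formula in Python (the helper name `next_pow2` is ours):

```python
def next_pow2(n):
    """Smallest power of 2 that is >= n (assumes integer n >= 1)."""
    return 1 << (n - 1).bit_length()

print([next_pow2(n) for n in (1, 5, 64, 100, 441000)])
# [1, 8, 64, 128, 524288]
```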
Example — Quick Reference Table
| Parameter | Formula | Notes |
|---|---|---|
| Freq. resolution | $\Delta f = f_s / N$ | Fundamental limit |
| Max frequency | $f_{\max} = f_s / 2$ | Nyquist limit |
| Bin spacing | $f_k = k \cdot f_s / N$ | $k = 0, \ldots, N/2$ |
| Record length | $T = N / f_s$ | Observation time |
| FFT complexity | $\frac{N}{2}\log_2 N$ | Complex multiplies |
Common Pitfalls
- Forgetting the periodic assumption: The DFT treats the input as periodic with period $N$. Discontinuities at the boundaries cause leakage — always apply a window.
- Confusing DFT bins with physical frequency: Bin $k$ corresponds to frequency $f_k = k \cdot f_s / N$ Hz. The upper half of the DFT ($k > N/2$) represents negative frequencies.
- Not normalizing the FFT output: For amplitude spectra, divide $|X[k]|$ by $N$ (or $N/2$ for one-sided spectra). For PSD, use $|X[k]|^2 / (f_s \cdot N)$.
- Assuming zero-padding improves resolution: It does not. Only more data or parametric methods can do that.
Frequency Resolution Summary
Bringing it all together, the achievable frequency resolution depends on three factors:
- Data length: $\Delta f = f_s / N$ — the fundamental limit.
- Window function: Widens the main lobe by a factor of 2x (Hann) to 4x (Nuttall) relative to the rectangular window, effectively degrading resolution.
- SNR: In practice, noise limits resolution more than the theoretical $\Delta f$ — use Welch averaging to improve PSD estimates.
**Summary of Chapter 6:** The DFT maps $N$ time-domain samples to $N$ frequency-domain bins with $\mathcal{O}(N^2)$ complexity. The FFT reduces this to $\mathcal{O}(N \log N)$ via the butterfly decomposition. Spectral leakage is mitigated by windowing (at the cost of resolution), zero-padding provides spectral interpolation (not resolution), and Welch's method gives reliable PSD estimates by averaging overlapping segments.