Machine Learning

Part IV: Unsupervised Learning

Unsupervised learning uncovers structure in data without labels. This part covers the three pillars of the field: clustering (finding groups), dimensionality reduction (finding compact representations), and generative modelling (learning the data distribution itself). Every algorithm is derived from first principles with full mathematical rigour.

What you will learn

Derive the K-means update rule from the objective function
Prove that Lloyd's algorithm monotonically decreases the K-means objective, and hence converges
Implement EM for Gaussian Mixture Models from scratch
Derive PCA from variance maximisation using Lagrange multipliers
Understand why principal components are eigenvectors of the covariance matrix
Explain the t-SNE crowding problem and the Student-t kernel solution
Derive the VAE ELBO from the log-likelihood, and apply the reparameterisation trick
Compute the KL divergence between two Gaussians in closed form
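As a taste of the first two objectives, here is a minimal sketch of Lloyd's algorithm (the function and variable names are illustrative, not from the text). It alternates the assignment step and the centroid-update step, and records the K-means objective after each iteration so you can verify empirically that it never increases:

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def objective(points, centroids):
    """K-means objective: total squared distance to nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def lloyd(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centroids = list(rng.sample(points, k))  # initialise at k data points
    objectives = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # Update step: each centroid moves to its cluster's mean
        # (an empty cluster keeps its old centroid).
        for j, cluster in enumerate(clusters):
            if cluster:
                centroids[j] = tuple(sum(xs) / len(cluster)
                                     for xs in zip(*cluster))
        objectives.append(objective(points, centroids))
    return centroids, objectives
```

Both steps can only lower (or preserve) the objective, which is the heart of the monotone-convergence proof developed in this part: checking that the recorded `objectives` sequence is non-increasing is a quick empirical sanity check of that argument.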

Prerequisites

Parts I–III. You should be comfortable with matrix eigendecomposition (Part I), maximum likelihood estimation and Bayes' theorem (Part I), and the concept of gradient descent (Part I). Familiarity with the Gaussian distribution is essential for Chapters 10 and 12.