Machine Learning
From linear regression to diffusion models: complete derivations, mathematical foundations, and Python implementations of every major algorithm.
The Machine Learning Landscape
The Equations That Define ML
Linear Regression (OLS)
\( \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y} \)
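As a minimal sketch of the closed-form estimator, the snippet below solves the normal equations with NumPy. The function name `ols_fit` and the toy data are illustrative assumptions, not part of the course material. It calls `np.linalg.solve` rather than forming \( (\mathbf{X}^\top \mathbf{X})^{-1} \) explicitly, which is usually the more stable choice.

```python
import numpy as np

def ols_fit(X, y):
    """Return beta_hat by solving the normal equations (X^T X) beta = X^T y."""
    # Solving the linear system avoids computing the matrix inverse directly.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical noise-free data generated from y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # first column models the intercept
y = 1.0 + 2.0 * x

beta_hat = ols_fit(X, y)  # intercept and slope estimates
```

Because the data are noise-free, the estimate should recover the generating intercept and slope up to floating-point error.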
Gradient Descent
\( \boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t - \eta \nabla_{\boldsymbol{\theta}} \mathcal{L}(\boldsymbol{\theta}_t) \)
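The update rule can be sketched in a few lines of plain Python. The quadratic objective, the learning rate of 0.1, and the step count are assumptions chosen only so the example converges quickly.

```python
def gradient_descent(grad, theta0, lr=0.1, steps=100):
    """Apply theta <- theta - lr * grad(theta) for a fixed number of steps."""
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta

# Hypothetical objective L(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
theta_star = gradient_descent(lambda t: 2.0 * (t - 3.0), theta0=0.0)
```

For this objective each step shrinks the distance to the minimizer at 3 by a factor of 0.8, so 100 steps are more than enough.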
Bayes' Theorem
\( P(\boldsymbol{\theta} \mid \mathcal{D}) = \frac{P(\mathcal{D} \mid \boldsymbol{\theta}) P(\boldsymbol{\theta})}{P(\mathcal{D})} \)
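For a discrete set of hypotheses, the theorem reduces to multiplying prior by likelihood and normalizing by the evidence. The sketch below assumes a hypothetical coin that is either fair or biased toward heads (with probability 0.8) and three observed heads. The names and numbers are illustrative only.

```python
def posterior(prior, likelihood):
    """Return P(theta | D) over a discrete set of hypotheses."""
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(D), the normalizing constant
    return {h: v / evidence for h, v in unnormalized.items()}

# Hypothetical setup: equal prior belief, data D = three heads in a row.
prior = {"fair": 0.5, "biased": 0.5}
likelihood = {"fair": 0.5 ** 3, "biased": 0.8 ** 3}  # P(D | theta)

post = posterior(prior, likelihood)
```

With equal priors, the posterior is proportional to the likelihood alone, so the biased hypothesis ends up favored.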
Cross-Entropy Loss
\( \mathcal{L} = -\sum_{i} \left[ y_i \log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i) \right] \)
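A minimal sketch of the binary cross-entropy, with both log terms accumulated for every example. The clipping constant `eps` is an assumption added to avoid evaluating `log(0)`; it is not part of the formula.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Sum of -[y log(p) + (1 - y) log(1 - p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Hypothetical labels and predicted probabilities.
loss = binary_cross_entropy([1, 0], [0.9, 0.2])
```

Here the positive example contributes `-log(0.9)` and the negative example contributes `-log(0.8)`.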
Backpropagation
\( \frac{\partial \mathcal{L}}{\partial w_{ij}^{(l)}} = \frac{\partial \mathcal{L}}{\partial z_j^{(l)}} \cdot a_i^{(l-1)} \)
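One way to check this identity is to compare it against finite differences. The sketch below assumes a single linear layer \( z_j = \sum_i w_{ij} a_i \) with no bias or activation and a squared-error loss, so that \( \partial \mathcal{L} / \partial z_j = z_j - t_j \). These choices, the target `target`, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a_prev = rng.normal(size=3)      # activations a^(l-1) from the previous layer
W = rng.normal(size=(3, 2))      # W[i, j] connects input unit i to output unit j
target = rng.normal(size=2)      # hypothetical regression target

def loss(W):
    z = a_prev @ W               # z_j = sum_i w_ij * a_i
    return 0.5 * np.sum((z - target) ** 2)

# Analytic gradient: dL/dw_ij = (dL/dz_j) * a_i, i.e. an outer product.
delta = a_prev @ W - target      # dL/dz for the squared-error loss
grad_analytic = np.outer(a_prev, delta)

# Numerical gradient by central finite differences, one weight at a time.
eps = 1e-6
grad_numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        step = np.zeros_like(W)
        step[i, j] = eps
        grad_numeric[i, j] = (loss(W + step) - loss(W - step)) / (2 * eps)
```

The two gradients should agree to within floating-point error, which is the usual sanity check for a hand-derived backward pass.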
Attention (Transformer)
\( \text{Attention}(Q,K,V) = \text{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V \)
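The formula translates almost directly into NumPy. This sketch covers a single head with no batching or masking. The shapes are arbitrary, and subtracting the row maximum before exponentiating is a standard stability step that leaves the softmax unchanged.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V and the attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

# Hypothetical shapes: 4 queries, 6 key/value pairs, d_k = 8, d_v = 5.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 5))

out, weights = scaled_dot_product_attention(Q, K, V)
```

Each row of `weights` is a probability distribution over the keys, and each output row is the corresponding weighted average of the rows of `V`.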
ELBO (VAE)
\( \mathcal{L} = \mathbb{E}_{q}[\log p(\mathbf{x}|\mathbf{z})] - D_{\text{KL}}(q(\mathbf{z}|\mathbf{x}) \| p(\mathbf{z})) \)
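The KL term has a closed form under a common modeling assumption that the formula itself does not state: \( q(\mathbf{z}|\mathbf{x}) \) is a diagonal Gaussian with mean `mu` and log-variance `log_var`, and \( p(\mathbf{z}) \) is a standard normal. A minimal sketch under that assumption:

```python
import math

def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )

# When q equals the prior, the KL term is zero.
kl_same = kl_diag_gaussian_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# Shifting one mean by 1 (unit variance) contributes 0.5 * 1^2.
kl_shifted = kl_diag_gaussian_to_standard_normal([1.0, 0.0], [0.0, 0.0])
```

The reconstruction term \( \mathbb{E}_{q}[\log p(\mathbf{x}|\mathbf{z})] \) generally has no closed form and is usually estimated by sampling, which is not shown here.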
Bellman Equation (RL)
\( V^*(s) = \max_a \left[ R(s,a) + \gamma \sum_{s'} P(s'|s,a) V^*(s') \right] \)
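Applying the right-hand side repeatedly as an update gives value iteration. The sketch below runs it on a hypothetical two-state MDP in which action 0 stays put, action 1 switches state, and only staying in state 1 pays a reward of 1. The MDP, the discount of 0.9, and the tolerance are assumptions made for illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Iterate the Bellman optimality update until the values stop changing."""
    V = [0.0] * len(P)
    while True:
        V_new = [
            max(
                R[s][a] + gamma * sum(p * V[s_next] for p, s_next in P[s][a])
                for a in range(len(P[s]))
            )
            for s in range(len(P))
        ]
        if max(abs(v - w) for v, w in zip(V, V_new)) < tol:
            return V_new
        V = V_new

# P[s][a] is a list of (probability, next_state) pairs; transitions are deterministic.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: stay, or switch to state 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: stay, or switch to state 0
]
R = [
    [0.0, 0.0],  # state 0 pays nothing for either action
    [1.0, 0.0],  # state 1 pays 1 for staying
]

V_star = value_iteration(P, R)
```

Solving the fixed point by hand gives \( V^*(1) = 1/(1-\gamma) = 10 \) and \( V^*(0) = \gamma V^*(1) = 9 \), which the iteration should approach.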
About This Course
This course teaches machine learning from the mathematics up. Every algorithm is derived from first principles: we start with the objective function, take derivatives, and arrive at the update rules. No black boxes. Every chapter includes complete MathJax derivations, SVG architecture diagrams, and Python simulations that you can run in the browser.
The course spans from classical methods (linear regression, SVMs) through the deep learning revolution (CNNs, RNNs, Transformers) to the research frontier (diffusion models, LLMs, graph neural networks). Part V on Probabilistic ML connects to the Bayesian brain framework in our Music & Mathematics course and to our Information Theory course.
Prerequisites: multivariable calculus, linear algebra, basic probability. Chapters 1–3 provide a thorough review of all necessary mathematics.
Course Structure
Mathematical Foundations
Linear algebra (vectors, matrices, eigendecomposition, SVD), probability theory (Bayes' theorem, distributions, MLE, MAP), and optimization (gradient descent, convexity, Lagrange multipliers, KKT conditions).
Supervised Learning
Linear regression (OLS derivation, regularization, bias-variance tradeoff), logistic regression (cross-entropy, softmax, Newton's method), and SVMs (maximum margin, kernel trick, dual formulation).
Neural Networks
The perceptron, backpropagation derivation (chain rule through computational graphs), deep architectures (BatchNorm, dropout, residual connections), and CNNs (convolution theorem, pooling, modern architectures).
Unsupervised Learning
K-means and Gaussian mixture models (EM algorithm derivation), PCA (eigenvalue formulation, kernel PCA, t-SNE), autoencoders and VAEs (ELBO derivation, reparameterization trick).
Probabilistic ML
Bayesian inference (prior → posterior, conjugacy, MCMC), Gaussian processes (kernel functions, predictive distribution), and variational inference (ELBO, mean-field, amortized inference). Cross-links to the Bayesian brain in music perception.
Sequence Models
RNNs (BPTT derivation, LSTM/GRU gating), the attention mechanism (scaled dot-product, multi-head), Transformers (positional encoding, layer norm), and LLMs (GPT, BERT, scaling laws, RLHF).
Advanced Topics
Reinforcement learning (Bellman equation, policy gradient, PPO), graph neural networks (message passing, spectral convolution), and diffusion models (forward/reverse process, score matching, classifier-free guidance).
Recommended Textbooks
- Pattern Recognition and Machine Learning, Christopher Bishop (2006)
- The Elements of Statistical Learning, Hastie, Tibshirani & Friedman (2009)
- Deep Learning, Goodfellow, Bengio & Courville (2016)
- Mathematics for Machine Learning, Deisenroth, Faisal & Ong (2020)
- Probabilistic Machine Learning, Kevin Murphy (2022, 2023)
- Reinforcement Learning: An Introduction, Sutton & Barto (2018)