Machine Learning
Part I: Mathematical Foundations
Every machine learning algorithm rests on three mathematical pillars: linear algebra provides the geometry of data and transformations; probability theory formalises uncertainty and inference; optimisation theory explains how models improve. This part builds each pillar from first principles, with full derivations and Python simulations.
Chapter 1: Linear Algebra for ML
Vectors, matrices, eigendecomposition, SVD — the geometric and algebraic backbone of every machine learning algorithm.
- Matrix multiplication & transpose
- Eigendecomposition A = QΛQ⁻¹
- SVD: A = UΣVᵀ (full derivation)
- Rank, null space, positive definiteness
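A minimal NumPy sketch of the two factorisations listed above; the matrix values are toy numbers for illustration, not examples from the chapter. It checks that A = UΣVᵀ reconstructs A, and that the eigendecomposition of the symmetric matrix AᵀA recovers the squared singular values:

```python
import numpy as np

# Toy 3x2 data matrix (hypothetical values, for illustration only).
A = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])

# SVD: A = U Σ Vᵀ. The chapter derives this; NumPy computes it directly.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(U @ np.diag(s) @ Vt, A)

# Eigendecomposition A = QΛQ⁻¹, applied to the symmetric matrix AᵀA:
# for symmetric matrices Q is orthogonal, so Q⁻¹ = Qᵀ, and the
# eigenvalues of AᵀA are the squared singular values of A.
lam, Q = np.linalg.eigh(A.T @ A)
assert np.allclose(Q @ np.diag(lam) @ Q.T, A.T @ A)
assert np.allclose(np.sort(lam)[::-1], s**2)
```

Since `eigh` returns eigenvalues in ascending order while `svd` returns singular values in descending order, the last check sorts before comparing.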
Chapter 2: Probability & Statistics for ML
Probability axioms, Bayes' theorem, distributions, MLE and MAP estimation — the language of uncertainty in learning.
- Bayes' theorem from joint probability
- Gaussian, Bernoulli, Categorical, Poisson
- MLE: derive normal equations
- MAP estimation & conjugate priors
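A short simulation in the spirit of the chapter, sketching two of the topics above. All numbers (the true parameters and the hypothetical test's error rates) are invented for illustration. It checks the closed-form Gaussian MLE against simulated data, then applies Bayes' theorem built directly from the joint probability:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated Gaussian sample with known parameters (toy values).
data = rng.normal(loc=3.0, scale=1.5, size=10_000)

# Gaussian MLE: setting the log-likelihood gradient to zero yields
# the sample mean and the mean squared deviation in closed form.
mu_hat = data.mean()
sigma2_hat = ((data - mu_hat) ** 2).mean()
assert abs(mu_hat - 3.0) < 0.1
assert abs(sigma2_hat - 1.5**2) < 0.1

# Bayes' theorem from the joint: P(H|D) = P(D|H)·P(H) / P(D).
# Hypothetical two-hypothesis example: a 1%-prevalence condition
# and a test with made-up error rates.
prior = np.array([0.99, 0.01])        # P(H): healthy, sick
likelihood = np.array([0.05, 0.95])   # P(positive | H)
joint = likelihood * prior            # P(positive, H)
posterior = joint / joint.sum()       # divide by P(positive)
assert abs(posterior.sum() - 1.0) < 1e-12
```

Even with a 95%-sensitive test, the posterior probability of being sick stays well below one half here, because the low prior dominates the update.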
Chapter 3: Optimization Theory
Gradient descent, convexity, the Adam optimiser, Lagrange multipliers and KKT conditions — how ML models actually learn.
- Gradient, Hessian, Jacobian, Taylor expansion
- Convexity: definition & second-order condition
- GD, Momentum, Adam derivations
- Lagrange multipliers & full KKT conditions
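A minimal sketch of plain gradient descent on a convex objective, the simplest case the chapter's convergence analysis covers. The quadratic and its coefficients are toy values chosen for illustration; with a positive-definite A, the step size 1/L (L being the largest eigenvalue of A) is the standard choice that guarantees convergence to the unique minimiser:

```python
import numpy as np

# Convex quadratic f(x) = ½ xᵀAx − bᵀx with A positive definite,
# so the unique minimiser solves Ax* = b. (Toy values.)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)

def grad(x):
    return A @ x - b  # ∇f(x) = Ax − b

# Step size 1/L, where L = λ_max(A) is the gradient's Lipschitz
# constant; for strongly convex quadratics this contracts the error
# geometrically at every iteration.
L = np.linalg.eigvalsh(A).max()
x = np.zeros(2)
for _ in range(500):
    x = x - (1.0 / L) * grad(x)

assert np.allclose(x, x_star, atol=1e-6)
```

Momentum and Adam modify only the update line (adding velocity and per-coordinate adaptive scaling, respectively); the derivations in the chapter build on this same loop.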
What you will learn
✓ Represent data as vectors and matrices and reason geometrically
✓ Decompose matrices with eigendecomposition and SVD for compression and analysis
✓ Model uncertainty with probability distributions and derive MLE/MAP estimators
✓ Apply Bayes' theorem to update beliefs as data arrives
✓ Prove gradient descent converges on convex objectives
✓ Derive the Adam optimiser from first principles
✓ Formulate constrained optimisation with Lagrange multipliers and KKT conditions
✓ Understand every Part II–VII algorithm through these three lenses