Machine Learning

Part III: Neural Networks

Neural networks are universal function approximators (given enough hidden units) built by composing simple parametric transformations. This part derives everything from scratch: the perceptron learning rule, the full backpropagation algorithm, the engineering innovations that make deep networks trainable, and the convolutional inductive bias that powered the deep learning revolution in vision.
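As a taste of where the part begins, here is a minimal sketch of the perceptron learning rule mentioned above. The toy task (learning logical AND, which is linearly separable) and the learning rate are illustrative choices, not part of the text:

```python
import numpy as np

# Toy linearly separable task: logical AND (illustrative choice).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate (illustrative)

for epoch in range(20):
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        # Perceptron rule: update the boundary only on mistakes.
        w += lr * (yi - pred) * xi
        b += lr * (yi - pred)

preds = (X @ w + b > 0).astype(int)
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop stops making mistakes after finitely many updates; XOR, covered in the chapters that follow, is exactly the case where this guarantee fails and a hidden layer becomes necessary.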

What you will learn

Derive backpropagation from the chain rule of calculus
Implement a neural network for XOR from scratch in NumPy
Explain why sigmoid activations cause vanishing gradients
Derive the BatchNorm normalisation, scale, and shift updates
Prove that residual connections preserve gradient magnitude
Derive Xavier and He weight initialisation from variance arguments
Implement 2D convolution from scratch and apply edge-detection kernels
Understand the design evolution from LeNet to EfficientNet
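Several of these objectives come together in one small program. The sketch below previews the from-scratch XOR network: a 2-layer sigmoid MLP trained with backpropagation derived from the chain rule. The hidden width, learning rate, step count, and random seed are all illustrative assumptions; the chapters derive each update in full:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility (assumption)

# XOR is not linearly separable, so a hidden layer is required.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2-4-1 network; width and learning rate are illustrative choices.
W1 = rng.normal(0.0, 1.0, (2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(0.0, 1.0, (4, 1))
b2 = np.zeros((1, 1))
lr = 1.0

for step in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: chain rule applied layer by layer
    # (squared-error loss; sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))).
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient descent updates.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

preds = (out > 0.5).astype(int).ravel()
```

Note the `h * (1 - h)` factor in the backward pass: it is never larger than 0.25, which is precisely the mechanism behind the vanishing-gradient objective above.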

Prerequisites

Part I (linear algebra, calculus, probability) and Part II (supervised learning, gradient descent). You should be comfortable with matrix calculus and the concept of a loss function before beginning.