Chapter 20: Microcontrollers

A microcontroller unit (MCU) integrates a CPU, program memory, data memory, and a rich set of peripherals on a single chip. From the 8-bit AVR on the Arduino Uno to the 32-bit ARM Cortex-M7, MCUs are the brain of embedded systems such as motor controllers, IoT sensors, industrial PLCs, and medical devices. This chapter covers architecture, peripherals, interrupt-driven programming, and real-time operating systems.

Microcontroller Block Diagram

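[Figure: Microcontroller block diagram. Inside the MCU die boundary, the CPU core (ALU, registers, PC, IR) connects over the system bus (AHB/APB) to Flash (program memory), SRAM (data memory), GPIO (32 pins), UART (serial), SPI/I2C (synchronous bus), a 12-bit ADC, PWM timers, and the NVIC (interrupts). The peripherals interface with the external world: sensors and actuators.]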

20.1 Harvard vs Von Neumann Architecture

Harvard Architecture

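Separate buses, and traditionally separate address spaces, for program (instruction) memory and data memory. The CPU can fetch the next instruction while it reads or writes data, a hardware form of parallelism that avoids bus contention. AVR and PIC are classic Harvard MCUs; ARM Cortex-M3/M4/M7 use a modified Harvard design with separate instruction and data buses but a single unified address map, while Cortex-M0/M0+ use a single (Von Neumann) bus.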

  • + Simultaneous instruction fetch and data access
  • + Fixed instruction width simplifies pipelining
  • − Two separate memory bus interfaces
  • โˆ’ Cannot execute code from RAM (typically)

Von Neumann Architecture

A single unified memory space shared by instructions and data over one bus. It is simple to implement; the bottleneck is the shared bus (the "Von Neumann bottleneck"). Used in x86/x64 PCs and ARM Cortex-A application processors.

  • + Flexible: code and data share address space
  • + Self-modifying code / JIT compilation possible
  • − Bus contention between instruction fetch and data access
  • − Cache hierarchy needed to hide latency

20.2 Memory: Flash, SRAM, EEPROM

Flash

Non-volatile, stores program code. Erased in sectors (512 B to 128 KB). Write endurance is typically 10 000 to 100 000 erase/write cycles. Arduino Uno: 32 KB; STM32F4: up to 2 MB. Wear levelling is needed when data is written frequently.

SRAM

Volatile, fast random access; holds the stack, heap, and global variables. Arduino Uno: 2 KB; STM32F407: 192 KB. Stack overflow is a common MCU bug, so always monitor free stack space.

EEPROM

Non-volatile, byte-level erasable, stores configuration data (calibration, device ID). Typically 256 B to 4 KB on an MCU; writes are slow (~3 ms per byte). Many modern MCUs emulate EEPROM in Flash using wear-levelling libraries.
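The Flash-emulation idea can be sketched as a toy model, assuming a single Flash page of fixed-size slots and a hypothetical `EmulatedEeprom` class (not any vendor's library API). Each write programs the next erased slot with a (key, value) record instead of erasing, reads return the newest record for a key, and the page is erased only when it fills up, so one erase cycle is shared by many logical writes.

```python
ERASED = None   # erased Flash reads back as all ones; modelled here as None


class EmulatedEeprom:
    """Toy model of EEPROM emulation in one Flash page (hypothetical, not a vendor API)."""

    def __init__(self, slots: int = 8):
        self.page = [ERASED] * slots   # each slot can be programmed once per erase
        self.next_slot = 0             # first still-erased slot
        self.erase_count = 0           # page erase cycles consumed so far

    def read(self, key):
        # Records are appended in order, so the last match is the newest value.
        for record in reversed(self.page[: self.next_slot]):
            if record[0] == key:
                return record[1]
        return None

    def write(self, key, value):
        if self.next_slot == len(self.page):
            self._compact()
            if self.next_slot == len(self.page):
                raise RuntimeError("page full: more distinct keys than slots")
        self.page[self.next_slot] = (key, value)   # program an erased slot, no erase needed
        self.next_slot += 1

    def _compact(self):
        # Keep only the newest value per key, erase the page, then reprogram the survivors.
        latest = {}
        for key, value in self.page:
            latest[key] = value          # later records overwrite earlier ones
        self.page = [ERASED] * len(self.page)
        self.erase_count += 1
        self.next_slot = 0
        for key, value in latest.items():
            self.page[self.next_slot] = (key, value)
            self.next_slot += 1


ee = EmulatedEeprom(slots=8)
for i in range(20):
    ee.write("calib", i)
print(ee.read("calib"), ee.erase_count)   # 19 2
```

Twenty logical writes cost two page erases here. Production schemes usually alternate between two or more pages and copy the live records to the fresh page before erasing the old one, so that a power loss during compaction cannot lose data; the in-place compaction above is a simplification.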

20.3 Peripherals

GPIO โ€” General Purpose I/O

Digital input or output pins, each configured individually. Open-drain mode allows wired-OR connections. Internal pull-up/pull-down resistors can be enabled per pin. Typical sink/source current: 8 to 25 mA per pin.
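The wired connection made possible by open-drain outputs can be modelled in a few lines, assuming a pull-up resistor on the shared line; `open_drain_line()` is a hypothetical helper. The line is low if any output sinks current, which is wired-OR for an active-low signal (equivalently, wired-AND in positive logic). I2C relies on the same arrangement.

```python
def open_drain_line(pulling_low: list[bool]) -> int:
    """Logic level of a pulled-up line shared by several open-drain outputs.

    Each output can only sink current (drive 0) or release the line
    (high impedance); it can never drive 1. The pull-up resistor makes the
    line read 1 only when every output has released it.
    """
    return 0 if any(pulling_low) else 1


# Truth table for two open-drain pins sharing one pulled-up line.
for a in (False, True):
    for b in (False, True):
        print(f"A low={a!s:<5} B low={b!s:<5} -> line = {open_drain_line([a, b])}")
```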

UART โ€” Universal Async Receiver/Transmitter

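Asynchronous serial: a start bit, 5 to 9 data bits (8 is by far the most common, sent LSB first), optional parity, and 1 or 2 stop bits. Common baud rates are 9600 and 115200, with 1 Mbaud and above on many MCUs. There is no clock line, so the baud rate must match on both ends. Used for debug consoles, GPS modules, and GSM modems.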
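A sketch of how one byte is framed on the wire, assuming idle-high logic-level signalling and LSB-first data (the usual UART convention); `uart_frame()` is a hypothetical helper, not a library call. It also shows why 8N1 at 115200 baud carries at most 11 520 bytes per second: every byte costs 10 bit times.

```python
def uart_frame(byte: int, data_bits: int = 8, parity: str = "none", stop_bits: int = 1) -> list[int]:
    """Line levels of one asynchronous serial frame, in transmit order.

    The idle line is high (1); the start bit is low (0); data bits go LSB first.
    """
    data = [(byte >> i) & 1 for i in range(data_bits)]   # LSB first
    frame = [0] + data                                   # start bit + data
    if parity == "even":
        frame.append(sum(data) % 2)        # total number of 1s becomes even
    elif parity == "odd":
        frame.append(1 - sum(data) % 2)    # total number of 1s becomes odd
    frame += [1] * stop_bits               # stop bit(s) return the line high
    return frame


baud = 115_200
frame = uart_frame(ord("A"))              # 'A' = 0x41 = 0b0100_0001, 8N1
print(frame)                              # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
bit_time_us = 1e6 / baud
print(f"bit {bit_time_us:.2f} us, frame {len(frame) * bit_time_us:.1f} us, "
      f"max {baud / len(frame):.0f} bytes/s")   # bit 8.68 us, frame 86.8 us, max 11520 bytes/s
```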

SPI โ€” Serial Peripheral Interface

Synchronous: SCLK, MOSI, MISO, CS#. Full-duplex. Speeds up to 50 Mbps. Multiple slaves via separate CS lines. Used for SD cards, displays, DACs, ADCs, flash memory.
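Full-duplex operation follows from SPI's structure: master and slave each hold a shift register, and the two registers form a ring through MOSI and MISO. The sketch below assumes MSB-first transfers and ignores clock polarity/phase details; `spi_transfer()` and the byte values are illustrative. After eight clocks the two bytes have simply swapped places.

```python
def spi_transfer(master_byte: int, slave_byte: int, bits: int = 8) -> tuple[int, int]:
    """One full-duplex SPI transfer, MSB first.

    Returns (byte received by the master, byte received by the slave).
    """
    master_reg, slave_reg = master_byte, slave_byte
    mask = (1 << bits) - 1
    for _ in range(bits):                       # one iteration per SCLK cycle
        mosi = (master_reg >> (bits - 1)) & 1   # master drives its MSB onto MOSI
        miso = (slave_reg >> (bits - 1)) & 1    # slave drives its MSB onto MISO
        master_reg = ((master_reg << 1) | miso) & mask   # master shifts in MISO
        slave_reg = ((slave_reg << 1) | mosi) & mask     # slave shifts in MOSI
    return master_reg, slave_reg


rx_master, rx_slave = spi_transfer(0x9F, 0xA5)
print(hex(rx_master), hex(rx_slave))   # 0xa5 0x9f
```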

I2C โ€” Inter-Integrated Circuit

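Two-wire (SDA, SCL), multi-master/multi-slave. 7-bit addressing yields 128 addresses, 16 of which are reserved, leaving 112 for devices; 10-bit addressing is also defined. Speeds: 100 kHz Standard-mode, 400 kHz Fast-mode, 1 MHz Fast-mode Plus. Both lines are open-drain and require pull-up resistors. Used for sensors (IMU, pressure, humidity), RTCs, and EEPROMs.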
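Two practical I2C calculations, sketched with hypothetical helper names. The first byte after START carries the 7-bit address in its upper bits and the R/W flag in bit 0. The pull-up must be small enough to meet the rise-time limit for the bus capacitance, yet large enough that a device can still pull the line below the low-level threshold. The limits used here (V_OL(max) = 0.4 V at 3 mA; maximum rise time 1000 ns in Standard-mode, 300 ns in Fast-mode) are assumptions taken from the NXP I2C-bus specification (UM10204) and should be checked against the datasheets of the parts on the bus; 0x48 is an arbitrary example address.

```python
import math


def address_byte(addr7: int, read: bool) -> int:
    """First byte after START: address in bits 7..1, R/W in bit 0 (1 = read)."""
    if not 0x08 <= addr7 <= 0x77:        # 0x00-0x07 and 0x78-0x7F are reserved
        raise ValueError(f"address 0x{addr7:02X} is reserved or out of range")
    return (addr7 << 1) | int(read)


def pullup_range(vdd: float, bus_capacitance: float, max_rise_time: float,
                 vol_max: float = 0.4, iol: float = 3e-3) -> tuple[float, float]:
    """Smallest and largest usable pull-up resistance in ohms.

    R_min: with the line held at VOL(max), the pull-up current must not exceed IOL.
    R_max: the RC rise from 30 % to 70 % of VDD takes ln(0.7/0.3) * R * C
           (about 0.8473 * R * C) and must stay within max_rise_time.
    """
    r_min = (vdd - vol_max) / iol
    r_max = max_rise_time / (math.log(0.7 / 0.3) * bus_capacitance)
    return r_min, r_max


print(hex(address_byte(0x48, read=False)), hex(address_byte(0x48, read=True)))  # 0x90 0x91
r_min, r_max = pullup_range(vdd=3.3, bus_capacitance=200e-12, max_rise_time=300e-9)
print(f"Fast-mode, 200 pF: {r_min:.0f} ohm to {r_max:.0f} ohm")   # 967 ohm to 1770 ohm
```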

ADC โ€” Analog-to-Digital Converter

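Samples an analog voltage and produces a digital code. Resolution ranges from 10-bit (ATmega328P on the Arduino Uno) through 12-bit (most STM32) to 16-bit on some parts (e.g. STM32H7). Most MCU ADCs use a successive-approximation register (SAR) architecture. Nyquist: the sample rate must exceed twice the signal bandwidth, so an anti-aliasing low-pass filter is needed before the ADC input.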
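A SAR conversion is a binary search over the output code, which the sketch below models under ideal assumptions (perfect internal DAC and comparator, no noise or offset); `sar_convert()` is a hypothetical helper. With V_ref = 3.3 V and 12 bits, one LSB is 3.3/4096 ≈ 0.806 mV, and a 1.000 V input converts to code 1241.

```python
def sar_convert(v_in: float, v_ref: float, bits: int) -> int:
    """Successive-approximation conversion: a binary search over the DAC code.

    Starting from the MSB, each step tentatively sets one bit, compares the
    internal DAC voltage with v_in, and keeps the bit only if the DAC output
    does not exceed the input. An N-bit result takes N comparisons.
    """
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        v_dac = trial * v_ref / (1 << bits)   # ideal DAC: one LSB = v_ref / 2**bits
        if v_in >= v_dac:
            code = trial
    return code


v_ref, bits = 3.3, 12
code = sar_convert(1.0, v_ref, bits)
lsb = v_ref / 2**bits
print(code, f"LSB = {lsb * 1e3:.3f} mV", f"reconstructed = {code * lsb:.4f} V")
# 1241 LSB = 0.806 mV reconstructed = 0.9998 V
```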

PWM โ€” Pulse Width Modulation

Timer-generated digital square wave with variable duty cycle D = t_on / T. Used to control motor speed, LED brightness, and servo position. After low-pass filtering (by an RC filter or the load's own inertia), the effective analog voltage is V_avg = D × V_supply. Resolution: 8 to 16 bits.
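Generating a PWM signal comes down to choosing a timer period and a compare value. This sketch assumes an up-counting timer with a prescaler, in the style of STM32 general-purpose timers in edge-aligned PWM mode 1, where f_PWM = f_timer / ((PSC + 1)(ARR + 1)) and D = CCR / (ARR + 1); `pwm_settings()` is hypothetical, and the 84 MHz timer clock, 20 kHz carrier, and 3.3 V supply are illustrative values. A higher PWM frequency leaves fewer counts per period and therefore less duty-cycle resolution.

```python
import math


def pwm_settings(f_timer: float, f_pwm: float, duty: float,
                 max_count: int = 65_535) -> tuple[int, int, int]:
    """Return (PSC, ARR, CCR) for an up-counting timer.

    The counter runs 0..ARR at f_timer / (PSC + 1), and the output is high
    while the counter is below CCR, so:
        f_pwm = f_timer / ((PSC + 1) * (ARR + 1)),   D = CCR / (ARR + 1)
    """
    ticks = round(f_timer / f_pwm)                # timer clocks per PWM period
    psc = (ticks - 1) // (max_count + 1)          # smallest prescaler that lets ARR fit
    arr = round(ticks / (psc + 1)) - 1
    ccr = round(duty * (arr + 1))
    return psc, arr, ccr


psc, arr, ccr = pwm_settings(84e6, 20e3, duty=0.25)
print(psc, arr, ccr)                                   # 0 4199 1050
print(f"resolution ~ {math.log2(arr + 1):.1f} bits")   # ~12.0 bits at 20 kHz
print(f"V_avg = {ccr / (arr + 1) * 3.3:.3f} V")         # 0.825 V from a 3.3 V supply
```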

20.4 Interrupt Handling

An interrupt suspends the main program, saves CPU state (registers and PC on the stack), executes an Interrupt Service Routine (ISR), then restores state and resumes. In ARM Cortex-M3/M4/M7, the Nested Vectored Interrupt Controller (NVIC) supports up to 240 external interrupts with 8 to 256 priority levels, plus hardware tail-chaining for back-to-back ISRs.

ISR design rules:

  • Keep ISRs short โ€” do minimal work (set a flag, push to a queue) and process in the main loop.
  • Declare variables shared between ISR and main code as volatile to prevent compiler optimisation from caching them in a register.
  • Disable interrupts around multi-byte variable reads in main code (cli() / sei() on AVR; __disable_irq() on ARM).
  • Never call blocking or non-reentrant functions (delay(), Serial.print(), malloc()) inside an ISR.
  • Latency: ARM Cortex-M0 = 16 cycles; Cortex-M4 = 12 cycles from interrupt assertion to first ISR instruction.

20.5 Arduino Ecosystem & ARM Cortex-M

Arduino

The Arduino platform abstracts AVR and ARM hardware behind a C++ API: pinMode(), digitalWrite(), analogRead(), Serial.print(). The Uno uses an ATmega328P (8-bit AVR, 16 MHz, 32 KB Flash, 2 KB SRAM). The Due/Zero/MKR series use 32-bit ARM Cortex-M cores at 48 to 84 MHz.

The setup() function runs once; loop() runs repeatedly. Wire (I2C), SPI, and Servo libraries provide peripheral abstraction. Over 10 000 third-party libraries available via the Library Manager.

ARM Cortex-M & STM32

The ARM Cortex-M family spans M0 (tiny, low-power) through M7 (DSP, FPU, 400+ MHz). STM32 (STMicroelectronics) offers over 1 000 MCU variants. The STM32F4 Discovery board (STM32F407, Cortex-M4, 168 MHz, 1 MB Flash, 192 KB SRAM) combines the Cortex-M4 DSP instructions, including single-cycle multiply-accumulate, with a single-precision hardware FPU, which suits DSP and motor-control algorithms.

HAL (Hardware Abstraction Layer) and CMSIS (Cortex Microcontroller Software Interface Standard) provide portable driver APIs. STM32CubeMX generates initialisation code and FreeRTOS project scaffolding.

20.6 Real-Time Operating Systems: FreeRTOS

A Real-Time Operating System (RTOS) provides deterministic task scheduling, inter-task communication, and resource management. FreeRTOS is the most widely deployed RTOS, running on hundreds of MCU families and supported directly by AWS (Amazon FreeRTOS).

Tasks & Scheduler

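Each task is a C function, normally an infinite loop, with its own stack. The scheduler is preemptive and priority-based by default, driven by a periodic tick interrupt (SysTick on Cortex-M, commonly 1 ms, set by configTICK_RATE_HZ). A higher-priority task that becomes ready preempts lower-priority ones, and tasks of equal priority share the CPU by time slicing. Cooperative scheduling is available by setting configUSE_PREEMPTION to 0.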
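The scheduling rule can be illustrated with a tick-by-tick simulation: at every tick the highest-priority ready task runs. Task names, priorities, release ticks, and work amounts are invented, and unlike FreeRTOS the sketch assumes unique priorities, so it omits time slicing between equal-priority tasks. As in FreeRTOS, a larger number means a higher priority, and "idle" marks ticks when nothing is ready.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int   # larger number = higher priority, as in FreeRTOS
    release: int    # tick at which the task becomes ready
    work: int       # ticks of CPU time it needs


def schedule(tasks: list[Task], ticks: int) -> list[str]:
    """Name of the task that runs in each tick under fixed-priority preemption."""
    remaining = {t.name: t.work for t in tasks}
    timeline = []
    for now in range(ticks):
        ready = [t for t in tasks if t.release <= now and remaining[t.name] > 0]
        if not ready:
            timeline.append("idle")
            continue
        running = max(ready, key=lambda t: t.priority)   # re-decided at every tick
        remaining[running.name] -= 1
        timeline.append(running.name)
    return timeline


tasks = [
    Task("logger", priority=1, release=0, work=4),
    Task("comms", priority=2, release=1, work=2),
    Task("control", priority=3, release=2, work=2),
]
print(schedule(tasks, 10))
# ['logger', 'comms', 'control', 'control', 'comms', 'logger', 'logger', 'logger', 'idle', 'idle']
```

Each newly released, higher-priority task takes the CPU at the very next tick, and the preempted task resumes where it left off.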

Queues

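Thread-safe FIFO buffers for inter-task communication. A task sending to a full queue, or receiving from an empty one, can block (with a timeout) until space or data becomes available; ISRs use the non-blocking xQueueSendFromISR() instead. Queues are used to pass sensor readings from an ISR to a processing task without shared-variable races.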
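A bounded, blocking FIFO can be demonstrated with Python's standard `queue.Queue` and threads standing in for FreeRTOS tasks; `put()` and `get()` with their default `block=True` behave like `xQueueSend()` and `xQueueReceive()` called with an unlimited timeout. The queue length of 4, the sleep that slows the consumer, and the `None` end-of-data sentinel are choices made for this sketch.

```python
import queue
import threading
import time

readings: queue.Queue = queue.Queue(maxsize=4)   # bounded FIFO of length 4


def sensor_task(count: int) -> None:
    """Producer: blocks in put() whenever the queue already holds 4 items."""
    for i in range(count):
        readings.put(i * 10)
    readings.put(None)   # sentinel: no more data


def processing_task(out: list) -> None:
    """Consumer: blocks in get() whenever the queue is empty."""
    while True:
        value = readings.get()
        if value is None:
            break
        time.sleep(0.001)   # slow consumer, so the producer fills the queue and waits
        out.append(value)


results: list = []
producer = threading.Thread(target=sensor_task, args=(10,))
consumer = threading.Thread(target=processing_task, args=(results,))
producer.start()
consumer.start()
producer.join()
consumer.join()
print(results)   # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```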

Semaphores & Mutexes

Binary semaphore: signal/wait (ISR-to-task synchronisation). Counting semaphore: resource pool. Mutex: mutual exclusion with priority inheritance, which bounds priority inversion, a critical RTOS concept.
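Priority inversion and the effect of inheritance show up clearly in a small tick-based simulation of the classic three-task scenario. The tasks, priorities, and tick counts are invented: LOW takes the mutex at tick 0 and needs 3 ticks inside it; HIGH becomes ready at tick 1 and needs the same mutex for 1 tick; MED becomes ready at tick 1 and needs 5 ticks of CPU without the mutex. In FreeRTOS, mutexes created with `xSemaphoreCreateMutex()` apply priority inheritance, while binary semaphores do not.

```python
def run_inversion_scenario(inheritance: bool, ticks: int = 12) -> list[str]:
    """Which task runs in each tick; 'idle' when nothing is runnable."""
    base = {"LOW": 1, "MED": 2, "HIGH": 3}       # larger number = higher priority
    release = {"LOW": 0, "MED": 1, "HIGH": 1}    # tick at which each task becomes ready
    work = {"LOW": 3, "MED": 5, "HIGH": 1}       # ticks of CPU time still needed
    needs_mutex = {"LOW", "HIGH"}                # these tasks do all their work in the mutex
    owner = None                                 # current mutex holder
    trace = []
    for now in range(ticks):
        ready = [t for t in base if release[t] <= now and work[t] > 0]
        runnable, blocked = [], []
        for t in ready:
            if t in needs_mutex and owner not in (None, t):
                blocked.append(t)                # waiting for the mutex
            else:
                runnable.append(t)
        prio = dict(base)
        if inheritance and blocked:
            # The holder temporarily inherits the highest priority among its waiters.
            prio[owner] = max(prio[owner], *(base[t] for t in blocked))
        if not runnable:
            trace.append("idle")
            continue
        task = max(runnable, key=prio.get)
        if task in needs_mutex:
            owner = task                         # take (or keep) the mutex
        work[task] -= 1
        if work[task] == 0 and owner == task:
            owner = None                         # give the mutex back when done
        trace.append(task)
    return trace


for inherit in (False, True):
    trace = run_inversion_scenario(inherit)
    print(f"inheritance={inherit!s:<5} HIGH runs at tick {trace.index('HIGH')}: {trace}")
# Without inheritance MED delays HIGH until tick 8; with inheritance HIGH runs at tick 3.
```

Without inheritance, MED (which never touches the mutex) preempts LOW and so indirectly blocks HIGH for its entire run time; inheritance lets LOW finish its critical section first.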

Stack & Heap

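Each task has its own stack, sized in words (typically 256 to 2048). The FreeRTOS heap_4.c scheme provides pvPortMalloc()/vPortFree() and coalesces adjacent free blocks to limit fragmentation. uxTaskGetStackHighWaterMark() returns the minimum amount of free stack a task has had since it started, i.e. its worst-case headroom.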
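A high-water mark of this kind is typically obtained by stack painting: the kernel fills a new stack with a known pattern and later counts how many words at the far end were never overwritten. The sketch below models this with a list, assuming a downward-growing stack (as on Cortex-M) and a 0xA5 fill byte; the helper names are hypothetical. One limitation is inherent to the method: a word whose real data happens to equal the pattern is counted as unused.

```python
FILL_WORD = 0xA5A5A5A5   # 32-bit word made of the fill byte 0xA5


def create_stack(words: int) -> list[int]:
    """'Paint' a new task stack with the fill pattern."""
    return [FILL_WORD] * words


def push_frames(stack: list[int], depth_words: int) -> None:
    """Model a call chain using depth_words words.

    The stack grows downwards, so usage starts at the highest index.
    """
    if depth_words > len(stack):
        raise OverflowError("stack overflow")
    for i in range(len(stack) - depth_words, len(stack)):
        stack[i] = 0x00000000   # any value that differs from the pattern


def high_water_mark(stack: list[int]) -> int:
    """Words never used: count pattern words from the lowest address upwards."""
    free = 0
    for word in stack:
        if word != FILL_WORD:
            break
        free += 1
    return free


stack = create_stack(256)
push_frames(stack, 40)     # shallow call chain
push_frames(stack, 180)    # deepest call chain so far
push_frames(stack, 60)     # the stack later shrinks, but the paint is already gone
print(high_water_mark(stack))   # 76 words of headroom (256 - 180)
```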

20.7 PID Control: Mathematical Foundation

The PID controller is the most widely used control algorithm in embedded systems. The continuous-time output \(u(t)\) given error \(e(t) = r(t) - y(t)\) is:

\( u(t) = K_p\,e(t) + K_i \int_0^t e(\tau)\,d\tau + K_d\,\frac{de}{dt} \)

In discrete time with sample period \(T_s\) (Euler approximation):

\( u[k] = K_p\,e[k] + K_i\,T_s\sum_{j=0}^{k} e[j] + \frac{K_d}{T_s}(e[k] - e[k-1]) \)

  • Proportional (Kp): reduces rise time but leaves a steady-state error; too high a gain → oscillation.
  • Integral (Ki): eliminates steady-state error by integrating past errors. Risk: integrator windup.
  • Derivative (Kd): damps oscillations by anticipating future error; amplifies measurement noise.
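Integrator windup appears when the actuator saturates: the integral keeps growing while the output is pinned at its limit, and the excess must be unwound afterwards, causing overshoot. One common remedy, not covered above, is conditional integration (clamping): freeze the integral whenever the output is saturated. The sketch compares both cases on an assumed first-order heater (τ = 60 s, 0.5 °C per % of heater power, output limited to 0 to 100 %, setpoint 50 °C from 20 °C ambient) with hypothetical PI gains; the integral state holds T_s Σ e[j], as in the discrete equation.

```python
def overshoot(anti_windup: bool, kp: float = 5.0, ki: float = 0.2,
              ts: float = 1.0, steps: int = 600) -> float:
    """Peak temperature above the setpoint for a saturating PI loop (degC)."""
    tau, gain, ambient, setpoint = 60.0, 0.5, 20.0, 50.0
    temp, integral, peak = ambient, 0.0, ambient
    for _ in range(steps):
        error = setpoint - temp
        candidate = integral + error * ts              # Ts * sum of e[j], j = 0..k
        u_raw = kp * error + ki * candidate
        u = min(max(u_raw, 0.0), 100.0)                # heater power limited to 0..100 %
        if not anti_windup or u == u_raw:
            integral = candidate                       # clamping: freeze while saturated
        temp += ts * (-(temp - ambient) + gain * u) / tau   # forward-Euler plant update
        peak = max(peak, temp)
    return peak - setpoint


print(f"overshoot without anti-windup: {overshoot(False):.1f} degC")
print(f"overshoot with clamping:       {overshoot(True):.1f} degC")
```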

Python: PID Temperature Control Simulation

The simulation models a first-order thermal plant (τ = 60 s) controlled by a discrete-time PID loop. The left panel shows the temperature response for five controller configurations spanning P-only, PI, and tuned PID. The right panel plots steady-state error versus Kp: P-only control leaves an error that shrinks as Kp grows, while integral action (Ki > 0) drives the error to zero regardless of gain.
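A compact sketch of such a simulation, without the plotting, follows. It assumes a unit-gain plant (τ dT/dt = −(T − T_amb) + u, with T_amb = 20 °C and a 50 °C setpoint), no actuator limits, and invented gains for the five configurations, so its numbers are illustrative. For P-only control the steady-state error is (50 − 20)/(1 + Kp), which shrinks with gain but never reaches zero; adding Ki > 0 removes it. The first error sample is reused as e[−1], so the derivative term sees no jump at k = 0.

```python
def simulate(kp: float, ki: float = 0.0, kd: float = 0.0, ts: float = 1.0,
             steps: int = 1500, setpoint: float = 50.0) -> list[float]:
    """Discrete PID (Euler form) controlling a first-order thermal plant.

    Assumed plant: tau * dT/dt = -(T - T_amb) + u, with tau = 60 s, T_amb = 20 degC.
    Returns the temperature after each sample period.
    """
    tau, t_amb = 60.0, 20.0
    temp = t_amb
    error_sum = 0.0
    prev_error = setpoint - temp          # e[-1] := e[0], so no derivative kick at k = 0
    history = []
    for _ in range(steps):
        error = setpoint - temp
        error_sum += error                                     # sum of e[j], j = 0..k
        u = kp * error + ki * ts * error_sum + (kd / ts) * (error - prev_error)
        prev_error = error
        temp += ts * (-(temp - t_amb) + u) / tau               # forward-Euler plant update
        history.append(temp)
    return history


configs = {
    "P   Kp=1": (1.0, 0.0, 0.0),
    "P   Kp=4": (4.0, 0.0, 0.0),
    "P   Kp=9": (9.0, 0.0, 0.0),
    "PI  Kp=2 Ki=0.05": (2.0, 0.05, 0.0),
    "PID Kp=2 Ki=0.05 Kd=10": (2.0, 0.05, 10.0),
}
for name, (kp, ki, kd) in configs.items():
    final = simulate(kp, ki, kd)[-1]
    print(f"{name:24s} steady-state error = {abs(50.0 - final):.2f} degC")
# The P-only rows give 15.00, 6.00 and 3.00 degC; the PI and PID rows give 0.00.
```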


Key Equations

PWM average voltage: \( V_{\text{avg}} = D \cdot V_{\text{supply}} \), with duty cycle \( D = t_{\text{on}}/T \)
ADC resolution: \( \Delta V = V_{\text{ref}} / 2^N \), the voltage of 1 LSB for an N-bit converter
PID discrete (Euler): \( u[k] = K_p e[k] + K_i T_s \sum_{j=0}^{k} e[j] + (K_d/T_s)(e[k]-e[k-1]) \)
UART bit period: \( T_{\text{bit}} = 1/f_{\text{baud}} \), e.g. 115200 baud gives 8.68 µs per bit