Protein Folding & Misfolding

From Anfinsen's thermodynamic hypothesis through Levinthal's paradox to modern funnel theory — with full derivations of folding kinetics, the Zimm-Bragg helix-coil transition, and computational simulations.

Learning Objectives

  • Understand Anfinsen's thermodynamic hypothesis and the concept of the native state
  • Derive the numbers behind Levinthal's paradox and explain why it demands a directed search
  • Analyze the free energy landscape and funnel theory of protein folding
  • Derive two-state folding kinetics, chevron plots, and $\phi$-value analysis
  • Apply the Zimm-Bragg model to helix-coil transitions
  • Connect folding theory to amyloid diseases and modern structure prediction

1. Introduction — Anfinsen's Thermodynamic Hypothesis

In 1961, Christian Anfinsen demonstrated that the enzyme ribonuclease A, once fully denatured and reduced, could refold spontaneously into its catalytically active form when the denaturant was removed and disulfide bonds were allowed to re-form. This landmark experiment established the thermodynamic hypothesis:

The native structure of a protein is the thermodynamically most stable state — the unique conformation that minimizes the Gibbs free energy under physiological conditions.

Mathematically, the native state N satisfies:

$G(N) = \min_{\{\text{all conformations } C\}} G(C)$

The Gibbs free energy of the native state relative to the unfolded ensemble is:

$\Delta G_{\text{fold}} = G_N - G_U = \Delta H_{\text{fold}} - T\Delta S_{\text{fold}}$

For a typical small globular protein, $\Delta G_{\text{fold}} \approx -5$ to $-15 \; \text{kcal/mol}$, remarkably small compared to the total enthalpic and entropic contributions, which can each be hundreds of kcal/mol. The native state is thus only marginally stable, a delicate balance between large opposing forces.
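
The arithmetic behind this marginal stability can be sketched in a few lines of Python. The enthalpy and entropy values below are hypothetical round numbers, chosen only to show how two terms of order 200 kcal/mol can nearly cancel; they are not measurements for any particular protein.

```python
# Illustrative only: DELTA_H and DELTA_S are hypothetical round numbers,
# not measured values for any specific protein.
T = 298.0          # temperature, K
DELTA_H = -200.0   # folding enthalpy, kcal/mol (assumed)
DELTA_S = -0.64    # folding entropy, kcal/(mol*K) (assumed)


def delta_g_fold(delta_h, delta_s, temperature):
    """Gibbs free energy of folding, dG = dH - T*dS, in kcal/mol."""
    return delta_h - temperature * delta_s


entropic_term = -T * DELTA_S            # positive: chain entropy opposes folding
dg = delta_g_fold(DELTA_H, DELTA_S, T)  # small net difference of two large terms

print(f"dH = {DELTA_H:.1f} kcal/mol")
print(f"-T*dS = {entropic_term:+.2f} kcal/mol")
print(f"dG_fold = {dg:.2f} kcal/mol")
```

With these assumed inputs the net stability falls inside the range quoted above even though each contribution is roughly twenty times larger.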

Anfinsen's insight implies that the amino acid sequence alone encodes all the information needed to determine the three-dimensional structure. This is the foundation of the entire field of protein structure prediction — from homology modeling to the revolutionary AlphaFold.

The key insight of Ken Dill, Jose Onuchic, Peter Wolynes, and others is that the energy landscape for a foldable protein is funnel-shaped. As the chain forms increasing numbers of native contacts, the energy decreases on average, creating a downhill bias toward the native state.

Quantifying the Funnel

Define a reaction coordinate $Q$ as the fraction of native contacts formed ($0 \leq Q \leq 1$). The free energy as a function of $Q$ is:

$G(Q) = E(Q) - T S(Q)$

The energy decreases roughly linearly with $Q$:

$E(Q) \approx -Q \cdot n \cdot \epsilon_0$

where $n$ is the total number of native contacts and $\epsilon_0$ is the average energy per native contact. The conformational entropy decreases as native contacts constrain the chain:

$S(Q) \approx (1 - Q) \cdot n \cdot k_B \ln(\Omega_0)$

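Each native contact thus lowers the energy by $\epsilon_0$ and costs $k_B \ln(\Omega_0)$ of entropy. With both terms exactly linear in $Q$, $G(Q)$ is itself linear, with slope $n\,[k_B T \ln(\Omega_0) - \epsilon_0]$; when $\epsilon_0 > k_B T \ln(\Omega_0)$ the slope is negative and the funnel is steep enough to guide folding. The free energy barrier at intermediate $Q$, whose height determines the folding rate, arises from deviations from this linear picture: chain entropy is typically lost before the compensating contact energy is fully gained.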
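
The sketch below evaluates the two linear approximations and the sign of the resulting slope at two temperatures. All parameter values (`N_UNITS`, `EPS0`, `OMEGA0`) are hypothetical, and molar units are assumed, so the gas constant $R$ stands in for $k_B$. It does not attempt to model the barrier, which requires nonlinear terms beyond these two expressions.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K); molar counterpart of k_B


def funnel_free_energy(q, n, eps0, omega0, temperature):
    """G(Q) = E(Q) - T*S(Q) using the linear E(Q) and S(Q) from the text."""
    energy = -q * n * eps0
    entropy = (1.0 - q) * n * R * math.log(omega0)
    return energy - temperature * entropy


# Hypothetical parameters, for illustration only
N_UNITS = 100   # value of n in the formulas (assumed)
EPS0 = 0.75     # energy per native contact, kcal/mol (assumed)
OMEGA0 = 3      # conformations per unit (assumed)

for temp in (298.0, 360.0):
    g_unfolded = funnel_free_energy(0.0, N_UNITS, EPS0, OMEGA0, temp)
    g_native = funnel_free_energy(1.0, N_UNITS, EPS0, OMEGA0, temp)
    drop = g_native - g_unfolded  # equals n * (R*T*ln(omega0) - eps0)
    verdict = "downhill to Q = 1" if drop < 0 else "uphill to Q = 1"
    print(f"T = {temp:.0f} K: G(1) - G(0) = {drop:+.2f} kcal/mol ({verdict})")

# Temperature at which the linear funnel loses its downhill bias
t_cross = EPS0 / (R * math.log(OMEGA0))
print(f"slope changes sign at T = {t_cross:.1f} K")
```

In this toy parameterization the bias toward the native state disappears above the crossover temperature, which is the condition $\epsilon_0 = k_B T \ln(\Omega_0)$ rearranged for $T$.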

The principle of minimal frustration (Bryngelson and Wolynes, 1987) states that natural proteins have evolved sequences where the energy landscape is minimally frustrated — meaning that most local energy minima still point downhill toward the native state, avoiding deep kinetic traps.

4. Derivation 3: Two-State Folding Kinetics

Many small single-domain proteins fold in a cooperative, two-state manner with no detectable intermediates. The folding reaction is:

$N \rightleftharpoons U$

Equilibrium Thermodynamics

The equilibrium constant for unfolding is:

$K_U = \frac{[U]}{[N]} = \exp\left(\frac{-\Delta G_{N \to U}}{RT}\right) = \exp\left(\frac{\Delta G_{\text{fold}}}{RT}\right)$

Since $\Delta G_{\text{fold}} = G_N - G_U < 0$, we have $K_U < 1$, and the native state is favored. The fraction of unfolded protein is:

$f_U = \frac{K_U}{1 + K_U} = \frac{1}{1 + \exp(-\Delta G_{\text{fold}}/RT)}$
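
As a quick numerical illustration of these two formulas, the following sketch tabulates $K_U$ and $f_U$ for a few stabilities. The function names and the chosen $\Delta G_{\text{fold}}$ values are illustrative assumptions; energies are in kcal/mol at an assumed 298 K.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)


def unfolding_constant(dg_fold, temperature=298.0):
    """K_U = [U]/[N] = exp(dG_fold / RT), with dG_fold = G_N - G_U."""
    return math.exp(dg_fold / (R * temperature))


def fraction_unfolded(dg_fold, temperature=298.0):
    """f_U = K_U / (1 + K_U)."""
    k_u = unfolding_constant(dg_fold, temperature)
    return k_u / (1.0 + k_u)


# Hypothetical stabilities, for illustration only
for dg in (0.0, -5.0, -10.0, -15.0):
    print(f"dG_fold = {dg:6.1f} kcal/mol -> "
          f"K_U = {unfolding_constant(dg):.3e}, f_U = {fraction_unfolded(dg):.3e}")
```

Even the least stable case in this list leaves only a tiny unfolded population at room temperature, while $\Delta G_{\text{fold}} = 0$ gives an even split.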

Linear Free Energy Relationships and Denaturant Dependence

The stability of a protein varies linearly with denaturant concentration [D] (urea or guanidinium chloride):

$\Delta G_U([\text{D}]) = \Delta G_U^{H_2O} - m_{\text{eq}} \cdot [\text{D}]$

where $\Delta G_U^{H_2O}$ is the stability in the absence of denaturant and $m_{\text{eq}}$ (the m-value) reflects the change in solvent-accessible surface area upon unfolding, typically $1$ to $5$ kcal/(mol$\cdot$M).
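
A minimal sketch of the resulting denaturation curve follows. The values of `DG_U_H2O` and `M_EQ` are hypothetical; the midpoint concentration follows from setting $\Delta G_U([\text{D}]) = 0$ in the linear relation above.

```python
import math

R = 1.987e-3   # gas constant, kcal/(mol*K)
T = 298.0      # temperature, K

# Hypothetical parameters, for illustration only
DG_U_H2O = 6.0   # unfolding free energy in water, kcal/mol (assumed)
M_EQ = 1.5       # equilibrium m-value, kcal/(mol*M) (assumed)


def dg_unfold(denaturant):
    """Linear extrapolation: dG_U([D]) = dG_U(H2O) - m_eq * [D]."""
    return DG_U_H2O - M_EQ * denaturant


def fraction_unfolded(denaturant):
    """f_U = 1 / (1 + exp(dG_U / RT)); note dG_U = -dG_fold."""
    return 1.0 / (1.0 + math.exp(dg_unfold(denaturant) / (R * T)))


# Midpoint of the transition, where dG_U = 0
d_half = DG_U_H2O / M_EQ

for conc in (0.0, 2.0, d_half, 6.0, 8.0):
    print(f"[D] = {conc:.1f} M: dG_U = {dg_unfold(conc):+.2f} kcal/mol, "
          f"f_U = {fraction_unfolded(conc):.4f}")
```

The curve is sigmoidal in $[\text{D}]$, and a larger m-value makes the transition steeper around the same midpoint.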

Folding and Unfolding Rate Constants

Using transition state theory, the microscopic rate constants for folding ($k_f$) and unfolding ($k_u$) also depend linearly on denaturant:

$\ln k_f([\text{D}]) = \ln k_f^{H_2O} - \frac{m_f}{RT} \cdot [\text{D}]$

$\ln k_u([\text{D}]) = \ln k_u^{H_2O} + \frac{m_u}{RT} \cdot [\text{D}]$

where $m_f$ and $m_u$ are the kinetic m-values, with $m_{\text{eq}} = m_f + m_u$.
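
The relation between the kinetic and equilibrium m-values can be checked directly from the definitions above, using the two-state identity $K_U = k_u / k_f$ (a short consistency check, not an additional assumption):

$\Delta G_U([\text{D}]) = -RT \ln K_U = -RT\left[\ln k_u([\text{D}]) - \ln k_f([\text{D}])\right]$

$\Delta G_U([\text{D}]) = -RT \ln\frac{k_u^{H_2O}}{k_f^{H_2O}} - (m_u + m_f)[\text{D}] = \Delta G_U^{H_2O} - m_{\text{eq}}[\text{D}]$

Matching terms gives $\Delta G_U^{H_2O} = RT \ln(k_f^{H_2O}/k_u^{H_2O})$ and $m_{\text{eq}} = m_f + m_u$.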

Derivation of the Chevron Plot

In a kinetic experiment (e.g., stopped-flow fluorescence), the observed relaxation rate is the sum of folding and unfolding rates:

$k_{\text{obs}} = k_f + k_u$

Substituting the denaturant dependencies:

$k_{\text{obs}}([\text{D}]) = k_f^{H_2O} \exp\left(-\frac{m_f}{RT}[\text{D}]\right) + k_u^{H_2O} \exp\left(\frac{m_u}{RT}[\text{D}]\right)$

Taking the logarithm of $k_{\text{obs}}$:

$\ln k_{\text{obs}}([\text{D}]) = \ln\left[k_f^{H_2O} e^{-m_f[\text{D}]/RT} + k_u^{H_2O} e^{+m_u[\text{D}]/RT}\right]$

Shape of the Chevron Plot

Plotting $\ln k_{\text{obs}}$ vs [D] yields the characteristic V-shaped chevron plot:

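  • Left arm (low [D]): $k_f \gg k_u$, so $\ln k_{\text{obs}} \approx \ln k_f^{H_2O} - (m_f/RT)[\text{D}]$, a straight line with negative slope $-m_f/RT$
  • Minimum: close to the midpoint $[\text{D}]_{1/2}$, where $k_f = k_u$ and $\Delta G_U = 0$; the exact minimum lies where $m_f k_f = m_u k_u$, which coincides with the midpoint only if $m_f = m_u$
  • Right arm (high [D]): $k_u \gg k_f$, so $\ln k_{\text{obs}} \approx \ln k_u^{H_2O} + (m_u/RT)[\text{D}]$, a straight line with positive slope $+m_u/RT$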
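
The chevron can be sketched numerically as follows. All rate constants and m-values are hypothetical. The code locates the minimum of $\ln k_{\text{obs}}$ on a grid and compares it with two closed-form positions: the midpoint, where $k_f = k_u$, and the point where $m_f k_f = m_u k_u$, which is the condition $\mathrm{d}k_{\text{obs}}/\mathrm{d}[\text{D}] = 0$.

```python
import math

R = 1.987e-3       # gas constant, kcal/(mol*K)
T = 298.0          # temperature, K
RT = R * T

# Hypothetical kinetic parameters, for illustration only
KF_H2O = 1000.0    # folding rate in water, 1/s (assumed)
KU_H2O = 0.01      # unfolding rate in water, 1/s (assumed)
M_F = 1.0          # kinetic m-value for folding, kcal/(mol*M) (assumed)
M_U = 0.5          # kinetic m-value for unfolding, kcal/(mol*M) (assumed)


def k_fold(d):
    return KF_H2O * math.exp(-M_F * d / RT)


def k_unfold(d):
    return KU_H2O * math.exp(M_U * d / RT)


def ln_k_obs(d):
    """ln(k_f + k_u), the quantity plotted in a chevron plot."""
    return math.log(k_fold(d) + k_unfold(d))


# Midpoint: k_f = k_u
d_half = RT * math.log(KF_H2O / KU_H2O) / (M_F + M_U)
# Exact minimum of k_obs: m_f * k_f = m_u * k_u
d_min = RT * math.log((M_F * KF_H2O) / (M_U * KU_H2O)) / (M_F + M_U)

# Brute-force check on a 1 mM grid from 0 to 8 M
grid = [i * 0.001 for i in range(8001)]
d_grid = min(grid, key=ln_k_obs)

print(f"midpoint [D]_1/2 = {d_half:.3f} M")
print(f"analytic minimum = {d_min:.3f} M, grid minimum = {d_grid:.3f} M")
```

Because $m_f \neq m_u$ in this example, the minimum sits slightly to the right of the midpoint; setting the two m-values equal makes the two positions coincide.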

$\phi$-Value Analysis

Alan Fersht developed $\phi$-value analysis to map the structure of the transition state ensemble. For a mutation that changes the stability by $\Delta\Delta G_{N-U}$ and the folding activation energy by $\Delta\Delta G_{\ddagger-U}$:

$\phi = \frac{\Delta\Delta G_{\ddagger - U}}{\Delta\Delta G_{N - U}} = \frac{RT \ln(k_f^{\text{wt}}/k_f^{\text{mut}})}{\Delta\Delta G_{N-U}}$

Interpretation of $\phi$-values:

  • $\phi = 1$: The mutated residue is fully structured in the transition state (native-like interactions fully formed)
  • $\phi = 0$: The mutated residue is fully unstructured in the transition state (no native contacts formed)
  • $0 < \phi < 1$: Partial structure formation, indicating the residue is involved in partially formed interactions at the transition state
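
A small helper makes the calculation concrete. It assumes the sign convention implied by the formula above, namely that $\Delta\Delta G_{N-U}$ is taken as mutant minus wild type and is therefore positive for a destabilizing mutation. The function name and the example rates are hypothetical.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)


def phi_value(kf_wt, kf_mut, ddg_n_u, temperature=298.0):
    """phi = RT * ln(kf_wt / kf_mut) / ddG_{N-U}.

    ddg_n_u is in kcal/mol, positive for a destabilizing mutation
    (assumed convention: mutant minus wild type).
    """
    if ddg_n_u == 0:
        raise ValueError("phi is undefined when the mutation does not change stability")
    ddg_ts_u = R * temperature * math.log(kf_wt / kf_mut)
    return ddg_ts_u / ddg_n_u


# Hypothetical mutant: folding slows fivefold, stability drops by 1.5 kcal/mol
phi = phi_value(kf_wt=1000.0, kf_mut=200.0, ddg_n_u=1.5)
print(f"phi = {phi:.2f}")
```

Because $\phi$ is a ratio, mutations with a small $\Delta\Delta G_{N-U}$ give noisy values, which is why the helper refuses a zero denominator rather than returning an arbitrary number.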

5. Derivation 4: Helix-Coil Transition (Zimm-Bragg Model)

The helix-coil transition is one of the simplest and most thoroughly understood conformational transitions in biophysics. The Zimm-Bragg model (1959) treats a polypeptide chain as a one-dimensional Ising-like system where each residue is either in a helical (h) or coil (c) state.

Model Parameters

  • Propagation parameter $s$: The equilibrium constant for adding a helical residue to an existing helix. $s = \exp(-\Delta G_{\text{prop}}/k_BT)$. When $s > 1$, helix extension is favorable. The transition occurs near $s = 1$.
  • Nucleation parameter $\sigma$: The statistical weight penalty for initiating a new helical segment. $\sigma \ll 1$ (typically $10^{-3}$ to $10^{-4}$ for $\alpha$-helices) because forming the first turn of a helix requires constraining $\sim 3$ residues without gaining a hydrogen bond.

Transfer Matrix Method

The statistical weight of each residue depends on its own state and that of its predecessor. We define a $2 \times 2$ transfer matrix $\mathbf{M}$:

$\mathbf{M} = \begin{pmatrix} 1 & \sigma s \\ 1 & s \end{pmatrix}$

Rows: predecessor state (c, h). Columns: current state (c, h).

The element $M_{ij}$ gives the statistical weight for residue $k$ being in state $j$ given that residue $k-1$ is in state $i$:

  • $c \to c$: weight = 1 (reference state)
  • $c \to h$: weight = $\sigma s$ (nucleation penalty $\times$ propagation)
  • $h \to c$: weight = 1 (helix termination, no penalty)
  • $h \to h$: weight = $s$ (helix propagation)

Partition Function

For a chain of $N$ residues, the partition function is obtained by multiplying transfer matrices:

$Z = \mathbf{v}_0^T \cdot \mathbf{M}^N \cdot \mathbf{v}_f$

where $\mathbf{v}_0 = (1, 0)^T$ (chain starts in coil) and $\mathbf{v}_f = (1, 1)^T$ (sum over final states).
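
One way to confirm that the matrix product counts every configuration with the right weight is to compare it against explicit enumeration for short chains. The sketch below does this in plain Python; the function names are illustrative, and the coil state assumed before the first residue corresponds to the choice of $\mathbf{v}_0$ above.

```python
from itertools import product


def step_weight(prev, cur, s, sigma):
    """Weight M[prev][cur] for a residue in state cur following state prev."""
    if cur == "c":
        return 1.0                      # c->c and h->c both carry weight 1
    return s if prev == "h" else sigma * s  # h->h propagates, c->h nucleates


def z_brute_force(n, s, sigma):
    """Sum the weights of all 2**n helix/coil strings explicitly."""
    total = 0.0
    for states in product("ch", repeat=n):
        weight, prev = 1.0, "c"         # virtual coil residue before the chain (v_0)
        for cur in states:
            weight *= step_weight(prev, cur, s, sigma)
            prev = cur
        total += weight
    return total


def z_transfer_matrix(n, s, sigma):
    """Z = v0^T M^n v_f, evaluated by propagating a row vector (c, h)."""
    c, h = 1.0, 0.0                     # v_0 = (1, 0)
    for _ in range(n):
        # row vector times M: new_c = c*1 + h*1, new_h = c*sigma*s + h*s
        c, h = c + h, c * sigma * s + h * s
    return c + h                        # contraction with v_f = (1, 1)


for n in (1, 2, 5, 10):
    zb = z_brute_force(n, 1.3, 0.01)
    zt = z_transfer_matrix(n, 1.3, 0.01)
    print(f"N = {n:2d}: brute force = {zb:.6f}, transfer matrix = {zt:.6f}")
```

The enumeration costs $2^N$ terms while the matrix recursion costs $N$ steps, which is the practical reason for using the transfer matrix.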

Eigenvalue Solution

The eigenvalues of $\mathbf{M}$ are found from $\det(\mathbf{M} - \lambda \mathbf{I}) = 0$:

$(1 - \lambda)(s - \lambda) - \sigma s = 0$

$\lambda^2 - (1 + s)\lambda + s(1 - \sigma) = 0$

Using the quadratic formula:

$\lambda_{\pm} = \frac{(1 + s) \pm \sqrt{(1 - s)^2 + 4\sigma s}}{2}$

For large $N$, the partition function is dominated by the larger eigenvalue $\lambda_+$:

$Z \approx c_+ \lambda_+^N$
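
The dominance of $\lambda_+$ can be seen numerically: the ratio $Z_{N+1}/Z_N$ should approach $\lambda_+$ as $N$ grows. The sketch below checks this for one hypothetical pair of parameters (`S`, `SIGMA`), using the closed-form eigenvalues and a direct recursion for $Z$.

```python
import math


def eigenvalues(s, sigma):
    """Roots of lambda^2 - (1 + s)*lambda + s*(1 - sigma) = 0."""
    root = math.sqrt((1.0 - s) ** 2 + 4.0 * sigma * s)
    return (1.0 + s + root) / 2.0, (1.0 + s - root) / 2.0


def partition_function(n, s, sigma):
    """Z = v0^T M^n v_f with v0 = (1, 0) and v_f = (1, 1)."""
    c, h = 1.0, 0.0
    for _ in range(n):
        c, h = c + h, c * sigma * s + h * s
    return c + h


# Hypothetical parameters, for illustration only
S, SIGMA = 1.2, 1e-3
lam_plus, lam_minus = eigenvalues(S, SIGMA)
print(f"lambda_+ = {lam_plus:.6f}, lambda_- = {lam_minus:.6f}")

for n in (5, 20, 100, 200):
    ratio = partition_function(n + 1, S, SIGMA) / partition_function(n, S, SIGMA)
    print(f"N = {n:3d}: Z(N+1)/Z(N) = {ratio:.6f}")
```

The correction to the large-$N$ form decays as $(\lambda_-/\lambda_+)^N$, so convergence is slowest near $s = 1$ with small $\sigma$, where the two eigenvalues are closest.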

Fraction Helix

The average fraction of residues in the helical state is:

$\theta = \frac{s}{N} \frac{\partial \ln Z}{\partial s} \approx \frac{s}{\lambda_+}\frac{\partial \lambda_+}{\partial s}$

Evaluating the derivative:

$\frac{\partial \lambda_+}{\partial s} = \frac{1}{2}\left(1 + \frac{-(1-s) + 2\sigma}{\sqrt{(1-s)^2 + 4\sigma s}}\right)$

At the transition midpoint $s = 1$:

$\theta(s=1) = \frac{1}{2}$
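
The closed-form expression for $\theta$ in the large-$N$ limit is easy to evaluate and to cross-check. The sketch below compares the analytic derivative with a central finite difference of $\ln \lambda_+$ and confirms the midpoint value; the function names and the value of $\sigma$ are illustrative.

```python
import math


def lambda_plus(s, sigma):
    return (1.0 + s + math.sqrt((1.0 - s) ** 2 + 4.0 * sigma * s)) / 2.0


def fraction_helix(s, sigma):
    """theta = (s / lambda_+) * d(lambda_+)/ds, valid for long chains."""
    root = math.sqrt((1.0 - s) ** 2 + 4.0 * sigma * s)
    dlam_ds = 0.5 * (1.0 + (-(1.0 - s) + 2.0 * sigma) / root)
    return s * dlam_ds / lambda_plus(s, sigma)


def fraction_helix_numeric(s, sigma, h=1e-6):
    """Same quantity from a central difference of ln(lambda_+)."""
    dln = math.log(lambda_plus(s + h, sigma)) - math.log(lambda_plus(s - h, sigma))
    return s * dln / (2.0 * h)


SIGMA = 1e-3  # hypothetical nucleation parameter
for s in (0.5, 0.9, 1.0, 1.1, 2.0):
    print(f"s = {s:.1f}: theta = {fraction_helix(s, SIGMA):.4f} "
          f"(numeric {fraction_helix_numeric(s, SIGMA):.4f})")
```

At $s = 1$ the factors of $1 + \sqrt{\sigma}$ in $\lambda_+$ and its derivative cancel, so $\theta = 1/2$ holds for any $\sigma$ in this limit.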

Sharpness of the Transition

The sharpness of the helix-coil transition is controlled by $\sigma$. Smaller $\sigma$ (stronger nucleation penalty) gives a sharper, more cooperative transition. In the limit $\sigma \to 0$, the transition becomes an all-or-nothing phase transition. For real $\alpha$-helices with $\sigma \approx 10^{-3}$ to $10^{-4}$, the transition is fairly sharp and occurs over a narrow temperature range of $\sim 10$ to $20$ K.
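
The dependence of sharpness on $\sigma$ can be quantified by the slope of $\theta$ at the midpoint. Differentiating the large-$N$ expression for $\theta$ gives $\mathrm{d}\theta/\mathrm{d}\ln s = 1/(4\sqrt{\sigma})$ at $s = 1$; the sketch below checks this numerically for a few illustrative values of $\sigma$.

```python
import math


def fraction_helix(s, sigma):
    """Large-N Zimm-Bragg helix fraction, theta = (s / lambda_+) * d(lambda_+)/ds."""
    root = math.sqrt((1.0 - s) ** 2 + 4.0 * sigma * s)
    lam_plus = (1.0 + s + root) / 2.0
    dlam_ds = 0.5 * (1.0 + (-(1.0 - s) + 2.0 * sigma) / root)
    return s * dlam_ds / lam_plus


def midpoint_slope(sigma, h=1e-5):
    """Central-difference estimate of d(theta)/d(ln s) at s = 1.

    At s = 1 the derivatives with respect to s and ln s coincide.
    """
    return (fraction_helix(1.0 + h, sigma) - fraction_helix(1.0 - h, sigma)) / (2.0 * h)


for sigma in (1e-2, 1e-3, 1e-4):
    numeric = midpoint_slope(sigma)
    analytic = 1.0 / (4.0 * math.sqrt(sigma))
    print(f"sigma = {sigma:.0e}: slope = {numeric:.3f} (1/(4*sqrt(sigma)) = {analytic:.3f})")
```

Each hundredfold decrease in $\sigma$ steepens the midpoint slope tenfold, which is the quantitative sense in which a stronger nucleation penalty makes the transition more cooperative.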

6. Applications

AlphaFold & Structure Prediction

DeepMind's AlphaFold2 (2020) achieved near-experimental accuracy for many targets at CASP14, lending computational support to Anfinsen's hypothesis that sequence determines structure. It uses multiple sequence alignments and attention-based neural networks to predict 3D coordinates from sequence. Predicted structures for over 200 million proteins have since been released through the AlphaFold Protein Structure Database, transforming structural biology.

Amyloid Diseases

Protein misfolding leads to aggregation into amyloid fibrils — ordered, cross-$\beta$ sheet structures that are thermodynamically stable but kinetically trapped. These are implicated in:

  • Alzheimer's disease: A$\beta$ peptide and tau protein
  • Parkinson's disease: $\alpha$-synuclein
  • Prion diseases: PrP$^{\text{Sc}}$
  • Type II diabetes: IAPP (amylin)

Molecular Chaperones

Chaperones (GroEL/GroES, Hsp70, Hsp90) do not provide folding information but rather prevent aggregation by sequestering unfolded or partially folded intermediates. GroEL provides an isolated cavity where a single protein can fold without intermolecular contacts. The iterative annealing mechanism (Thirumalai and Lorimer) suggests chaperones unfold kinetically trapped intermediates, giving them another chance to reach the native state.

Drug Design Targeting Misfolding

Therapeutic strategies include:

  • Kinetic stabilizers: Tafamidis binds transthyretin (TTR) and prevents amyloid formation
  • Pharmacological chaperones: Small molecules that stabilize the native fold
  • Aggregation inhibitors: Compounds that block fibril elongation
  • Immunotherapy: Antibodies targeting amyloid plaques (e.g., lecanemab for Alzheimer's)

7. Historical Context

1961

Christian Anfinsen demonstrates spontaneous refolding of ribonuclease A, establishing the thermodynamic hypothesis. He receives the Nobel Prize in Chemistry in 1972 "for his work on ribonuclease, especially concerning the connection between the amino acid sequence and the biologically active conformation."

1968

Cyrus Levinthal articulates the paradox bearing his name at a conference, published in 1969. The paradox motivates the search for folding pathways and intermediates.

1987

Bryngelson & Wolynes introduce the energy landscape theory and the principle of minimal frustration, laying the groundwork for the funnel picture.

1995

Ken Dill, Jose Onuchic, and Peter Wolynes formalize the folding funnel concept and develop the "new view" of protein folding based on statistical mechanics of heteropolymers.

1998

David Baker and colleagues develop the Rosetta software suite in the late 1990s; it becomes a cornerstone of computational protein design and is later used to design Top7 (2003), the first computationally designed protein with a novel fold.

2020

DeepMind's AlphaFold2 achieves near-experimental accuracy at CASP14, solving the protein structure prediction problem for single-domain proteins. Demis Hassabis and John Jumper share the 2024 Nobel Prize in Chemistry with David Baker for computational protein design and structure prediction.

8. Python Simulation

Below we simulate three key aspects of protein folding theory using only NumPy: (1) the free energy profile as a function of the reaction coordinate at different temperatures, (2) the chevron plot for two-state folding kinetics, and (3) the Zimm-Bragg helix-coil transition showing how the nucleation parameter $\sigma$ controls cooperativity.

Protein Folding: Free Energy Landscape & Kinetics

[Interactive Python listing: script.py, 233 lines; code not included in this text]

Summary of Key Equations

Levinthal's Paradox

$\Omega = 3^{100} \approx 5 \times 10^{47}, \quad t_{\text{search}} = \frac{\Omega}{10^{13}\,\text{s}^{-1}} \approx 10^{27}\,\text{years}$
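
The numbers in this estimate can be reproduced with a few lines of arithmetic. The inputs (3 conformations per residue, 100 residues, $10^{13}$ conformations sampled per second) are the ones appearing in the equation above; the variable names are illustrative.

```python
# Inputs taken from the Levinthal estimate in the summary equation
CONFORMATIONS_PER_RESIDUE = 3
N_RESIDUES = 100
SAMPLING_RATE = 1e13                      # conformations tried per second

SECONDS_PER_YEAR = 365.25 * 24 * 3600     # Julian year

omega = CONFORMATIONS_PER_RESIDUE ** N_RESIDUES   # exact integer
t_seconds = omega / SAMPLING_RATE
t_years = t_seconds / SECONDS_PER_YEAR

print(f"Omega = {float(omega):.2e} conformations")
print(f"exhaustive search time = {t_seconds:.2e} s = {t_years:.2e} years")
```

The result exceeds the age of the universe by many orders of magnitude, which is the quantitative core of the paradox.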

Folding Free Energy

$\Delta G_{\text{fold}} = \Delta H_{\text{fold}} - T\Delta S_{\text{fold}}$

Two-State Equilibrium

$K_U = \frac{[U]}{[N]} = \exp\!\left(\frac{\Delta G_{\text{fold}}}{RT}\right), \quad \Delta G_U([\text{D}]) = \Delta G_U^{H_2O} - m_{\text{eq}}[\text{D}]$

Chevron Plot

$k_{\text{obs}} = k_f^{H_2O}\,e^{-m_f[\text{D}]/RT} + k_u^{H_2O}\,e^{+m_u[\text{D}]/RT}$

$\phi$-Value Analysis

$\phi = \frac{\Delta\Delta G_{\ddagger - U}}{\Delta\Delta G_{N - U}}$

Zimm-Bragg Eigenvalues

$\lambda_{\pm} = \frac{(1 + s) \pm \sqrt{(1 - s)^2 + 4\sigma s}}{2}$