Interactomics & Structural Proteomics
Mapping protein-protein interaction networks, proximity landscapes, and three-dimensional structures at proteome scale
12.1 Experimental Detection of Protein-Protein Interactions
Proteins rarely function in isolation. The cellular machinery operates through an elaborate network of protein-protein interactions (PPIs) that govern signal transduction, gene regulation, metabolic channeling, and structural organization. The human interactome is estimated to comprise 300,000-650,000 binary interactions, of which only a fraction has been experimentally characterized. Multiple complementary techniques are required to survey this vast interaction space, each with distinct strengths in terms of throughput, sensitivity to transient interactions, and the ability to capture interactions in their native cellular context.
Yeast Two-Hybrid (Y2H) System
The yeast two-hybrid assay, developed by Fields and Song (1989), is a genetic method for detecting binary protein-protein interactions in vivo. It exploits the modular nature of transcription factors: the "bait" protein is fused to the DNA-binding domain (DBD) of GAL4 (or LexA), and the "prey" protein is fused to the activation domain (AD). If bait and prey interact, they reconstitute a functional transcription factor that drives expression of reporter genes (HIS3, ADE2, lacZ). High-throughput Y2H screens have mapped tens of thousands of binary interactions in yeast, human, Drosophila, and C. elegans interactomes.
Y2H Strengths & Limitations
Strengths: Detects binary (direct) interactions; scalable to proteome-wide screens using ORF clone libraries; no need for purified proteins or antibodies.
Limitations: Interactions are assayed in the yeast nucleus, a non-native environment for most proteins; interactions that depend on PTMs absent in yeast, or on additional subunits of a multiprotein complex, are often missed; error rates are high, with estimates of ~50% false positives and ~70% false negatives; membrane proteins are poorly represented. Complementary approaches (AP-MS, proximity labeling) are essential for comprehensive interactome coverage.
Affinity Purification Mass Spectrometry (AP-MS)
AP-MS is the workhorse for characterizing protein complexes under near-native conditions. A tagged bait protein (FLAG, HA, GFP, or Strep-tag) is expressed in cells, and its associated protein complex is captured by immunoaffinity purification. After stringent washing to remove non-specific binders, co-purified proteins are identified by LC-MS/MS. The BioPlex project has used AP-MS to map >100,000 interactions involving >14,000 human proteins, providing the most comprehensive AP-MS-based human interactome to date.
Co-Immunoprecipitation (Co-IP)
Co-IP uses an antibody against an endogenous protein (no tagging required) to pull down the target and its binding partners from a cell lysate. The immunoprecipitate is analyzed by Western blotting (for hypothesis-driven validation of specific interactions) or by LC-MS/MS (for unbiased identification). Co-IP captures interactions at endogenous expression levels, avoiding overexpression artifacts, but is limited by antibody quality and availability. Controls include IgG isotype controls and antibody-negative lysate controls to distinguish specific from non-specific co-purification.
SAINT Probability Score for AP-MS
The Significance Analysis of INTeractome (SAINT) algorithm uses Bayesian modeling to distinguish true interactors from background contaminants. Spectral counts for each prey protein in bait purifications are modeled against the distribution observed in negative controls. The posterior probability (SAINT score, 0-1) reflects the likelihood that the interaction is genuine. A threshold of SAINT $\geq$ 0.8 (or a Bayesian false discovery rate, BFDR, $\leq$ 0.05) is commonly used to define high-confidence interactors. The CRAPome contaminant repository provides background distributions for common non-specific binders (e.g., ribosomal proteins, chaperones, keratins).
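To make the idea of scoring counts against negative controls concrete, here is a deliberately simplified sketch. It is not the SAINT implementation: it assumes a two-component Poisson mixture in which background counts follow the mean seen in the controls and true interactions follow a rate a fixed multiple above it. The function names, the prior `prior_true`, the `fold` parameter and the `floor` on the background rate are illustrative assumptions.

```python
import math


def poisson_logpmf(x, lam):
    """Log of the Poisson probability mass function."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def toy_interaction_posterior(bait_counts, control_counts,
                              prior_true=0.1, fold=5.0, floor=0.5):
    """Average posterior probability that a prey is a true interactor.

    Background rate: mean spectral count in negative controls (floored).
    True-interaction rate: `fold` times the background rate (an assumption).
    """
    lam_bg = max(sum(control_counts) / len(control_counts), floor)
    lam_true = fold * lam_bg
    posteriors = []
    for x in bait_counts:
        log_t = math.log(prior_true) + poisson_logpmf(x, lam_true)
        log_b = math.log(1.0 - prior_true) + poisson_logpmf(x, lam_bg)
        # Normalize the two mixture components in a numerically stable way
        m = max(log_t, log_b)
        w_t, w_b = math.exp(log_t - m), math.exp(log_b - m)
        posteriors.append(w_t / (w_t + w_b))
    return sum(posteriors) / len(posteriors)


# Hypothetical spectral counts: three bait replicates vs. three controls
strong = toy_interaction_posterior([25, 30, 28], [1, 2, 0])
weak = toy_interaction_posterior([1, 2, 1], [1, 2, 0])
print(f"strong prey: {strong:.3f}, background-like prey: {weak:.3f}")
```

A prey whose counts sit far above the control mean scores close to 1, while a prey indistinguishable from the controls scores close to 0. The real algorithm estimates its model parameters from the whole dataset rather than fixing them by hand.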
12.2 Proximity Labeling: BioID & APEX
Proximity labeling overcomes a fundamental limitation of AP-MS: the requirement to maintain interactions through cell lysis and affinity purification, which biases against weak, transient, or context-dependent interactions. Instead, proximity labeling covalently tags neighboring proteins in living cells, creating a permanent record of the spatial neighborhood of a bait protein that survives stringent denaturing purification.
BioID / TurboID
BioID fuses the bait protein to a promiscuous biotin ligase (BirA* carrying an R118G mutation). In the presence of biotin, BirA* generates reactive biotinoyl-AMP, which diffuses and covalently biotinylates primary amines (Lys residues) on proximal proteins within a ~10 nm radius. Biotinylated proteins are purified on streptavidin beads under denaturing conditions (2% SDS) and identified by LC-MS/MS.
TurboID is an engineered variant with much faster labeling kinetics (~10 min vs. 18-24 h for BioID), enabling temporal resolution of dynamic proximity landscapes. miniTurbo is a smaller engineered variant that labels somewhat less efficiently than TurboID but shows lower background biotinylation before exogenous biotin is added.
APEX / APEX2
APEX2 is an engineered ascorbate peroxidase that, in the presence of H$_2$O$_2$ and biotin-phenol, generates short-lived biotin-phenoxyl radicals (<1 ms lifetime) that covalently label electron-rich amino acids (Tyr, Trp, His, Cys) within ~20 nm. The extremely short radical lifetime provides a snapshot of the proximal proteome at a defined moment (1-minute labeling pulse).
APEX is particularly powerful for organelle proteomics (mapping the mitochondrial matrix, ER membrane, synaptic cleft) and has been combined with quantitative proteomics (SILAC-APEX, TMT-APEX) to identify condition-dependent changes in the proximitome.
Proximity vs. Physical Interaction
It is critical to understand that proximity labeling identifies proteins that are near the bait, not necessarily in direct physical contact. Labeled proteins may include direct interactors, members of the same complex, proteins in the same subcellular compartment, or even bystanders that transiently diffuse through the labeling radius. Careful quantitative analysis, comparison with controls (untagged cells, cytoplasmic controls), and integration with orthogonal interaction data (AP-MS, Y2H) are essential for biological interpretation.
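A minimal sketch of the comparison with controls follows: each protein's intensity in the bait sample is compared with a control sample, and only proteins above a log2 fold-change cutoff are kept. The protein names, the intensities, the pseudocount and the cutoff of 2.0 are all hypothetical. A real analysis would use replicates and a statistical test, not a single ratio.

```python
import math


def log2_enrichment(bait_intensity, control_intensity, pseudocount=1.0):
    """Log2 ratio of bait over control; the pseudocount avoids division by zero."""
    return math.log2((bait_intensity + pseudocount) /
                     (control_intensity + pseudocount))


def enriched_proteins(bait, control, min_log2fc=2.0):
    """Return (protein, log2 enrichment) pairs above the cutoff, best first."""
    hits = []
    for protein, intensity in bait.items():
        fc = log2_enrichment(intensity, control.get(protein, 0.0))
        if fc >= min_log2fc:
            hits.append((protein, fc))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical streptavidin-pulldown intensities
bait_sample = {"PROT_A": 799.0, "PROT_B": 99.0, "PROT_C": 1599.0}
control_sample = {"PROT_A": 99.0, "PROT_B": 99.0}  # PROT_C absent from control

hits = enriched_proteins(bait_sample, control_sample)
for name, fc in hits:
    print(f"{name}: log2 enrichment = {fc:.2f}")
```

Passing this filter shows only that a protein is enriched near the bait. Whether it is a direct interactor, a complex member, or a bystander still has to be settled with orthogonal data.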
12.3 Structural Mass Spectrometry
Beyond identifying interactors, mass spectrometry can provide structural information about protein complexes, conformational dynamics, and binding interfaces. Structural MS methods bridge the gap between high-resolution but low-throughput structural biology (X-ray, NMR, cryo-EM) and high-throughput but structurally uninformative interaction mapping.
Cross-Linking Mass Spectrometry (XL-MS)
Chemical cross-linkers covalently connect amino acid residues that are in spatial proximity (<30 Angstrom for common amine-reactive cross-linkers like BS3 and DSS). After digestion, cross-linked peptide pairs are identified by LC-MS/MS using specialized search engines (pLink, XlinkX, MeroX). Each cross-link represents a maximum distance constraint between two residues, and integrating many such constraints enables computational modeling of protein structures and complex architectures. Cross-linking maps can be overlaid onto existing crystal structures or cryo-EM densities to validate and refine models, or used as restraints for integrative modeling (e.g., with IMP/HADDOCK).
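Overlaying cross-links onto an existing structure amounts to checking each cross-linked residue pair against the maximum distance constraint. The sketch below assumes C-alpha coordinates in Angstrom and uses the 30 Angstrom cutoff quoted above. The coordinates, residue numbers and function names are hypothetical.

```python
import math


def check_crosslinks(ca_coords, crosslinks, max_dist=30.0):
    """Evaluate each cross-link against a C-alpha distance cutoff.

    ca_coords: dict mapping residue id -> (x, y, z) in Angstrom
    crosslinks: iterable of (residue_i, residue_j) pairs
    Returns a list of (residue_i, residue_j, distance, satisfied).
    """
    results = []
    for i, j in crosslinks:
        d = math.dist(ca_coords[i], ca_coords[j])
        results.append((i, j, d, d <= max_dist))
    return results


def satisfied_fraction(results):
    """Fraction of cross-links whose distance is within the cutoff."""
    return sum(1 for r in results if r[3]) / len(results)


# Hypothetical C-alpha positions for three lysines
coords = {10: (0.0, 0.0, 0.0), 45: (3.0, 4.0, 0.0), 120: (0.0, 0.0, 40.0)}
links = [(10, 45), (10, 120)]

report = check_crosslinks(coords, links)
for i, j, d, ok in report:
    print(f"K{i}-K{j}: {d:.1f} A, {'satisfied' if ok else 'violated'}")
```

A violated cross-link does not by itself mean the model is wrong. It can also indicate conformational flexibility, an alternative state, or a false-positive identification.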
Hydrogen-Deuterium Exchange (HDX-MS)
HDX-MS monitors the exchange of backbone amide hydrogens with deuterium from D$_2$O solvent. Amide hydrogens that are solvent-exposed and not involved in hydrogen bonds (e.g., in flexible loops) exchange rapidly, while those buried in the protein core or engaged in stable secondary structure exchange slowly. The rate of deuterium incorporation is measured by the mass increase of peptic peptides (digested under quench conditions: pH 2.5, 0 degrees C) over a time course.
HDX Kinetics: Exchange Rate Equation
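$$D(t) = N - \sum_{i=1}^{N} e^{-k_i t}$$
where $D(t)$ is the number of deuterons incorporated at time $t$, $N$ is the number of exchangeable amide hydrogens in the peptide, and $k_i$ is the intrinsic exchange rate of residue $i$. In practice, a simplified bimodal or stretched-exponential model is often used. The intrinsic exchange rate depends on pH and temperature, because exchange is catalyzed by acid, base, and water:
$$k_i = k_A[\mathrm{H^+}] + k_B[\mathrm{OH^-}] + k_W$$
where $k_A$, $k_B$, and $k_W$ are the acid-, base-, and water-catalyzed rate constants, each of which increases with temperature. The sum reaches its minimum near pH 2.5, which is why the exchange reaction is quenched at pH 2.5 and 0 degrees C.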
HDX-MS is particularly powerful for mapping ligand binding sites, allosteric conformational changes, and epitope mapping for therapeutic antibody development.
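As a small numerical illustration of HDX kinetics, the sketch below evaluates a peptide uptake curve. It assumes the common sum-of-exponentials form, in which each exchangeable amide contributes $1 - e^{-k_i t}$ deuterons at time $t$. The three rate constants are hypothetical values chosen to span fast, intermediate and slow exchange.

```python
import math


def deuterium_uptake(t, rates):
    """D(t) = N - sum_i exp(-k_i * t), with N = len(rates)."""
    return len(rates) - sum(math.exp(-k * t) for k in rates)


# Hypothetical per-residue exchange rates in s^-1 (fast loop to protected core)
rates = [10.0, 1.0, 0.01]

for t in (0.0, 1.0, 10.0, 100.0, 1000.0):
    print(f"t = {t:7.1f} s   D(t) = {deuterium_uptake(t, rates):.3f}")
```

The curve starts at zero and rises monotonically toward $N$. A ligand or antibody that protects an amide lowers its $k_i$ and shifts the curve to longer times, which is the signal used for binding-site and epitope mapping.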
Native Mass Spectrometry
Native MS preserves non-covalent interactions by spraying protein complexes from volatile, near-physiological buffers (ammonium acetate, pH 6.5-7.5) using nano-ESI. The intact complex is transmitted to the mass analyzer, revealing stoichiometry, subunit composition, and binding of small-molecule ligands. Surface-induced dissociation (SID) can be used to dissect the complex topology. Native MS has been used to characterize ribosomes, proteasomes, virus capsids, and membrane protein complexes, with masses exceeding 1 MDa being measured on modified Q-TOF and Orbitrap platforms.
12.4 Computational Structure Prediction & Cryo-EM
The past decade has witnessed a revolution in structural biology driven by two complementary advances: cryo-electron microscopy (cryo-EM), which can determine structures of large complexes without crystallization, and deep learning-based structure prediction, exemplified by AlphaFold, which can predict protein structures with near-experimental accuracy from sequence alone.
Cryo-EM for Structural Proteomics
Single-particle cryo-EM flash-freezes protein complexes in vitreous ice and images thousands of individual particles using a transmission electron microscope equipped with a direct electron detector. Computational averaging and 3D reconstruction yield density maps at 2-4 Angstrom resolution for well-behaved complexes, sufficient for atomic model building. The "resolution revolution" in cryo-EM (enabled by direct electron detectors and improved algorithms since ~2013) has made it the method of choice for large, flexible, and heterogeneous complexes that resist crystallization.
AlphaFold & Protein Structure Prediction
AlphaFold2 (Jumper et al., 2021) uses a deep learning architecture incorporating evolutionary co-variation from multiple sequence alignments (MSAs) and structural templates to predict protein 3D structures with remarkable accuracy (median GDT-TS >90 for CASP14 targets). The AlphaFold Protein Structure Database provides predicted structures for >200 million proteins across all kingdoms of life. AlphaFold-Multimer extends the approach to predict the structures of protein complexes, enabling proteome-scale structural modeling of interactomes.
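For readers unfamiliar with the GDT-TS metric quoted above, the sketch below computes an approximate value from per-residue C-alpha deviations between a predicted and a reference structure. It assumes the two structures have already been superimposed once. The full GDT procedure searches over many superpositions to maximize each fraction, so this simplified version can only underestimate the official score. The function name and the distances are hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT_TS on a 0-100 scale.

    distances: per-residue C-alpha deviations in Angstrom after a single
    superposition (a simplification of the full GDT search).
    """
    n = len(distances)
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical deviations for a four-residue toy example
score = gdt_ts([0.5, 1.5, 3.0, 9.0])
print(f"approximate GDT_TS = {score:.2f}")
```

A score above 90 therefore means that, averaged over the four cutoffs, more than 90% of residues lie within the cutoff distance of their reference positions.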
Integrative Structural Proteomics
The most powerful approach to understanding protein complex architecture integrates multiple data sources: XL-MS distance restraints, HDX-MS dynamics, AP-MS/proximity labeling interactome data, cryo-EM density maps, and AlphaFold predictions. Integrative modeling platforms (e.g., IMP from the Sali lab) combine these heterogeneous data types into unified structural models with defined uncertainty. This approach has been used to determine the architecture of the nuclear pore complex, the 26S proteasome, and chromatin remodeling complexes at resolutions impossible from any single technique alone.
12.5 PPI Network Analysis & Databases
Protein-protein interaction data, once assembled, forms a network (graph) in which proteins are nodes and interactions are edges. Network analysis provides a systems-level view of cellular organization, revealing functional modules, essential hub proteins, and the topological properties that govern network robustness and information flow. The mathematical formalism of graph theory provides the tools for this analysis.
Network Topology Metrics
Degree Distribution
Many biological networks are approximately scale-free: the probability $P(k)$ that a node has degree $k$ (number of interaction partners) follows a power law, $P(k) \sim k^{-\gamma}$, with exponent $\gamma$ typically between 2 and 3. This means most proteins have few interactions while a small number of "hub" proteins are highly connected. Hub proteins tend to be essential genes and are enriched among drug targets and disease-associated proteins.
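The empirical degree distribution can be computed directly from an edge list, as in the sketch below. The protein names and the toy network are hypothetical. Estimating the exponent $\gamma$ from real data needs many more nodes and a proper fitting procedure, which this example does not attempt.

```python
from collections import Counter, defaultdict


def degrees(edges):
    """Degree of every node in an undirected edge list (self-loops ignored)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}


def degree_distribution(edges):
    """Empirical P(k): fraction of nodes that have degree k."""
    deg = degrees(edges)
    counts = Counter(deg.values())
    return {k: c / len(deg) for k, c in sorted(counts.items())}


# Hypothetical toy network: one hub with four partners plus one extra edge
edges = [("HUB", "P1"), ("HUB", "P2"), ("HUB", "P3"), ("HUB", "P4"),
         ("P1", "P2")]

p_k = degree_distribution(edges)
for k, p in p_k.items():
    print(f"P({k}) = {p:.2f}")
```

Storing neighbors as sets means duplicate edges reported by several experiments are counted once.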
Betweenness Centrality
$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$
where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$, and $\sigma_{st}(v)$ is the number of those paths that pass through node $v$. Proteins with high betweenness centrality act as "bottleneck" nodes that bridge different functional modules. Removal of such proteins disproportionately disrupts network connectivity, and they are often essential for cell viability.
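Betweenness centrality, the sum over node pairs of the fraction of shortest paths that pass through a node, can be computed with Brandes' algorithm: one breadth-first search per source node. The sketch below is a compact version for unweighted, undirected graphs. It reports unnormalized values, counting each unordered pair once. The toy network of two triangles joined by a single edge is hypothetical.

```python
from collections import defaultdict, deque


def betweenness_centrality(adj):
    """Unnormalized betweenness (Brandes) for an undirected, unweighted graph.

    adj: dict mapping each node to the set of its neighbors.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        order = []
        preds = defaultdict(list)
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s
        delta = dict.fromkeys(adj, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints
    return {v: b / 2.0 for v, b in bc.items()}


# Hypothetical network: two triangles (A,B,C) and (D,E,F) bridged by C-D
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("C", "D"), ("D", "E"), ("E", "F"), ("D", "F")]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
adj = dict(adj)

bc = betweenness_centrality(adj)
for node in sorted(bc):
    print(node, bc[node])
```

In this toy network only the two bridging nodes carry shortest paths between the triangles, which is the bottleneck behavior described above.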
Clustering Coefficient
$$C_i = \frac{2 e_i}{k_i (k_i - 1)}$$
where $e_i$ is the number of edges between the $k_i$ neighbors of node $i$. The clustering coefficient measures the tendency of a protein's interaction partners to also interact with each other, forming a densely connected neighborhood. High clustering coefficients indicate membership in a protein complex or functional module. The average clustering coefficient of the network is the mean of $C_i$ over all nodes.
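The local clustering coefficient can be computed by counting edges among a node's neighbors and dividing by the number of possible neighbor pairs, $k_i(k_i - 1)/2$. The sketch below does this for a hypothetical four-protein network: a triangle with one pendant node. Nodes with fewer than two neighbors are assigned a coefficient of 0, which is a convention and not part of the definition.

```python
from itertools import combinations


def local_clustering(adj, node):
    """C_i = 2 * e_i / (k_i * (k_i - 1)); defined as 0 when k_i < 2."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    e = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * e / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Hypothetical network: triangle A-B-C with a pendant protein D attached to C
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

for node in sorted(adj):
    print(node, round(local_clustering(adj, node), 3))
print("average:", round(average_clustering(adj), 3))
```

Attaching the pendant node lowers the coefficient of C, because two of its three neighbor pairs are not connected to each other.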
Functional Module Detection
Functional modules are densely interconnected subnetworks that typically correspond to protein complexes or biological pathways. Community detection algorithms identify these modules computationally:
| Algorithm | Approach | Strengths |
|---|---|---|
| MCODE | Vertex weighting by local neighborhood density | Detects dense subgraphs; identifies complexes |
| ClusterONE | Greedy growth with overlap | Allows overlapping modules; uses edge weights |
| Louvain | Modularity optimization | Scalable to very large networks; fast |
| Markov Clustering (MCL) | Simulated random walks via alternating flow expansion and inflation | Robust; widely used for PPI clustering |
| Walktrap | Short random walks define node proximity | Good for hierarchical community structure |
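To show what one of these algorithms does in practice, the following is a minimal pure-Python sketch of Markov Clustering. It alternates expansion (squaring the column-stochastic flow matrix) with inflation (raising entries to a power and renormalizing) until the matrix stops changing. The self-loop weight of 1, the inflation value of 2.0, the thresholds and the toy adjacency matrix are assumptions for illustration. Production analyses would use an optimized implementation with sparse matrices.

```python
def _normalize_columns(m):
    """Scale every column so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(adj_matrix, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov Clustering on a symmetric adjacency matrix (list of lists).

    Returns a set of frozensets of node indices.
    """
    n = len(adj_matrix)
    # Add self-loops, then convert to a column-stochastic matrix
    m = [[float(adj_matrix[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    m = _normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: flow spreads along two-step random walks
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: strong flows are boosted, weak flows are suppressed
        inflated = _normalize_columns(
            [[x ** inflation for x in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Rows with a non-zero diagonal are attractors; each defines one cluster
    clusters = set()
    for i in range(n):
        if m[i][i] > 1e-6:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > 1e-6))
    return clusters


# Hypothetical network: two triangles (0,1,2) and (3,4,5) bridged by edge 2-3
bridged = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
for cluster in mcl(bridged):
    print(sorted(cluster))
```

The inflation parameter controls granularity: larger values tend to produce more and smaller clusters.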
PPI Databases
STRING
Integrates known and predicted PPIs from experimental data, text mining, co-expression, genomic context, and homology transfer. Provides a combined confidence score (0-1) for each interaction. Covers >14,000 organisms with >67 million proteins.
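The idea of merging several evidence channels into one confidence score can be illustrated with a noisy-OR combination, which treats the channels as independent: the combined score is one minus the probability that every channel is wrong. This is a simplified sketch, not STRING's exact procedure, which additionally corrects each channel for a prior probability of interaction. The function name and the example scores are hypothetical.

```python
def combine_evidence(scores):
    """Noisy-OR combination of per-channel scores: 1 - prod(1 - s_i)."""
    remaining = 1.0
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
        remaining *= 1.0 - s
    return 1.0 - remaining


# Hypothetical channel scores: experiments, co-expression, text mining
channels = {"experiments": 0.6, "coexpression": 0.3, "textmining": 0.5}
combined = combine_evidence(channels.values())
print(f"combined score = {combined:.3f}")
```

Under this scheme several moderate lines of evidence can add up to a high combined score. For that reason the individual channel scores should be inspected before a high combined score is read as direct experimental support.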
BioGRID
A curated repository of genetic and physical interactions from published literature. Includes >2.5 million interactions across major model organisms. All entries are manually curated with experimental evidence codes and PubMed references.
IntAct / MINT
IMEx consortium members providing deeply curated molecular interaction data following the MIMIx (Minimum Information about a Molecular Interaction eXperiment) guidelines. Each interaction is annotated with the detection method, participant identification method, and interaction type using controlled vocabularies (PSI-MI ontology).
Chapter Summary: Key Concepts
- Y2H detects binary interactions genetically; AP-MS/Co-IP captures complexes biochemically; each has complementary biases in coverage and false-positive rates.
- BioID/TurboID and APEX2 proximity labeling capture the spatial proteome neighborhood of a bait protein in living cells, revealing transient and context-dependent associations.
- XL-MS provides distance constraints; HDX-MS reveals conformational dynamics; native MS determines complex stoichiometry; all three feed into integrative structural models.
- AlphaFold2 and cryo-EM have revolutionized structural proteomics, enabling proteome-scale structure prediction and complex architecture determination.
- PPI networks exhibit scale-free topology; hub and bottleneck proteins are functionally critical; module detection reveals complexes and pathway organization.