Mass Spectrometry-Based Proteomics
Ionization, mass analysis, and computational identification of proteins from peptide fragmentation spectra
10.1 Fundamentals of Mass Spectrometry
A mass spectrometer measures the mass-to-charge ratio ($m/z$) of gas-phase ions. Every mass spectrometer comprises three essential components: an ion source that converts analyte molecules into gas-phase ions, a mass analyzer that separates ions according to their $m/z$ values, and a detector that records the abundance of ions at each $m/z$. The resulting mass spectrum is a plot of ion intensity versus $m/z$, providing a molecular fingerprint of the analyte mixture.
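Fundamental m/z Relationship
$$\frac{m}{z} = \frac{M + z\, m_{H^+}}{z}$$
where $M$ is the neutral monoisotopic (or average) mass of the analyte, $z$ is the number of elementary charges, and $m_{H^+} = 1.00728$ Da is the mass of a proton. For multiply charged ions produced by ESI, a single protein generates an "envelope" of peaks at different charge states, and the true mass can be deconvoluted from adjacent peaks. If two neighboring peaks at $(m/z)_1 > (m/z)_2$ carry charges $z$ and $z+1$, respectively, then:
$$z = \frac{(m/z)_2 - m_{H^+}}{(m/z)_1 - (m/z)_2}, \qquad M = z\left[(m/z)_1 - m_{H^+}\right]$$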
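The charge-state arithmetic can be checked numerically. The following is a minimal sketch, assuming protonated ions $[M+zH]^{z+}$ and two adjacent peaks whose charges differ by exactly one; the function names and the 16,950 Da example protein are illustrative and not taken from any particular software package.

```python
from typing import Tuple

PROTON_MASS = 1.00728  # Da, the value quoted in the text


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """m/z of an [M + zH]^z+ ion."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def deconvolute_adjacent(mz_high: float, mz_low: float) -> Tuple[int, float]:
    """Infer charge and neutral mass from two adjacent charge-state peaks.

    mz_high is assumed to carry charge z and mz_low charge z + 1.
    """
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    neutral_mass = z * (mz_high - PROTON_MASS)
    return z, neutral_mass


# Hypothetical 16,950 Da protein observed at charge states 10+ and 11+
peak_10 = mz_from_mass(16950.0, 10)
peak_11 = mz_from_mass(16950.0, 11)
charge, mass = deconvolute_adjacent(peak_10, peak_11)
print(charge, round(mass, 2))
```

In practice, deconvolution software averages the mass estimate over every peak in the envelope rather than relying on a single pair.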
Key Performance Metrics
| Parameter | Definition | Typical Values |
|---|---|---|
| Resolution (R) | $R = m / \Delta m$ at FWHM | Quad: ~3,000; TOF: ~40,000; Orbitrap: ~500,000 |
| Mass accuracy | Deviation from true mass (ppm) | Quad: ~100 ppm; TOF: ~5 ppm; Orbitrap: <1 ppm |
| Sensitivity | Minimum detectable amount | Attomole to femtomole range for modern instruments |
| Scan speed | Spectra per second | Quad: ~10 Hz; TOF: ~100 Hz; Orbitrap: ~40 Hz |
| Dynamic range | Ratio of most to least abundant detectable ion | $10^4$ to $10^5$ within a single spectrum |
Mass Resolution
$$R = \frac{m}{\Delta m_{50\%}}$$
where $\Delta m_{50\%}$ is the full width at half maximum (FWHM) of the peak at mass $m$. High resolution is essential for distinguishing nearly isobaric peptide ions and resolving isotope envelopes, enabling accurate charge-state determination and monoisotopic mass assignment.
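Resolution and mass accuracy are simple ratios, and a short calculation makes the numbers in the table concrete. This is a sketch under the FWHM definition above; the helper names and the example peak values are hypothetical.

```python
def resolution(mz: float, fwhm: float) -> float:
    """Resolving power R = m / delta-m, with delta-m taken as the FWHM."""
    return mz / fwhm


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass measurement error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def resolution_to_separate(mz_a: float, mz_b: float) -> float:
    """Approximate R needed for the FWHM to equal the spacing of two peaks."""
    return ((mz_a + mz_b) / 2) / abs(mz_a - mz_b)


print(resolution(500.0, 0.01))                  # peak of width 0.01 at m/z 500
print(ppm_error(500.0005, 500.0))               # 0.5 mDa error at m/z 500
print(resolution_to_separate(500.000, 500.010)) # two ions 10 mDa apart
```

The third function is only a rule of thumb: baseline separation of two peaks of unequal intensity generally needs more resolving power than this estimate.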
10.2 Ionization Methods for Biomolecules
The development of soft ionization methods — electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI) — was the breakthrough that enabled mass spectrometric analysis of intact proteins and peptides. Both techniques earned their inventors (John Fenn and Koichi Tanaka) the 2002 Nobel Prize in Chemistry. "Soft" ionization transfers analyte molecules into the gas phase with minimal fragmentation, preserving the molecular ion for accurate mass measurement.
MALDI
Matrix-Assisted Laser Desorption/Ionization. The analyte is co-crystallized with a UV-absorbing matrix (e.g., $\alpha$-cyano-4-hydroxycinnamic acid for peptides, sinapinic acid for proteins, 2,5-dihydroxybenzoic acid for glycopeptides) on a metal target plate. A pulsed UV laser (337 nm N$_2$ or 355 nm Nd:YAG) irradiates the crystal, causing rapid ablation and gas-phase proton transfer. MALDI produces predominantly singly charged ions ($[M+H]^+$), yielding simple spectra.
- Tolerant of salts and buffers
- High throughput (automated plate reading)
- Typically coupled to TOF analyzers
- Ideal for peptide mass fingerprinting
- Used in MALDI imaging (spatial proteomics)
ESI
Electrospray Ionization. A solution of analyte is infused through a capillary held at high voltage (2-5 kV). The electric field at the capillary tip generates a Taylor cone from which a fine spray of charged droplets is emitted. Solvent evaporation (assisted by heated desolvation gas) causes droplets to shrink until Coulombic repulsion overcomes surface tension (Rayleigh limit), producing gas-phase ions. ESI generates multiply charged ions ($[M+nH]^{n+}$), bringing large proteins into the $m/z$ range of most analyzers.
- Continuous flow, directly coupled to LC
- Multiple charge states enable mass deconvolution
- Nano-ESI (20-500 nL/min) for high sensitivity
- Compatible with all mass analyzer types
- Standard for LC-MS/MS proteomics
Rayleigh Limit for Charged Droplets
As charged ESI droplets evaporate, they undergo Coulombic fission when they reach the Rayleigh stability limit:
$$q = 8\pi\sqrt{\varepsilon_0 \gamma r^3}$$
where $q$ is the maximum charge the droplet can carry, $\varepsilon_0$ is the vacuum permittivity, $\gamma$ is the surface tension, and $r$ is the droplet radius. Each fission event sheds charge into smaller offspring droplets, and continued solvent evaporation drives them back toward the limit. The final transfer of ions to the gas phase is described by either the charge residue model (CRM) or the ion evaporation model (IEM).
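To get a feel for the magnitudes involved, the Rayleigh limit $q = 8\pi\sqrt{\varepsilon_0 \gamma r^3}$ can be evaluated for a droplet of known size. The sketch below assumes a pure water droplet with a surface tension of about 0.072 N/m and a radius of 1 µm; both values, and the function names, are illustrative assumptions.

```python
import math

EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19     # elementary charge, C


def rayleigh_limit_coulombs(radius_m: float, surface_tension: float) -> float:
    """Maximum charge (C) a droplet holds before Coulombic fission."""
    return 8 * math.pi * math.sqrt(EPSILON_0 * surface_tension * radius_m ** 3)


def rayleigh_limit_charges(radius_m: float, surface_tension: float) -> float:
    """The same limit expressed as a number of elementary charges."""
    return rayleigh_limit_coulombs(radius_m, surface_tension) / E_CHARGE


# Assumed example: 1 micrometre water droplet, gamma ~ 0.072 N/m
print(f"{rayleigh_limit_charges(1e-6, 0.072):.3g} elementary charges")
```

Because the limit scales as $r^{3/2}$, the charge a droplet can hold falls more slowly than its volume as it shrinks. This is why evaporation at constant charge inevitably reaches the limit.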
10.3 Mass Analyzers
The mass analyzer is the heart of the mass spectrometer. Different analyzer types exploit distinct physical principles to separate ions by $m/z$, and each offers a different combination of resolution, mass accuracy, scan speed, sensitivity, and mass range. Modern proteomic platforms often combine two or more analyzers in a hybrid configuration to leverage complementary strengths.
Quadrupole (Q)
Four parallel rods with oscillating RF and DC voltages create a dynamic electric field. Only ions within a narrow $m/z$ window maintain stable trajectories through the quadrupole at given RF/DC settings; all others collide with the rods and are lost. By scanning the RF and DC amplitudes at a fixed ratio, the quadrupole acts as a tunable mass filter. Quadrupoles have unit mass resolution (~0.7 Da FWHM) but excel as mass filters in tandem instruments (QqQ, Q-TOF, Q-Orbitrap). Ion trajectories are governed by the Mathieu equation, whose dimensionless stability parameters are:
$$a_u = \frac{8 z e U}{m r_0^2 \omega^2}, \qquad q_u = \frac{4 z e V}{m r_0^2 \omega^2}$$
where $U$ is the DC voltage, $V$ is the RF amplitude, $\omega$ is the angular frequency, $r_0$ is the inscribed radius, $m$ is the ion mass, $z$ is the charge number, and $e$ is the elementary charge.
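The Mathieu parameters can be computed directly from instrument settings. The sketch below assumes the common forms $a_u = 8zeU/(m r_0^2 \omega^2)$ and $q_u = 4zeV/(m r_0^2 \omega^2)$ with $V$ as the zero-to-peak RF amplitude, although conventions differ between texts. The rod radius, RF frequency, and voltages are hypothetical values chosen only for illustration.

```python
import math
from typing import Tuple

E_CHARGE = 1.602176634e-19    # elementary charge, C
DALTON = 1.66053906660e-27    # unified atomic mass unit, kg


def mathieu_parameters(mz: float, dc_volts: float, rf_volts: float,
                       r0_m: float, rf_hz: float) -> Tuple[float, float]:
    """Return (a_u, q_u) for an ion of the given m/z (Da per charge)."""
    mass_per_charge = mz * DALTON / E_CHARGE          # kg per coulomb
    omega = 2 * math.pi * rf_hz                       # angular frequency
    denom = mass_per_charge * r0_m ** 2 * omega ** 2
    return 8 * dc_volts / denom, 4 * rf_volts / denom


def stable_rf_only(q_u: float) -> bool:
    """RF-only mode (a_u = 0): first stability region spans 0 < q_u < 0.908."""
    return 0.0 < q_u < 0.908


# Hypothetical quadrupole: r0 = 4 mm, 1 MHz RF, 577.8 V amplitude, no DC
a_u, q_u = mathieu_parameters(500.0, 0.0, 577.8, 4e-3, 1e6)
print(round(a_u, 3), round(q_u, 3), stable_rf_only(q_u))
```

Since both parameters scale as $1/(m/z)$, all ions lie on a single line through the origin of the $(q_u, a_u)$ plane with slope $2U/V$. Raising that slope toward the apex of the stability region narrows the transmitted $m/z$ window.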
Time-of-Flight (TOF)
Ions are accelerated through a known potential $V$ and then drift through a field-free flight tube of length $L$. Since all ions of the same charge receive the same kinetic energy ($zeV$), ions of lower $m/z$ travel faster and arrive at the detector first. The flight time is proportional to the square root of $m/z$:
TOF Equation
$$t = L\sqrt{\frac{m}{2 z e V}}$$
where $t$ is flight time, $L$ is the drift length, $m$ is ion mass, $z$ is charge number, $e$ is elementary charge, and $V$ is the accelerating potential. Reflectron (ion mirror) geometry compensates for initial kinetic energy spread, improving resolution to $R > 40{,}000$.
TOF analyzers have theoretically unlimited mass range, making them ideal for MALDI-TOF analysis of intact proteins. In the Q-TOF hybrid, a quadrupole selects precursor ions for CID fragmentation, and the TOF provides high-resolution, high-accuracy fragment ion measurement.
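A quick calculation shows the time scale of a TOF measurement. This sketch evaluates $t = L\sqrt{m/(2zeV)}$ for an idealized linear instrument with no reflectron and no initial energy spread; the 1 m flight tube and 20 kV acceleration are assumed example values.

```python
import math

E_CHARGE = 1.602176634e-19    # elementary charge, C
DALTON = 1.66053906660e-27    # unified atomic mass unit, kg


def flight_time(mz: float, accel_volts: float, drift_m: float) -> float:
    """Drift time in seconds for an ion of the given m/z (Da per charge)."""
    return drift_m * math.sqrt(mz * DALTON / (2 * E_CHARGE * accel_volts))


def mz_from_flight_time(t_s: float, accel_volts: float, drift_m: float) -> float:
    """Invert the TOF equation to recover m/z from a measured flight time."""
    return 2 * E_CHARGE * accel_volts * (t_s / drift_m) ** 2 / DALTON


# Assumed instrument: 1 m drift tube, 20 kV accelerating potential
for mz in (500.0, 1000.0, 4000.0):
    print(mz, f"{flight_time(mz, 20_000, 1.0) * 1e6:.2f} us")
```

Flight times of tens of microseconds explain why TOF analyzers can acquire many spectra per second, and the square-root dependence means that quadrupling $m/z$ only doubles the flight time.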
Orbitrap
Invented by Alexander Makarov, the Orbitrap traps ions in an electrostatic field around a central spindle electrode. Ions oscillate axially with a frequency that depends only on their $m/z$: $\omega = \sqrt{k \cdot z / m}$, where $k$ is a field constant. The image current induced by oscillating ions is detected, and a Fourier transform converts the time-domain transient into a frequency (hence mass) spectrum. Resolution increases with acquisition time: $R \propto t_{\text{transient}}$. The Orbitrap achieves $R > 500{,}000$ at $m/z$ 200 with sub-ppm mass accuracy, making it the gold standard for discovery proteomics (Thermo Q Exactive, Exploris, Eclipse series).
Ion Trap & FT-ICR
Linear ion trap (LIT): traps ions radially with RF fields and axially with DC end-cap potentials. Capable of MS$^n$ experiments via sequential isolation/fragmentation. Fast scan speed but lower resolution (~10,000). Serves as the fast, sensitive workhorse analyzer in tribrid instruments (Orbitrap Fusion/Eclipse).
Fourier Transform Ion Cyclotron Resonance (FT-ICR): ions orbit in a strong magnetic field (7-21 T) at a cyclotron frequency $\omega_c = zeB/m$. FT detection yields the highest resolution of any mass analyzer ($R > 10^6$) and mass accuracy (<0.1 ppm), but instruments are expensive and have slower duty cycles. Primarily used for top-down proteomics and complex mixture analysis (petroleomics).
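The cyclotron relation $\omega_c = zeB/m$ translates directly into the frequencies an FT-ICR detector must record. The sketch below converts between $m/z$ and cyclotron frequency in hertz for an idealized, unperturbed ion; the 7 T field and the function names are example assumptions.

```python
import math

E_CHARGE = 1.602176634e-19    # elementary charge, C
DALTON = 1.66053906660e-27    # unified atomic mass unit, kg


def cyclotron_frequency_hz(mz: float, field_tesla: float) -> float:
    """Unperturbed cyclotron frequency f = zeB / (2 pi m) in Hz."""
    return E_CHARGE * field_tesla / (2 * math.pi * mz * DALTON)


def mz_from_frequency(freq_hz: float, field_tesla: float) -> float:
    """Invert the relation to recover m/z from a measured frequency."""
    return E_CHARGE * field_tesla / (2 * math.pi * freq_hz * DALTON)


# Assumed example: m/z 1000 in a 7 T magnet
f = cyclotron_frequency_hz(1000.0, 7.0)
print(f"{f / 1e3:.1f} kHz")
```

Frequency scales linearly with $B$, which is one reason higher-field magnets deliver higher resolving power for a fixed transient length.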
10.4 Tandem Mass Spectrometry & Fragmentation
Tandem mass spectrometry (MS/MS or MS$^2$) is the cornerstone of peptide sequencing. A precursor ion is selected in the first stage of mass analysis, subjected to fragmentation, and the resulting product ions are analyzed in the second stage. The pattern of fragment ions encodes the amino acid sequence, enabling identification by database searching or de novo interpretation.
Fragmentation Methods
| Method | Full Name | Bond Cleaved | Ion Series | Best For |
|---|---|---|---|---|
| CID | Collision-Induced Dissociation | Amide (peptide) bond | b and y ions | Standard peptide sequencing |
| HCD | Higher-energy Collisional Dissociation | Amide bond | b and y ions, immonium | Orbitrap-based sequencing, TMT |
| ETD | Electron Transfer Dissociation | N-C$_\alpha$ bond | c and z ions | Labile PTMs, large peptides |
| EThcD | Electron Transfer + HCD | Both bond types | b, y, c, z ions | Complete sequence coverage |
Peptide Backbone Fragmentation Nomenclature
The Roepstorff-Fohlman-Biemann nomenclature defines fragment ions by which backbone bond is cleaved and which fragment retains the charge. Cleavage of the amide bond produces b ions (N-terminal fragment with charge) and y ions (C-terminal fragment with charge). Cleavage of the N-C$_\alpha$ bond (by ETD/ECD) yields c and z ions. For a peptide of $n$ residues, the complete series would comprise $b_1$ through $b_{n-1}$ and $y_1$ through $y_{n-1}$, providing $2(n-1)$ sequence ions. The mass difference between consecutive ions in a series directly indicates the amino acid residue mass.
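The b and y series can be generated from residue masses alone. The sketch below assumes singly charged fragments of an unmodified peptide, with $b_i$ equal to the sum of the first $i$ residue masses plus a proton and $y_i$ equal to the sum of the last $i$ residue masses plus water plus a proton. The monoisotopic masses are standard values rounded to five decimals, and `by_ions` is a hypothetical helper name.

```python
from typing import List, Tuple

MONO = {  # monoisotopic residue masses, Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def by_ions(peptide: str) -> Tuple[List[float], List[float]]:
    """Singly charged b_1..b_(n-1) and y_1..y_(n-1) m/z values."""
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


b_series, y_series = by_ions("PEPTIDE")
for i, (b, y) in enumerate(zip(b_series, y_series), start=1):
    print(f"b{i} = {b:9.4f}   y{i} = {y:9.4f}")
```

A useful consistency check is that each complementary pair $b_i + y_{n-i}$ sums to the neutral peptide mass plus two protons.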
Peptide Mass Fingerprinting (PMF)
In PMF, a protein is digested with a specific protease (typically trypsin, which cleaves C-terminal to Arg and Lys), and the resulting peptide masses are measured by MALDI-TOF. The experimental mass list is compared against theoretical digests of all proteins in a database. A match is scored based on the number and accuracy of matching masses. PMF works best for single-protein spots from 2D gels but fails for complex mixtures where multiple proteins contribute overlapping peptide masses.
10.5 Bottom-Up & Top-Down Proteomics Workflows
Bottom-Up (Shotgun) Proteomics
The dominant strategy in proteomics, bottom-up analysis digests proteins into peptides before MS analysis. The canonical workflow involves: (1) protein extraction and quantification; (2) reduction of disulfide bonds (DTT or TCEP) and alkylation of cysteines (iodoacetamide or chloroacetamide); (3) enzymatic digestion with trypsin (sometimes complemented by Lys-C, Glu-C, or chymotrypsin for improved coverage); (4) peptide cleanup (C18 desalting, StageTips); (5) nano-LC-MS/MS analysis; and (6) computational peptide/protein identification.
Trypsin Specificity
Trypsin cleaves the peptide bond on the C-terminal side of arginine (R) and lysine (K) residues, except when followed by proline (P). This produces peptides with basic C-terminal residues that readily accept protons during ESI, yielding predominantly doubly charged ions in the $m/z$ 400-1200 range — ideal for CID/HCD fragmentation. The average tryptic peptide length is approximately 8-15 residues. Missed cleavages (where trypsin fails to cut at a K or R site) are common and must be allowed during database searching (typically 2 missed cleavages).
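The cleavage rule described above (cut after K or R unless the next residue is P) is straightforward to apply in silico. The following sketch enumerates tryptic peptides with a configurable number of missed cleavages. The function names and the test sequence are made up for illustration, and real search engines add refinements such as length and mass filters.

```python
from typing import List


def tryptic_sites(sequence: str) -> List[int]:
    """Indices at which trypsin cuts: after K/R, but not before P."""
    return [i + 1 for i in range(len(sequence) - 1)
            if sequence[i] in "KR" and sequence[i + 1] != "P"]


def digest(sequence: str, missed_cleavages: int = 0) -> List[str]:
    """All peptides containing at most `missed_cleavages` uncut sites."""
    bounds = [0] + tryptic_sites(sequence) + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


protein = "AAKRPGGKCCRDD"  # hypothetical sequence; note the R-P bond
print(digest(protein))
print(digest(protein, missed_cleavages=1))
```

The K-R bond at position 3 is cut, whereas the R-P bond that follows is left intact, so `RPGGK` appears as a single peptide.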
Top-Down Proteomics
Top-down proteomics analyzes intact proteins without prior digestion, preserving information about proteoforms — the specific combination of sequence variants, splice isoforms, and PTMs present on a single molecule. Intact proteins are separated by RP-LC, SEC, or capillary electrophoresis and introduced into the mass spectrometer via ESI. High-resolution analyzers (Orbitrap, FT-ICR) are essential for resolving the complex isotope envelopes of multiply charged protein ions. Fragmentation by ETD or ECD preserves labile PTMs and provides sequence tags for identification. Although technically demanding and lower throughput than bottom-up approaches, top-down proteomics uniquely reveals the combinatorial PTM landscape.
10.6 Database Searching & Statistical Validation
The raw output of a shotgun proteomics experiment is tens to hundreds of thousands of MS/MS spectra. Converting these spectra into peptide and protein identifications requires computational matching against a sequence database. The accuracy of this process depends on the search algorithm, the scoring function, and the statistical framework for controlling false discoveries.
Search Engines
| Software | Algorithm | Scoring | Key Features |
|---|---|---|---|
| SEQUEST | Cross-correlation | XCorr, $\Delta$Cn | Pioneer algorithm (Eng et al., 1994) |
| Mascot | Probability-based | Ion score ($-10\log_{10} P$) | Error-tolerant searching, PMF |
| MaxQuant/Andromeda | Probabilistic scoring | Andromeda score, PEP | Label-free & SILAC quantification, match between runs |
| MSFragger | Fragment ion indexing | Hyperscore | Ultra-fast open searching |
Mascot Ion Score
$$S = -10\log_{10}(P)$$
where $P$ is the probability that the observed match between the experimental spectrum and the database sequence is a random event. A higher score indicates a more significant match. The identity threshold depends on the database size and is typically defined at $P < 0.05$.
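Converting between probabilities and scores on the $-10\log_{10} P$ scale is a one-line calculation. The sketch below shows only this conversion; it does not reproduce how Mascot derives $P$ or adjusts the identity threshold for database size, and the function names are illustrative.

```python
import math


def ion_score(p_random: float) -> float:
    """Score on the -10 * log10(P) scale."""
    return -10 * math.log10(p_random)


def p_from_score(score: float) -> float:
    """Probability of a random match implied by a given score."""
    return 10 ** (-score / 10)


for p in (0.05, 1e-3, 1e-6):
    print(p, round(ion_score(p), 1))
```

Every 10 score units correspond to a tenfold drop in the probability of a chance match.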
Target-Decoy FDR Control
The target-decoy approach, formalized by Elias and Gygi (2007), provides an empirical estimate of the false discovery rate. A decoy database is constructed by reversing or shuffling all protein sequences in the target database. MS/MS spectra are searched against the concatenated target-decoy database. Any high-scoring match to a decoy sequence is, by definition, a false positive. The FDR at a given score threshold is estimated as:
False Discovery Rate (Target-Decoy)
$$\text{FDR} = \frac{2D}{T + D}$$
where $D$ is the number of decoy hits and $T$ is the number of target hits above the score threshold. The factor of 2 arises because a random match is equally likely to land on a target or a decoy sequence in the concatenated database, so the $D$ observed decoy hits imply roughly $D$ additional false hits hidden among the target matches. A standard threshold of 1% FDR at both peptide-spectrum match (PSM) and protein levels is applied. More sophisticated approaches use posterior error probability (PEP) or Percolator (semi-supervised machine learning) to improve discrimination.
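The target-decoy estimate can be computed from a scored list of PSMs. The sketch below assumes a concatenated search and the estimator $2D/(T+D)$, consistent with the factor of 2 discussed above. The PSM list is synthetic, and `fdr_at_threshold` and `score_cutoff` are hypothetical helper names, not functions from any search engine.

```python
from typing import List, Optional, Tuple

PSM = Tuple[float, bool]  # (score, is_decoy)


def fdr_at_threshold(psms: List[PSM], threshold: float) -> float:
    """Estimated FDR among all PSMs scoring at or above the threshold."""
    targets = sum(1 for s, decoy in psms if s >= threshold and not decoy)
    decoys = sum(1 for s, decoy in psms if s >= threshold and decoy)
    total = targets + decoys
    return 2 * decoys / total if total else 0.0


def score_cutoff(psms: List[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest observed score whose accepted set keeps FDR <= max_fdr."""
    best = None
    for threshold in sorted({s for s, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            best = threshold
    return best


# Synthetic data: 50 high-scoring targets, one decoy, 10 more targets
psms = ([(float(s), False) for s in range(51, 101)]
        + [(50.0, True)]
        + [(float(s), False) for s in range(40, 50)])
print(score_cutoff(psms, max_fdr=0.01))
print(fdr_at_threshold(psms, 40.0))
```

Scanning every threshold and keeping the lowest one that passes handles the fact that the estimated FDR is not strictly monotonic in the score. This is the same idea that underlies q-values.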
De Novo Sequencing
When the organism lacks a comprehensive protein database (e.g., non-model organisms, metaproteomics), de novo sequencing derives the amino acid sequence directly from the fragmentation spectrum without database reference. Algorithms such as PEAKS, Novor, and pNovo identify sequence tags by computing mass differences between consecutive fragment ions. Deep learning approaches (e.g., Casanovo, PointNovo) have substantially improved de novo accuracy. Sequence tags can then be used for homology-based searching with BLAST or error-tolerant database searching (sequence tag approach).
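The core step of tag-based de novo sequencing, reading residues from mass differences between consecutive fragment ions, can be sketched in a few lines. This toy example assumes a clean, singly charged ion ladder from a single series and a fixed tolerance. It is not how PEAKS, Novor, or the deep learning tools work internally, and `read_ladder` is a hypothetical name.

```python
from typing import List

RESIDUES = {  # monoisotopic residue masses, Da; L and I are isobaric
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "[IL]": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks: List[float], tol: float = 0.01) -> str:
    """Translate gaps between consecutive peaks into residues ('?' if none)."""
    tag = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        hits = [aa for aa, mass in RESIDUES.items() if abs(mass - delta) <= tol]
        tag.append(hits[0] if hits else "?")
    return "".join(tag)


# Assumed b-ion ladder (b1..b4) of a hypothetical peptide starting G-A-S-V
ladder = [58.02874, 129.06585, 216.09788, 315.16629]
print(read_ladder(ladder))
```

With a 0.01 Da tolerance, K (128.09496) and Q (128.05858) remain distinguishable, whereas leucine and isoleucine cannot be separated by mass alone and are reported together.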
Chapter Summary: Key Concepts
- Mass spectrometry measures $m/z$ of gas-phase ions; resolution and mass accuracy define the analytical power of the instrument.
- ESI and MALDI are complementary soft ionization methods; ESI-LC-MS/MS dominates shotgun proteomics while MALDI-TOF excels for PMF and imaging.
- Hybrid instruments (Q-TOF, Q-Orbitrap, tribrid) combine the best features of multiple analyzer types for comprehensive proteomic analysis.
- CID/HCD produce b/y ion series for peptide sequencing; ETD produces c/z ions and preserves labile PTMs.
- Target-decoy FDR control at 1% is the standard for ensuring reliable peptide and protein identifications in large-scale experiments.