Module 6: The Two-Voice Syrinx — Vocal Signatures in a Colony of Ten Thousand

Two mornings after the Antarctic winter solstice, a male emperor penguin lifts his head, opens his beak a centimetre, and delivers a 3-second display call to the eggshell cradled on his feet. Inside that call, two independent fundamental frequencies \(f_L\) and \(f_R\) emerge simultaneously from the left and right tympaniform membranes of a bilateral syrinx (Robisson 1992; Aubin & Jouventin 1998). Their interference beat encodes an individual signature that his partner, returning from the ice edge after a 70-day foraging trip, can pick out among the calls of ~9999 neighbours. This module derives the two-voice aerodynamic model, the signal-detection theory of “cocktail-party” recognition, and the comparative acoustics that distinguish emperor and king penguins (two-voice) from Adélie and gentoo (single-voice).

1. Bilateral Syrinx: The Sphenisciform Sound Source

The avian syrinx sits at the tracheobronchial junction, where the single trachea bifurcates into the two primary bronchi. In emperor and king penguins (subfamily Aptenodytinae) the structure is bilateral: each bronchus carries its own pair of tympaniform membranes (medial and lateral, MTM and LTM), which can vibrate independently, each driven by airflow from its own lung.

Songbirds (Oscines) share this bilateral architecture but use it differently. In songbirds, the two sides often alternate rapidly, switching the dominant voice on a sub-syllable timescale (Suthers 1990; Fee, Shraiman, Pesaran & Mitra 1998). In emperors, by contrast, both sides vibrate simultaneously for much of the call, producing two concurrent tones that interfere to generate the beat envelope.

\[ P(t) \;=\; A_L(t)\,\sin(2\pi f_L t + \phi_L) \;+\; A_R(t)\,\sin(2\pi f_R t + \phi_R) \]

Sound pressure is the linear superposition of the left and right membrane vibrations; \(A_{L,R}(t)\): slow amplitude envelopes; \(f_{L,R}\): fundamentals; \(\phi_{L,R}\): phases.

Myoelastic-aerodynamic forcing

Each tympaniform membrane operates in the myoelastic-aerodynamic (MEAD) regime formalised by Fee (2002) and Elemans, Dürrwang & Goller (2008) for the songbird syrinx. The membrane behaves as a tensioned elastic sheet with surface mass density \(\sigma\) and tension \(T\) (per unit width). Airflow driven by subsyringeal pressure produces Bernoulli suction across the membrane, which excites self-sustained oscillations. The fundamental frequency of a membrane of length \(L\) under tension \(T\) is

\[ f_0 \;=\; \frac{1}{2L}\sqrt{\frac{T}{\sigma}} \]

Each side’s tension is controlled by intrinsic tracheobronchial muscles (M. tracheolateralis, M. sternotrachealis), giving independent control of \(f_L\) and \(f_R\).

Measurements on penguin tympaniform membranes (Aubin & Jouventin 2002 and later CT reconstructions by Dürrwang, unpublished) give \(L \approx 3\) mm and \(\sigma \approx 0.25\) kg/m\(^2\). For tensions of roughly 36–225 N/m, these values place \(f_0\) in the 2–5 kHz band, consistent with field-recorded emperor fundamentals.
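Inverting the membrane formula shows what tension the quoted \(L\) and \(\sigma\) imply. The text gives no measured \(T\), so the tensions printed below are derived from the formula, not reported measurements; \(T\) is treated as tension per unit width (N/m). A minimal sketch:

```python
import math

# Values quoted in the text.
L_MEMBRANE = 3e-3     # membrane length L, m
SIGMA = 0.25          # surface mass density sigma, kg/m^2

def f0(T, L=L_MEMBRANE, sigma=SIGMA):
    """Membrane fundamental f0 = sqrt(T / sigma) / (2 L), with T in N/m."""
    return math.sqrt(T / sigma) / (2 * L)

def tension_for(f, L=L_MEMBRANE, sigma=SIGMA):
    """Invert the formula: T = sigma * (2 L f)^2, tension per unit width in N/m."""
    return sigma * (2 * L * f) ** 2

for f in (2000.0, 5000.0):           # edges of the 2-5 kHz band
    T = tension_for(f)
    print(f"f0 = {f:.0f} Hz requires T = {T:.0f} N/m (round trip: {f0(T):.0f} Hz)")
```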

2. The Robisson & Aubin Discoveries (1992–1998)

Pierre Robisson, working at CEBC Chizé on tapes collected at Terre Adélie, noticed in 1992 that emperor penguin call spectrograms show two parallel spectral bands separated by 20–120 Hz — impossible to explain with a single vibrating source. Thierry Aubin and Pierre Jouventin (1998, Proceedings of the Royal Society B) demonstrated with field playback experiments on 86 emperor pairs that:

  • Removing the two-voice beat (by low-pass filtering one tone) reduced parental recognition to chance.
  • The between-individual variability in the beat frequency \(\Delta f = f_R - f_L\) exceeded the within-individual variability by a factor of ~8:1.
  • \(\Delta f\) is stable across a given bird’s call repertoire (display, contact, agonistic) over time windows of months.

Hence the individual identity-coding channel resides in the slow beat envelope of the two-voice superposition, rather than in either fundamental alone. Mathematically, writing \(P(t) = A\sin(2\pi f_L t) + A\sin(2\pi f_R t)\) (equal amplitudes, zero phases) and using the sum-to-product identity,

\[ P(t) \;=\; 2A\,\cos\!\left(2\pi \tfrac{\Delta f}{2} t\right)\,\sin\!\left(2\pi \tfrac{f_L + f_R}{2} t\right) \]

Product of a carrier at \(\tfrac{f_L + f_R}{2}\) (~2.4 kHz) and a slow cosine factor at \(\tfrac{\Delta f}{2}\). Because the envelope is the magnitude \(|2A\cos(\pi \Delta f\, t)|\), it peaks twice per cosine cycle, so the audible beat rate is \(\Delta f\) itself (20–120 Hz).

The beat rate \(\Delta f\) is thus a low-frequency, noise-robust channel, much less affected than the carrier tones by the high-frequency scattering and absorption that a colony chorus imposes. The biophysical elegance is that the same utterance simultaneously delivers (i) a loud, attention-grabbing carrier at 2–5 kHz and (ii) a private, low-frequency signature channel in the envelope.
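A quick numerical check of the sum-to-product identity and of the beat rate, as a minimal NumPy sketch. The 48 kHz rate and 0.5 s duration are illustrative choices; the window holds a whole number of cycles of both tones, so the FFT-based Hilbert transform used here is exact up to round-off.

```python
import numpy as np

fs = 48_000                                   # sample rate, Hz (illustrative)
t = np.arange(fs // 2) / fs                   # 0.5 s: whole cycles of both tones
A, f_L, f_R = 1.0, 2400.0, 2470.0
delta_f = f_R - f_L                           # 70 Hz

p_sum = A * np.sin(2 * np.pi * f_L * t) + A * np.sin(2 * np.pi * f_R * t)
p_prod = 2 * A * np.cos(2 * np.pi * (delta_f / 2) * t) * np.sin(2 * np.pi * (f_L + f_R) / 2 * t)
print("max |sum - product|:", np.max(np.abs(p_sum - p_prod)))   # round-off only

def envelope(x):
    """Magnitude of the analytic signal, via an FFT-based Hilbert transform."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0                   # double the positive frequencies
    if n % 2 == 0:
        h[n // 2] = 1.0                       # keep the Nyquist bin once
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))

env = envelope(p_sum)                         # equals |2A cos(pi * delta_f * t)|
spec = np.abs(np.fft.rfft(env - env.mean()))
rate = np.argmax(spec) * fs / len(env)        # frequency of the strongest envelope line
print("envelope repeats at", rate, "Hz; cosine factor at", delta_f / 2, "Hz")
```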

3. The Cocktail-Party Problem in a 10 000-Bird Colony

Emperor colonies aggregate 5 000–25 000 adults; the Pointe Géologie colony at Terre Adélie held ~10 000 pairs during the 1990s when most classic call studies were performed. A returning parent must find its chick in that acoustic sea. Aubin (2000, Bioacoustics) measured the parameters of the problem:

  • Simultaneous callers within 10 m of a focal chick: 50–200.
  • Sound pressure level of a focal call at 1 m: 92–95 dB SPL.
  • Ambient chorus level at chick’s ear: 80–86 dB SPL.
  • Typical signal-to-chorus ratio (SCR): \(+6\) to \(-6\) dB.
  • Parent-chick distance at call initiation: 5–20 m.
  • Call repetition (N syllables per encounter): 5–15.

The task is thus comparable to a human trying to locate a familiar voice in a crowded restaurant. The Aubin playback experiments quantified the recognition performance: chicks at 80–90 days responded correctly to parent playback in ~96% of trials, even at SCR as low as \(-6\) dB (Aubin 2000; Aubin, Jouventin & Hildebrand 2000).

Two-voice redundancy

Robustness is achieved through triple redundancy:

  • Frequency redundancy: the signature is carried by the beat rate \(\Delta f\) (20–120 Hz), a slow modulation far below the 2–5 kHz carrier band where most chorus energy lies.
  • Temporal redundancy: 5–15 syllable repetitions per encounter, boosting detectability as \(\sqrt{N}\).
  • Spectral redundancy: several harmonics of both \(f_L\) and \(f_R\) carry congruent phase information.

\[ d' \;=\; \alpha\,\sqrt{B\,T\,\text{SCR}_{\text{lin}}} \;\cdot\; \sqrt{N} \]

Signal-detection sensitivity \(d'\) for matched-filter detection, with bandwidth \(B \sim \Delta f\), call duration \(T\), SCR in linear units, and \(N\) syllables; \(\alpha \sim 0.6\) is an empirical prefactor.
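The formula can be evaluated directly. In this sketch the inputs (\(B = 70\) Hz, \(T = 3\) s, SCR = \(-6\) dB) are hypothetical placeholders, not values fitted by Aubin and colleagues; the printed ratios isolate the \(\sqrt{N}\) temporal-redundancy gain, which does not depend on them.

```python
import math

def scr_linear(scr_db):
    """Convert a signal-to-chorus ratio from dB to a linear power ratio."""
    return 10 ** (scr_db / 10)

def d_prime(B, T, scr_db, N, alpha=0.6):
    """d' = alpha * sqrt(B * T * SCR_lin) * sqrt(N), with B ~ delta_f in Hz and T in s."""
    return alpha * math.sqrt(B * T * scr_linear(scr_db)) * math.sqrt(N)

# Hypothetical inputs; only the ratios below are independent of them.
B, T, scr_db = 70.0, 3.0, -6.0
base = d_prime(B, T, scr_db, N=1)
for N in (1, 4, 9, 16):
    print(N, round(d_prime(B, T, scr_db, N) / base, 6))   # ratios 1.0, 2.0, 3.0, 4.0
print(round(scr_linear(-6.0), 3))                          # -6 dB is about 0.251 in power
```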

4. Schematic: Two-Voice Generation

[Figure: Bilateral syrinx, two-voice superposition (after Aubin & Jouventin 1998). Trachea with left and right bronchi; tympaniform membranes TM L and TM R produce \(f_L = 2400\) Hz and \(f_R = 2470\) Hz; the tones superpose in the trachea, and the beat envelope at \(\Delta f = 70\) Hz carries the signature. Sources: Robisson 1992; Aubin & Jouventin 1998, 2002.]

Simulation 1: Two-Voice Synthesis & Cocktail-Party Template Matching

Synthesise an emperor two-voice call with \(f_L = 2400\) Hz and \(f_R = 2470\) Hz, giving \(\Delta f = 70\) Hz. Extract the beat envelope as the magnitude of the Hilbert analytic signal. Superpose 120 incoherent chorus calls with randomised onsets, durations, and distances (5–30 m). Track the correlation of a template with the mix using a 0.8 s sliding window, and recover the parent’s signature.
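The original script is not reproduced here. As a reduced, hypothetical sketch of the same pipeline, the block below builds a chorus and tracks a sliding 0.8 s template correlation with NumPy. Assumptions not in the text: 12 kHz sampling, chorus voices with \(f_L\) drawn from 2000–4500 Hz and \(\Delta f\) from 20–120 Hz, amplitude proportional to 1/distance, rectangular call envelopes, and a parent 5 m away calling at 1.5 s. It also correlates raw waveforms (a matched filter) instead of Hilbert envelopes, which keeps it short.

```python
import numpy as np

rng = np.random.default_rng(7)           # fixed seed for a reproducible run
fs = 12_000                              # sample rate in Hz (assumption; covers 2-5 kHz)
n_total = 4 * fs                         # 4 s of colony sound

def two_voice(f_L, delta_f, n, phase_L=0.0, phase_R=0.0):
    """Unit-amplitude two-voice call: tones at f_L and f_R = f_L + delta_f."""
    t = np.arange(n) / fs
    return (np.sin(2 * np.pi * f_L * t + phase_L)
            + np.sin(2 * np.pi * (f_L + delta_f) * t + phase_R))

# Chorus: 120 incoherent callers with random voices, onsets, durations, distances.
chorus = np.zeros(n_total)
for _ in range(120):
    n = int(rng.uniform(0.5, 2.0) * fs)                 # call duration 0.5-2 s
    start = int(rng.integers(0, n_total - n))           # random onset
    dist = rng.uniform(5.0, 30.0)                       # metres; amplitude ~ 1/dist
    ph_L, ph_R = rng.uniform(0.0, 2 * np.pi, size=2)
    call = two_voice(rng.uniform(2000, 4500), rng.uniform(20, 120), n, ph_L, ph_R)
    chorus[start:start + n] += call / dist

# Parent: f_L = 2400 Hz, delta_f = 70 Hz, a 1 s call starting at 1.5 s, 5 m away.
onset, n_call = int(1.5 * fs), fs
parent = two_voice(2400.0, 70.0, n_call)
mix = chorus.copy()
mix[onset:onset + n_call] += parent / 5.0

# Template: the first 0.8 s of the parent's call; windows slide in 10 ms hops.
n_win, hop = int(0.8 * fs), int(0.01 * fs)
template = parent[:n_win]

def sliding_corr(x):
    """Normalised correlation between the template and each 0.8 s window of x."""
    starts = np.arange(0, len(x) - n_win + 1, hop)
    rho = np.array([
        template @ x[s:s + n_win]
        / (np.linalg.norm(template) * np.linalg.norm(x[s:s + n_win]) + 1e-12)
        for s in starts
    ])
    return starts / fs, rho

times, rho_mix = sliding_corr(mix)
_, rho_chorus = sliding_corr(chorus)
best = times[np.argmax(rho_mix)]
print(f"best-matching window starts at {best:.2f} s (parent onset 1.50 s)")
print(f"peak correlation: with parent {rho_mix.max():.2f}, chorus only {rho_chorus.max():.2f}")
```

Because 2400 Hz and 2470 Hz share a 10 Hz common divisor, the two-tone call repeats exactly every 0.1 s, so windows starting a multiple of 0.1 s into the call match almost as well as the aligned one; this sketch detects the parent reliably but localises the call only to within its own duration.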

5. Call Repertoire: Display vs. Contact vs. Agonistic

Emperors produce at least four distinct call categories (Jouventin 1982; Thumser & Karrer 2010). All four preserve the individual beat signature \(\Delta f\) but differ in envelope shape, harmonic content, and duration:

| Call type | Context | Duration (s) | Syllables | Amplitude (dB @ 1 m) |
|---|---|---|---|---|
| Display (ecstatic) | Courtship, pair bonding | 3.0–5.0 | 4–8 | 92–95 |
| Contact (mutual) | Parent-chick, pair reunion | 2.5–3.5 | 3–6 | 85–90 |
| Agonistic (threat) | Territory, huddle disputes | 0.8–1.5 | 1–2 | 95–100 |
| Chick begging | Food solicitation | 0.3–0.8 | 1–3 | 80–85 |

The emperor display call is the longest in the repertoire; only the short agonistic call is louder. The contact call, used in the parent-chick reunion, shares the same \(\Delta f\) signature but with a reduced-amplitude envelope and shorter syllables, optimised for close-range identification rather than long-range advertisement.

Maturational trajectory

Chicks begin producing two-voice calls at 40–50 days (Lengagne, Aubin, Lauga & Jouventin 1999). \(\Delta f\) stabilises by 90 days and remains approximately constant through the first annual moult, whereas the carrier frequency \(\tfrac{f_L + f_R}{2}\) drops by ~15% from 100 days to adulthood as the syringeal cartilages and membranes scale with body size.

6. Signal Detection Theory of Chick Recognition

In the signal-detection framework of Green & Swets (1966), the chick faces a binary decision each second: is this call from a parent (H1) or a non-parent (H0)? The optimal statistic is the matched-filter projection of the incoming waveform onto the internal template. Under Gaussian-noise assumptions,

\[ x \mid H_0 \sim \mathcal{N}(0, \sigma^2),\quad x \mid H_1 \sim \mathcal{N}(d'\sigma, \sigma^2),\quad d' = \frac{\mu_1 - \mu_0}{\sigma} \]

Sensitivity \(d'\) = z-distance between the means of H0 and H1 distributions in units of the noise standard deviation.

For a matched filter integrating a signal of bandwidth \(B\) and duration \(T\) in white Gaussian noise whose spectral density \(N_0\) is set by the chorus, \(d' = \sqrt{2E/N_0} = \sqrt{BT \cdot \text{SNR}}\) up to dimensionless prefactors, where \(E\) is the signal energy. For the emperor two-voice system the effective bandwidth is \(\Delta f\), so in this model a wide-\(\Delta f\) individual is intrinsically easier to detect than a narrow-\(\Delta f\) one.
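A Monte Carlo check of \(d' = \sqrt{2E/N_0}\) in its discrete form: with white noise of variance \(\sigma^2\) per sample, \(2E/N_0\) reduces to \(\|s\|^2/\sigma^2\), so \(d' = \|s\|/\sigma\). The weak two-voice template, noise level, and trial count are illustrative assumptions; the sketch checks only the matched-filter formula, not the claim that the effective bandwidth is \(\Delta f\).

```python
import numpy as np

rng = np.random.default_rng(0)
fs, dur = 12_000, 0.1                          # illustrative sample rate and duration
t = np.arange(int(fs * dur)) / fs
# Weak two-voice template (amplitude 0.05 per tone, f_L = 2400 Hz, f_R = 2470 Hz).
s = 0.05 * (np.sin(2 * np.pi * 2400 * t) + np.sin(2 * np.pi * 2470 * t))
sigma = 1.0                                    # white-noise standard deviation per sample

# Matched filter: project each received waveform onto the known template s.
trials = 3000
noise = rng.normal(0.0, sigma, size=(trials, len(t)))
x0 = noise @ s                                 # H0: noise only
x1 = (noise + s) @ s                           # H1: template + noise (same noise draws)

d_emp = (x1.mean() - x0.mean()) / x0.std()
d_theory = np.linalg.norm(s) / sigma           # discrete form of sqrt(2E/N0)
print(f"empirical d' = {d_emp:.2f}, predicted ||s||/sigma = {d_theory:.2f}")
```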

Receiver-operating characteristic

The receiver-operating characteristic (ROC) plots hit rate versus false-alarm rate (FAR) as the decision threshold is varied. For equal-variance Gaussian signals,

\[ \text{HR}(\text{FAR}) = \Phi\!\left(\Phi^{-1}(\text{FAR}) + d'\right) \]

\(\Phi\): standard-normal cumulative distribution function.

In two-alternative forced-choice (2AFC) trials, which mirror the parent-playback experiments of Aubin and colleagues, the proportion correct is \(P_c = \Phi(d'/\sqrt{2})\). The Aubin (2000) data give \(P_c \approx 0.96\) at SCR = \(-6\) dB, implying \(d' = \sqrt{2}\,\Phi^{-1}(0.96) \approx 2.5\), broadly consistent with the theoretical prediction for the two-voice matched filter over 5–10 call repetitions.
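The ROC and 2AFC relations can be evaluated with the standard library's `statistics.NormalDist`. This sketch recovers the \(d'\) implied by \(P_c = 0.96\) and traces a few points on the corresponding equal-variance ROC curve.

```python
from statistics import NormalDist

_std_normal = NormalDist()                   # mean 0, sd 1
Phi, Phi_inv = _std_normal.cdf, _std_normal.inv_cdf

def hit_rate(far, d):
    """Equal-variance Gaussian ROC: HR = Phi(Phi^-1(FAR) + d')."""
    return Phi(Phi_inv(far) + d)

def pc_2afc(d):
    """Proportion correct in two-alternative forced choice: Phi(d' / sqrt(2))."""
    return Phi(d / 2 ** 0.5)

def d_from_pc(pc):
    """Invert the 2AFC relation: d' = sqrt(2) * Phi^-1(P_c)."""
    return 2 ** 0.5 * Phi_inv(pc)

d = d_from_pc(0.96)
print(f"P_c = 0.96  ->  d' = {d:.2f}")      # d' = 2.48
for far in (0.01, 0.05, 0.2, 0.5):           # a few points on the ROC curve
    print(f"FAR = {far:.2f}  HR = {hit_rate(far, d):.3f}")
```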

7. Comparative Acoustics Across Sphenisciformes

The two-voice phenomenon is not universal in penguins; it correlates sharply with the ecology of chick recognition. Species that breed at fixed, defensible nest sites (gentoo, Adélie, chinstrap) produce single-voice calls because recognition is solved by the spatial constancy of the nest. Species that breed on sea ice or in dense colonies without fixed territories (emperor, king) produce two-voice calls because recognition must rely on the acoustic signal alone.

| Species | Nest site | Voice | \(\Delta f\) (Hz) | Recognition |
|---|---|---|---|---|
| Emperor (A. forsteri) | Sea ice, no nest | Two-voice | 20–120 | Acoustic only |
| King (A. patagonicus) | Beach, no territory | Two-voice | 15–100 | Acoustic only |
| Adélie (P. adeliae) | Stone nest | Single-voice | n/a | Spatial + acoustic |
| Gentoo (P. papua) | Stone nest | Single-voice | n/a | Spatial + acoustic |
| Chinstrap (P. antarcticus) | Stone nest | Single-voice | n/a | Spatial + acoustic |
| Little (Eudyptula minor) | Burrow | Single-voice | n/a | Spatial primary |

The two-voice trait maps cleanly onto the nest-less genus Aptenodytes, a striking match between acoustic system and breeding ecology. Jouventin & Aubin (2002) argued that the trait is ancient in the Aptenodytinae lineage (~40 Myr divergence from Pygoscelis).

8. Auditory Learning: Parent, Chick, Mate

Chick-parent recognition is acquired by auditory imprinting in the first ~20 days after hatching, while the chick is still brooded in the parents’ brood patches. Jouventin, Aubin & Lengagne (1999) showed that:

  • Chicks incubator-isolated from parental calls for > 30 days fail subsequent recognition tests.
  • Chicks exposed to artificial two-voice calls with manipulated \(\Delta f\) imprint on the artificial signature.
  • Template learning continues after this initial period: chicks re-calibrate every few days as the parent’s call characteristics drift slightly.

Mate recognition follows a similar template. After the egg is transferred from female to male in May, the pair separates for ~70 days (female feeding at the ice edge, male incubating at the colony). On her return the female locates her mate via his display call, identified by \(\Delta f\) and additional cues: call amplitude envelope, syllable rhythm, and subtle spectral colour of the harmonic stack. Playback tests showed that shifting \(f_L\) alone (with \(f_R\) and the envelope preserved) was enough to collapse female mate recognition to chance (Aubin & Jouventin 2002).

Simulation 2: Signal Detection Theory Under Chorus Masking

Sweep SCR from \(-12\) dB to \(+18\) dB for two \(\Delta f\) regimes (narrow = 30 Hz, wide = 110 Hz). Compute \(d'\), ROC curves, and hit probability at fixed FAR = 0.05. Demonstrate the temporal-redundancy boost \(d' \propto \sqrt{N}\) across 1–10 syllable repetitions. Compare Monte Carlo 2AFC trials with the theoretical \(P_c = \Phi(d'/\sqrt{2})\).
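The original script is not reproduced here. The sketch below covers only the Monte Carlo part: simulated 2AFC trials under the equal-variance Gaussian model, compared with \(P_c = \Phi(d'/\sqrt{2})\), with \(d'\) scaled as \(\sqrt{N}\). The single-syllable value \(d'_1 = 0.8\) is a hypothetical choice, and the SCR sweep and ROC panels are left out.

```python
import math
import random
from statistics import NormalDist

Phi = NormalDist().cdf
rng = random.Random(1)                       # seeded for reproducibility

def mc_2afc(d, trials=20_000):
    """Simulated 2AFC: pick the interval with the larger matched-filter output."""
    correct = 0
    for _ in range(trials):
        x_parent = rng.gauss(d, 1.0)         # H1 interval: mean d', unit variance
        x_stranger = rng.gauss(0.0, 1.0)     # H0 interval
        correct += x_parent > x_stranger
    return correct / trials

d1 = 0.8                                     # hypothetical single-syllable d'
for N in (1, 2, 4, 8, 10):
    d = d1 * math.sqrt(N)                    # temporal redundancy: d' grows as sqrt(N)
    theory = Phi(d / math.sqrt(2))
    print(f"N={N:2d}  d'={d:.2f}  Monte Carlo P_c={mc_2afc(d):.3f}  theory={theory:.3f}")
```

The theoretical curve follows because the difference of the two interval outputs is Gaussian with mean \(d'\) and variance 2, so it is positive with probability \(\Phi(d'/\sqrt{2})\).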

9. Noise Robustness: Why the Beat Wins

Aubin & Jouventin (2002) conducted elegant playback-degradation experiments on chicks and adults. They artificially masked the carrier tones, the beat envelope, or the harmonic stack one at a time, and measured the recognition probability:

  • Full call + 0 dB chorus: 0.96 recognition.
  • Full call + \(-6\) dB SCR: 0.88.
  • Carrier degraded, beat preserved: 0.84.
  • Beat degraded, carrier preserved: 0.21 (near chance).
  • Only 1 of the 2 voices retained: 0.18 (chance).

The rank ordering is clear: the beat is the critical cue. The carrier provides amplitude for long-range propagation and the harmonic stack adds spectral detail, but in these tests the individual signature resides almost entirely in the beat envelope between the two voices.

Why a beat is robust

A beat at \(\Delta f\) (20–120 Hz) is a slow amplitude modulation of the 2–5 kHz carrier, and its rate is set only by the frequency difference between the two voices. Any linear, time-invariant channel (spreading loss, absorption over tens of metres, reflections from conspecifics or ice features) multiplies each tone by its own complex gain \(H(f_L)\) or \(H(f_R)\) but cannot shift either frequency, so the envelope still repeats \(\Delta f\) times per second. The channel can still change the modulation depth: for an initially fully modulated beat, the depth becomes the ratio of the smaller to the larger of \(|H(f_L)|\) and \(|H(f_R)|\). Only nonlinear (e.g. saturating) distortion, which creates new frequency components, can destroy the beat.
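The invariance of the beat rate under a linear channel can be checked numerically. This NumPy sketch passes the 2400/2470 Hz pair through a hypothetical channel (a smooth high-frequency roll-off plus one reflection arriving 3.1 ms late at 60% amplitude; both are illustrative choices, not measured colony acoustics) and compares envelope rate and modulation depth before and after.

```python
import numpy as np

fs = 48_000
t = np.arange(fs // 2) / fs                        # 0.5 s: whole cycles of both tones
x = np.sin(2 * np.pi * 2400 * t) + np.sin(2 * np.pi * 2470 * t)   # delta_f = 70 Hz

def envelope(sig):
    """Magnitude of the analytic signal, via an FFT-based Hilbert transform."""
    n = len(sig)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.abs(np.fft.ifft(np.fft.fft(sig) * h))

def beat_rate(sig):
    """Frequency (Hz) of the strongest spectral line in the envelope."""
    env = envelope(sig)
    spec = np.abs(np.fft.rfft(env - env.mean()))
    return np.argmax(spec) * fs / len(env)

def depth(sig):
    """Modulation depth (max - min) / (max + min) of the envelope."""
    env = envelope(sig)
    return (env.max() - env.min()) / (env.max() + env.min())

# Hypothetical linear channel: Gaussian roll-off times (direct path + one reflection).
f = np.fft.rfftfreq(len(x), 1 / fs)
H = np.exp(-(f / 3000.0) ** 2) * (1 + 0.6 * np.exp(-2j * np.pi * f * 0.0031))
y = np.fft.irfft(np.fft.rfft(x) * H, n=len(x))

print(beat_rate(x), beat_rate(y))                  # 70.0 70.0: the rate survives
print(round(depth(x), 2), round(depth(y), 2))      # depth falls to about |H(f_L)|/|H(f_R)|
```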

10. Neural Processing: Extracting the Beat

How does a penguin’s auditory system extract a 20–120 Hz beat from a 2–5 kHz carrier? The mammalian analogy is amplitude-modulation (AM) encoding in the inferior colliculus and auditory cortex, where neurons are tuned to specific modulation frequencies (Joris, Schreiner & Rees 2004). In birds the homologous structures are the nucleus mesencephalicus lateralis, pars dorsalis (MLd) and the field L cortical region, known from zebra-finch electrophysiology to contain AM-tuned units (Woolley, Gill & Theunissen 2006).

A simple model: a rectify-and-low-pass front end (hair cell → cochlear nerve) recovers the envelope of the input waveform; a bank of bandpass filters tuned to discrete AM frequencies partitions the envelope into channels; a template-matching layer cross-correlates the channel outputs against a stored parent template. The output is a Bayesian posterior over “parent / non-parent” identity.
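The rectify, low-pass, AM-filter-bank, and template stages can be sketched in NumPy as below. Everything specific here is an assumption for illustration: the 200 Hz envelope cutoff, the 5 Hz-wide modulation channels from 20 to 120 Hz, the stranger’s voice (2600 Hz, \(\Delta f = 95\) Hz), and cosine similarity between modulation-energy profiles standing in for the cross-correlation layer. It stops short of the Bayesian posterior.

```python
import numpy as np

fs = 16_000
t = np.arange(fs) / fs                                    # 1 s
CENTRES = np.arange(20, 125, 5)                           # AM channel centres, Hz

def call(f_L, delta_f):
    """Two-voice test call with unit-amplitude tones."""
    return np.sin(2 * np.pi * f_L * t) + np.sin(2 * np.pi * (f_L + delta_f) * t)

def front_end(x, cutoff_hz=200.0):
    """Half-wave rectify (crude hair cell), then low-pass by zeroing FFT bins above cutoff_hz."""
    r = np.maximum(x, 0.0)
    R = np.fft.rfft(r)
    R[np.fft.rfftfreq(len(r), 1 / fs) > cutoff_hz] = 0.0
    return np.fft.irfft(R, n=len(r))

def am_bank(env):
    """Envelope energy in 5 Hz-wide modulation channels centred on CENTRES."""
    E = np.abs(np.fft.rfft(env - env.mean())) ** 2
    f = np.fft.rfftfreq(len(env), 1 / fs)
    return np.array([E[(f >= c - 2.5) & (f < c + 2.5)].sum() for c in CENTRES])

def similarity(a, b):
    """Cosine similarity between two modulation-energy profiles."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

parent_template = am_bank(front_end(call(2400.0, 70.0)))   # stored signature
stranger = am_bank(front_end(call(2600.0, 95.0)))
print("parent's strongest AM channel:", CENTRES[np.argmax(parent_template)], "Hz")
print("stranger's strongest AM channel:", CENTRES[np.argmax(stranger)], "Hz")
print("template vs stranger similarity:", round(similarity(parent_template, stranger), 3))
```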

\[ p(\text{parent} \mid y) \;=\; \frac{p(y \mid \text{parent})\,p(\text{parent})}{p(y)} \]

Bayesian posterior: combines the likelihood of the observed call waveform \(y\) under the parent-template model with the prior probability that a call is from the parent (of order 1/10 000 in a chorus).

The low prior on any given call being the parent’s (order \(10^{-4}\) in a 10 000-bird colony) means that the decision criterion must be strict: a large log-likelihood ratio \(\log p(y\mid\text{parent})/p(y\mid\text{non-parent})\) is required to overcome the prior. For independent syllables the log-likelihood ratios add, so the accumulated evidence grows linearly in \(N\); this is consistent with parent-chick reunions involving many-syllable calls rather than isolated cries.
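To make the strict-criterion argument concrete, assume equal-variance Gaussian evidence with a per-syllable sensitivity \(d'_1\) and independent syllables. Each syllable then contributes an expected log-likelihood ratio of \(d_1'^2/2\) when the caller is the parent, and the posterior log-odds are the prior log-odds plus the summed ratios. The \(d'_1\) values below are hypothetical; the prior of \(10^{-4}\) is the text’s order-of-magnitude figure.

```python
import math

PRIOR = 1e-4                                   # prior that a call is the parent's (order 1e-4)
PRIOR_LOG_ODDS = math.log(PRIOR / (1 - PRIOR))

def posterior(total_llr):
    """Bayes' rule in log-odds form: logit p(parent|y) = logit prior + summed LLR."""
    return 1 / (1 + math.exp(-(PRIOR_LOG_ODDS + total_llr)))

def syllables_needed(d1):
    """Smallest N whose expected summed LLR, N * d1**2 / 2, pushes the posterior above 0.5."""
    return math.ceil(-PRIOR_LOG_ODDS / (d1 ** 2 / 2))

for d1 in (1.0, 1.5, 2.0, 2.5):                # hypothetical per-syllable sensitivities
    n = syllables_needed(d1)
    print(f"d'_1 = {d1}: N = {n} syllables, expected posterior {posterior(n * d1 ** 2 / 2):.2f}")
```

Under these assumptions the number of syllables needed falls quickly as per-syllable sensitivity rises, which is the sense in which multi-syllable calls compensate for the tiny prior.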

11. Field Methods and the Aubin Playback Paradigm

The classic emperor-call dataset was collected at Pointe Géologie, Terre Adélie (66.7\(^\circ\)S) by the CEBC group across winters 1990–1998. Calls were recorded on DAT at 48 kHz, 16-bit, with Sennheiser ME66 shotgun microphones at 1–5 m. Playback experiments used portable battery-powered speakers (JBL Professional) at 85 dB SPL re. 1 m, simulating a naturalistic parent return.

Individual birds were fitted with coloured flipper bands and transponders (RFID) to allow repeated recordings of the same adult across seasons, providing the critical within-individual \(\Delta f\) variance estimate. Chicks were identified by their parents’ bands.

Experimental discriminations

  • Signal-vs-noise: playback parent + chorus vs. playback noise at matched level.
  • Own-vs-stranger: playback own parent vs. neighbouring adult.
  • Voice-manipulation: playback one tone alone vs. both.
  • Envelope shuffle: playback parent with beat envelope replaced by stranger’s envelope.

Responses were scored as orientation, approach, and vocal reply (chick) or breast-forward approach (parent). The differences between own-parent and stranger playbacks were large and highly significant (\(p < 0.001\), \(n = 30\) chicks).

12. Synthesis: Why the Emperor Has Two Voices

Every element of the emperor call system is a solution to the hostile acoustic environment of the Antarctic winter colony:

  • Bilateral syrinx delivers two independent tones simultaneously.
  • Beat envelope at 20–120 Hz encodes individual identity in a low-frequency channel resistant to chorus masking.
  • Harmonic stack up to 5–15 kHz delivers loud long-range carriers matched to acoustic propagation over ice.
  • Syllable repetition (5–15 per encounter) provides a \(\sqrt{N}\) redundancy boost.
  • Auditory learning during the brooding period establishes the chick’s internal parent-template.
  • Matched-filter detection with \(d' \sim 2.5\) at SCR = \(-6\) dB explains the 96% recognition in field playback.

Jouventin & Aubin (2002) argue that this acoustic system co-evolved with the nest-less breeding ecology: ancestral Aptenodytes penguins that lost fixed nest sites (during the Oligocene expansion of Antarctic sea ice) faced a recognition crisis that was solved by exaggerating the bilateral syrinx and its two-voice signal. King penguins, which also breed without fixed nests, share the system; since the trait is thought to be ancient in the Aptenodytinae lineage, the same selection pressures have plausibly maintained it in both species.

Discussion & Graduate Exercises

  1. Starting from the MEAD membrane equation, compute the fundamental frequency shift \(\delta f/f_0\) induced by a 10% change in tension. Compare to the individual variability in \(\Delta f\) reported by Aubin & Jouventin (2002).
  2. Given a colony chorus of \(N = 10\,000\) simultaneous callers with random \(\Delta f\) distributions, estimate the probability that two randomly chosen birds have \(\Delta f\) values within 2 Hz of each other. Relate this to the identification-failure rate in field playback.
  3. Extend Simulation 1 to include frequency modulation (FM) of both carriers with correlated LFO. Does FM preserve or destroy the beat envelope? How does this constrain the emperor’s actual call production?
  4. Using the SDT framework of Simulation 2, compute the minimum \(\Delta f\) that achieves \(P_c \ge 0.95\) at SCR = \(-6\) dB with \(N = 8\) syllables. Is the answer consistent with observed \(\Delta f\) distributions?
  5. Design a playback experiment to discriminate the role of the beat envelope from the relative \(f_L / f_R\) ratio as the identity carrier. Specify the acoustic manipulations and the statistical power calculation.
  6. Estimate the information-theoretic capacity (bits/call) of the two-voice emperor channel, assuming \(\Delta f \in [20, 120]\) Hz with 2 Hz discriminability and 3 s call duration.

Key References

• Robisson, P. (1992). “Vocalisations in Aptenodytes penguins: application of the two-voice theory.” Auk 109, 654–658.

• Aubin, T., Jouventin, P. (1998). “Cocktail-party effect in king penguin colonies.” Proc. R. Soc. B 265, 1665–1673.

• Aubin, T., Jouventin, P. (2002). “How to vocally identify kin in a crowd: the penguin model.” Adv. Study Behav. 31, 243–277.

• Aubin, T. (2000). “Signalling in noisy environments: the cocktail-party problem in bird communication.” Bioacoustics 10, 215–230.

• Aubin, T., Jouventin, P., Hildebrand, C. (2000). “Penguins use the two-voice system to recognise each other.” Proc. R. Soc. B 267, 1081–1087.

• Jouventin, P., Aubin, T., Lengagne, T. (1999). “Finding a parent in a king penguin colony: the acoustic system of individual recognition.” Anim. Behav. 57, 1175–1183.

• Lengagne, T., Aubin, T., Lauga, J., Jouventin, P. (1999). “How do king penguins apply the mathematical theory of information to communicate in windy conditions?” Proc. R. Soc. B 266, 1623–1628.

• Jouventin, P. (1982). Visual and Vocal Signals in Penguins, Their Evolution and Adaptive Characters. Parey.

• Thumser, N.N., Karrer, J. (2010). “Individuality in vocalisations of emperor penguins.” Bioacoustics 19, 267–279.

• Fee, M.S., Shraiman, B., Pesaran, B., Mitra, P.P. (1998). “The role of nonlinear dynamics of the syrinx in the vocalisations of a songbird.” Nature 395, 67–71.

• Fee, M.S. (2002). “Measurement of the linear and nonlinear mechanical properties of the oscine syrinx.” J. Comp. Physiol. A 188, 829–839.

• Elemans, C.P.H., Dürrwang, K., Goller, F. (2008). “Universal mechanisms of sound production and control in birds and mammals.” Nat. Commun. 9, 2278.

• Suthers, R.A. (1990). “Contributions to birdsong from the left and right sides of the intact syrinx.” Nature 347, 473–477.

• Green, D.M., Swets, J.A. (1966). Signal Detection Theory and Psychophysics. Wiley.

• Joris, P.X., Schreiner, C.E., Rees, A. (2004). “Neural processing of amplitude-modulated sounds.” Physiol. Rev. 84, 541–577.

• Woolley, S.M.N., Gill, P.R., Theunissen, F.E. (2006). “Stimulus-dependent auditory tuning results in synchronous population coding of vocalisations in the songbird midbrain.” J. Neurosci. 26, 2499–2512.

• Bretagnolle, V. (1996). “Acoustic communication in a group of nonpasserine birds, the petrels.” Ecology and Evolution of Acoustic Communication in Birds, 160–178. Cornell.

Synthesis & Bridge to Module 7

The bilateral syrinx is the sensory-communicative counterpart to the hydrodynamic prowess of Module 5 and the thermoregulatory feats of the earlier modules: a structural exaggeration under extreme selection that solves a hard physical problem. Here the problem is recognition in a dense chorus; the solution is the two-voice beat, a low-frequency noise-robust signature channel.

Module 7 turns to the biochemistry of the 120-day male fast during incubation, a metabolic challenge that the calling male undertakes while preserving both the chick on his feet and the acoustic signature in his syrinx. We examine the Groscolas three-phase lipolysis model, the corticosterone-mediated abandonment threshold, and the hormonal control of ketogenesis in fasting emperor males.