Pitch in Praat

Periodicity analysis works best on continuous, stable curves. The sounds of speech, however, are anything but stable, which makes them harder to analyse. Praat gets around this by assuming that speech is sufficiently stable within small enough fragments, which are called ‘windows of analysis’.

This is the sound we’ll be working on: a complex sound wave with a fundamental frequency of 140 Hz and a harmonic at 280 Hz.

Figure: The sound in the analysis window. Three cycles of a complex sine wave, with a peak amplitude of 1, starting and ending at 0. The horizontal axis is 945 samples long.
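A signal like the one described above can be approximated with NumPy. This is a minimal sketch rather than the code behind the figure: the 44100 Hz sample rate is an assumption (it is consistent with three 140 Hz periods spanning 945 samples), and `harmonic_gain` is a hypothetical parameter, since the text only says that the harmonic is loud.

```python
import numpy as np

FS = 44100        # assumed sample rate in Hz (not stated in the text)
F0 = 140.0        # fundamental frequency given in the text
N_PERIODS = 3     # periods covered by the analysis window


def make_signal(fs=FS, f0=F0, n_periods=N_PERIODS, harmonic_gain=3.0):
    """Return n_periods cycles of a sine at f0 plus a harmonic at 2 * f0.

    harmonic_gain is a hypothetical choice for how loud the harmonic is.
    """
    n = int(round(n_periods * fs / f0))       # number of samples in the window
    t = np.arange(n) / fs                     # time of each sample in seconds
    x = np.sin(2 * np.pi * f0 * t) + harmonic_gain * np.sin(2 * np.pi * 2 * f0 * t)
    return x / np.max(np.abs(x))              # scale to a peak amplitude of 1


signal = make_signal()
print(len(signal))   # 945
```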

Each window is filtered to make sure there are no intensity peaks on the edges, which facilitates analysis. The filter used in this demonstration is a Hanning window (Boersma, 1993). According to the Praat documentation, a Hanning window is more responsive when working with 3 periods per analysis window, while a Gaussian window is better when working with a larger analysis window (the Gaussian window is twice as large as the Hanning window).

Praat uses both of these windows as defaults, depending on the task and the degree of precision required.

Figure: The Hanning filter function. A Hanning filter, tracing a bell curve that approaches 0 on both ends.

We apply the filter by multiplying the two curves sample by sample.

Figure: The filtered window. A Hanning-filtered complex sine wave, with an amplitude close to 0 on each end and 1 in the middle.
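The windowing step can be sketched as an element-wise product. The example below uses `np.hanning` as a stand-in for the Hanning filter; the sample rate and the harmonic gain of 3 are assumptions made for illustration, not values taken from the text.

```python
import numpy as np

FS, F0 = 44100, 140.0                   # FS is an assumed sample rate
n = int(round(3 * FS / F0))             # three periods, 945 samples
t = np.arange(n) / FS

# Synthetic test signal; the harmonic gain of 3.0 is a hypothetical choice.
signal = np.sin(2 * np.pi * F0 * t) + 3.0 * np.sin(2 * np.pi * 2 * F0 * t)
signal /= np.max(np.abs(signal))        # peak amplitude of 1

window = np.hanning(n)                  # 0 at both edges, 1 at the centre
windowed = signal * window              # multiply the curves sample by sample
```

Because the window is 0 at its first and last sample, the product is forced to 0 at both edges whatever the signal does there.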

To detect a sound’s pitch, Praat uses autocorrelation: each window is compared with time-shifted copies of itself.

An autocorrelation plot shows the degree to which the compared curves are related on the Y-axis, and the time lag for each comparison on the X-axis. If the curve is periodic, then there should be a peak on the autocorrelation curve when the lag is equal to the original curve’s period.

The autocorrelation is always highest at a time lag of 0, where the curve is compared with an unshifted copy of itself, so to find significant periodicity we need to look for peaks at lags greater than 0. In this case, however, because we are working with a complex sound wave with a loud harmonic, the autocorrelation curve shows a false peak (red line) before the time lag that we know corresponds to the sound’s actual fundamental period (blue line), which is aligned with a lower peak.

Figure: Normalized autocorrelation of the filtered sound. An autocorrelation function, showing a series of peaks of decreasing size. A red line marks the top of the second (on sample 157), and a blue line marks the top of the third (on sample 316), which is lower. The fourth peak, roughly halfway through the curve, is the last one that is plain to see.
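The false peak can be reproduced with a short sketch. `normalized_autocorrelation` and `peak_near` are hypothetical helper names, and the sample rate and harmonic gain are assumptions; with a sufficiently loud harmonic, the peak near half the period should come out higher than the one at the true period.

```python
import numpy as np


def normalized_autocorrelation(x):
    """Autocorrelation for lags 0..len(x)-1, scaled so that lag 0 equals 1."""
    full = np.correlate(x, x, mode="full")     # lags -(n-1) .. +(n-1)
    r = full[len(x) - 1:]                      # keep the non-negative lags
    return r / r[0]


def peak_near(r, lag, half_width=20):
    """Index of the largest value of r within half_width samples of lag."""
    lo = lag - half_width
    return lo + int(np.argmax(r[lo:lag + half_width + 1]))


FS, F0 = 44100, 140.0                          # FS is an assumed sample rate
n = int(round(3 * FS / F0))                    # three periods, 945 samples
t = np.arange(n) / FS
# The harmonic gain of 3.0 is a hypothetical choice for a "loud" harmonic.
signal = np.sin(2 * np.pi * F0 * t) + 3.0 * np.sin(2 * np.pi * 2 * F0 * t)

r_a = normalized_autocorrelation(signal * np.hanning(n))

period = int(round(FS / F0))                   # true period, 315 samples
half_lag = peak_near(r_a, period // 2)         # peak caused by the harmonic
period_lag = peak_near(r_a, period)            # peak at the true period
false_peak_is_higher = r_a[half_lag] > r_a[period_lag]
```

The peak at the true period is lower here mainly because the tapered window contributes less and less overlap as the lag grows, which pulls later peaks down.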

To correct for this, we divide the normalized autocorrelation of the filtered signal by the normalized autocorrelation of the window function.

Figure: Normalized autocorrelation of the window function. Half of a bell curve, starting from 1 and curving downwards to the right, approaching 0 at the end.

The result is an estimate of the autocorrelation of the original signal, which is both robust and better suited to the analysis of speech signals than earlier methods. Note that this estimate becomes increasingly unreliable beyond roughly half the length of the analysis window (Boersma, 1993).

Figure: Estimated autocorrelation of the original signal. Three cycles of a sinusoid wave, half the length of the ones shown before, starting on a peak at a magnitude of 1. The second peak is slightly lower than the first. The third peak, which is slightly higher than the second, is marked with a blue line on sample 316.
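The correction can be sketched by dividing one normalized autocorrelation by the other, lag by lag. As before, the sample rate and harmonic gain are assumptions and `normalized_autocorrelation` is a hypothetical helper. The 600 Hz search ceiling is likewise an assumed bound, used only to skip the peak at lag 0, and the search stops at half the window because the estimate is unreliable beyond that point.

```python
import numpy as np


def normalized_autocorrelation(x):
    """Autocorrelation for lags 0..len(x)-1, scaled so that lag 0 equals 1."""
    full = np.correlate(x, x, mode="full")     # lags -(n-1) .. +(n-1)
    r = full[len(x) - 1:]                      # keep the non-negative lags
    return r / r[0]


FS, F0 = 44100, 140.0                          # FS is an assumed sample rate
n = int(round(3 * FS / F0))                    # three periods, 945 samples
t = np.arange(n) / FS
# The harmonic gain of 3.0 is a hypothetical choice for a "loud" harmonic.
signal = np.sin(2 * np.pi * F0 * t) + 3.0 * np.sin(2 * np.pi * 2 * F0 * t)
window = np.hanning(n)

r_a = normalized_autocorrelation(signal * window)   # filtered signal
r_w = normalized_autocorrelation(window)            # window function

half = n // 2                                  # only trust the first half
r_x = r_a[:half] / r_w[:half]                  # estimate for the original signal

min_lag = int(FS / 600)                        # assumed ceiling; skips lag 0
best_lag = min_lag + int(np.argmax(r_x[min_lag:]))
f0_estimate = FS / best_lag                    # convert the lag to Hz
print(best_lag, round(f0_estimate, 1))
```

Restricting the division to the first half of the window also avoids dividing by the near-zero values that the window's autocorrelation takes at large lags.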

By finding the maximum at a time lag greater than 0 in this estimated curve, we can calculate the pitch of the original signal by converting that lag (lag_max, in samples) to Hz using the sampling frequency (f_s).

\[f_0 = \frac{1}{lag_{max} / f_s}\]
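As a worked instance, take the lag of 316 samples marked in the last figure and assume a sampling frequency of 44100 Hz. The sampling frequency is not stated in the text; it is inferred from three 140 Hz periods spanning 945 samples.

\[f_0 = \frac{1}{316 / 44100} = \frac{44100}{316} \approx 139.6\ \text{Hz}\]

This is close to the 140 Hz fundamental of the synthetic signal.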

References

Boersma, P. (1993). Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. IFA Proceedings 17: 97-110.