Periodicity analysis works best on continuous, stable curves. However, the sounds of speech are nothing if not unstable, which makes analysis more difficult. Praat overcomes this by assuming that speech is sufficiently stable within small enough fragments of it, which are called ‘windows of analysis’.

This is the sound we’ll be working on: a complex sound wave with a fundamental frequency of 140 Hz and a harmonic at 280 Hz.

The sound in the analysis window
Three cycles of a complex sine wave, with a peak amplitude of 1 and
  starting and ending at 0. The horizontal axis is 945 samples long
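A test signal like this one can be synthesized in a few lines. The following is only a sketch under stated assumptions: the sampling rate of 44100 Hz is inferred from the figure (three periods of 140 Hz in 945 samples), and the level of the harmonic relative to the fundamental (`HARMONIC_GAIN`) is a guess, since the text only says that the harmonic is loud.

```python
import math

SAMPLE_RATE = 44100      # Hz; assumed (945 samples / 3 periods of 140 Hz)
F0 = 140.0               # fundamental frequency from the text, in Hz
N_SAMPLES = 945          # window length from the figure description
HARMONIC_GAIN = 3.0      # assumed: harmonic three times as loud as the fundamental


def make_signal(n=N_SAMPLES, f0=F0, rate=SAMPLE_RATE, gain=HARMONIC_GAIN):
    """Return n samples of sin(f0) + gain * sin(2 * f0), scaled to a peak of 1."""
    raw = [
        math.sin(2 * math.pi * f0 * i / rate)
        + gain * math.sin(2 * math.pi * 2 * f0 * i / rate)
        for i in range(n)
    ]
    peak = max(abs(v) for v in raw)
    return [v / peak for v in raw]


signal = make_signal()
period = SAMPLE_RATE / F0    # samples per cycle of the fundamental
print(len(signal), period)
```

Under these assumptions one period of the fundamental spans 315 samples, which is the lag a pitch detector should recover.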

Each window is filtered to make sure there are no intensity peaks on the edges, which facilitates analysis. The filter used in this demonstration is a Hanning window (Boersma, 1993). According to the Praat documentation, a Hanning window is more responsive when working with 3 periods per analysis window, while a Gaussian window is better when working with a larger analysis window (the Gaussian window is twice as large as the Hanning window).

Praat uses either of these windows by default, depending on the task and the degree of precision that is required.

The Hanning filter function
A Hanning filter, tracing a bell curve asymptotically approaching 0 on
  both ends

We apply the filter by multiplying the two curves sample by sample.

The filtered window
A Hanning-filtered complex sine wave, with an amplitude close to 0 on
  each end and 1 in the middle
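The windowing step can be sketched as follows. The Hann formula used here is the textbook one, and the test signal repeats the assumptions made for illustration (44100 Hz sampling rate, harmonic three times as loud as the fundamental); neither is taken from Praat's source.

```python
import math


def hann(n):
    """Hann (Hanning) window of n points, tapering to 0 at both ends."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def apply_window(samples, window):
    """Multiply the signal and the window sample by sample."""
    return [s * w for s, w in zip(samples, window)]


# Test signal as described in the text; rate and harmonic level are assumed.
rate, f0, n, gain = 44100, 140.0, 945, 3.0
signal = [
    math.sin(2 * math.pi * f0 * i / rate)
    + gain * math.sin(2 * math.pi * 2 * f0 * i / rate)
    for i in range(n)
]
window = hann(n)
windowed = apply_window(signal, window)
```

The windowed signal keeps its original value in the middle of the window and is pulled towards 0 at the edges.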

To detect a sound’s pitch, Praat uses autocorrelation, comparing each window with time-shifted copies of itself.

An autocorrelation plot shows the degree to which the compared curves are related on the Y-axis, and the time lag for each comparison on the X-axis. If the curve is periodic, then there should be a peak on the autocorrelation curve when the lag is equal to the original curve’s period.

The autocorrelation is always highest at a time lag of 0, so to find significant periodicity we need to look for peaks at lags greater than 0. However, in this case, since we are working with a complex sound wave with a loud harmonic, the autocorrelation curve shows a false peak (red line) before the time lag that we know corresponds to the sound’s actual fundamental period (blue line), which is aligned with a lower peak.

Normalized autocorrelation of the filtered sound
An autocorrelation function, showing a series of peaks with decreasing
  size. A red line marks the top of the second (on sample 157), and a blue line
  marks the top of the third (on sample 316), which is lower. The fourth peak,
  roughly half-way through the curve, is the last one that is plain to see
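The false peak can be reproduced with a direct (and deliberately slow) autocorrelation of the windowed signal. This is a sketch, not Praat's implementation: the helper names `autocorrelation` and `peaks` are made up for illustration, and the sampling rate and harmonic level are the same assumptions as before.

```python
import math


def autocorrelation(samples):
    """Normalized autocorrelation: r[lag] = sum(x[i] * x[i + lag]) / r[0]."""
    n = len(samples)
    r0 = sum(v * v for v in samples)
    return [
        sum(samples[i] * samples[i + lag] for i in range(n - lag)) / r0
        for lag in range(n)
    ]


def peaks(curve):
    """Return (lag, height) for every local maximum at a lag greater than 0."""
    return [
        (k, curve[k])
        for k in range(1, len(curve) - 1)
        if curve[k - 1] < curve[k] >= curve[k + 1]
    ]


# Windowed test signal; rate and harmonic level are assumptions.
rate, f0, n, gain = 44100, 140.0, 945, 3.0
windowed = [
    (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
    * (math.sin(2 * math.pi * f0 * i / rate)
       + gain * math.sin(2 * math.pi * 2 * f0 * i / rate))
    for i in range(n)
]
r = autocorrelation(windowed)
best_lag, best_height = max(peaks(r), key=lambda p: p[1])
print(best_lag, rate / best_lag)   # lag of the highest peak and the pitch it implies
```

With a harmonic this loud, the highest peak sits near half the true period, so reading the pitch straight off this curve would report roughly the harmonic's frequency instead of 140 Hz.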

In order to correct for this, we need to divide the normalized autocorrelation of the filtered signal by the normalized autocorrelation of the window function.

Normalized autocorrelation of the window function
Half of a bell curve, starting from 1 and curving downwards to the right,
  asymptotically approaching 0 in the end

The result is an estimate of the autocorrelation of the original signal, which is both robust and better suited for the analysis of speech signals than earlier methods. Note that this estimate gets increasingly unreliable after roughly half the length of the analysis window (Boersma, 1993).

Estimated autocorrelation of the original signal
Three cycles of a sinusoid wave, half the length of the ones shown
  before, starting on a peak at a magnitude of 1. The second peak is slightly
  lower than the first. The third peak, which is slightly higher than the
  second, is marked with a blue line on sample 316

By finding the highest peak at a time lag > 0 in this estimated curve, we can calculate the pitch of the original signal by converting that lag from samples to Hz (the sampling frequency divided by the lag).
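The whole correction can be sketched end to end. As before, this is an illustration under assumptions (44100 Hz sampling rate, harmonic three times as loud as the fundamental, a hypothetical `autocorrelation` helper), not a reproduction of Praat's code. The window's autocorrelation is computed numerically from the same Hann window, and the search stops at half the window length, where the estimate stops being reliable.

```python
import math


def autocorrelation(samples, max_lag):
    """Normalized autocorrelation for lags 0..max_lag."""
    n = len(samples)
    r0 = sum(v * v for v in samples)
    return [
        sum(samples[i] * samples[i + lag] for i in range(n - lag)) / r0
        for lag in range(max_lag + 1)
    ]


# Test signal and window; rate and harmonic level are assumptions.
rate, f0, n, gain = 44100, 140.0, 945, 3.0
window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
signal = [
    math.sin(2 * math.pi * f0 * i / rate)
    + gain * math.sin(2 * math.pi * 2 * f0 * i / rate)
    for i in range(n)
]
windowed = [s * w for s, w in zip(signal, window)]

# Only trust lags up to half the window, as the text advises.
max_lag = n // 2
r_windowed = autocorrelation(windowed, max_lag)
r_window = autocorrelation(window, max_lag)
r_estimate = [a / b for a, b in zip(r_windowed, r_window)]

# Highest local maximum at a lag greater than 0.
candidates = [
    k for k in range(1, max_lag)
    if r_estimate[k - 1] < r_estimate[k] >= r_estimate[k + 1]
]
period = max(candidates, key=lambda k: r_estimate[k])
pitch_hz = rate / period
print(period, round(pitch_hz, 1))
```

Looking for local maxima, not just the largest value, matters here: the estimate stays close to 1 at lags just above 0, so a plain maximum over all lags > 0 would pick lag 1.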


References