Next: Perceptual Representations of Sound Up: Multi-Model Estimation and Classification Previous: Introduction


With the development of Fourier analysis came the notion of spectral analysis of sound. Helmholtz [8] was the first to describe musical timbre as a property of the spectral components of a sound. In Helmholtz's view, the perceptual cues for timbre came from the Fourier series coefficients. For example, it is well known that a clarinet's Fourier series is dominated by the odd-numbered components of the spectrum. The spectral envelope of a sound, that is, the profile of its Fourier series coefficients, is still considered an important attribute of timbre.
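As a minimal sketch of the spectral-envelope idea, the following Python fragment synthesizes an idealized clarinet-like tone and reads back the magnitude of each harmonic with a direct DFT. The function names, the 1/k amplitude roll-off, and the choice of 256 samples spanning 8 periods are illustrative assumptions, not measurements of a real instrument or a method taken from the works cited here.

```python
import cmath
import math


def synthesize(harmonic_amps, n_samples=256, cycles=8):
    """Sum of sine harmonics; harmonic_amps[k-1] is the amplitude of harmonic k.

    The fundamental completes `cycles` whole periods in `n_samples` samples,
    so every harmonic falls exactly on a DFT bin (no spectral leakage).
    """
    return [
        sum(a * math.sin(2 * math.pi * k * cycles * n / n_samples)
            for k, a in enumerate(harmonic_amps, start=1))
        for n in range(n_samples)
    ]


def harmonic_magnitudes(signal, cycles, n_harmonics):
    """Amplitude of each harmonic, read from the DFT bin k * cycles."""
    n = len(signal)
    mags = []
    for k in range(1, n_harmonics + 1):
        bin_index = k * cycles
        coeff = sum(x * cmath.exp(-2j * math.pi * bin_index * i / n)
                    for i, x in enumerate(signal))
        mags.append(2 * abs(coeff) / n)  # scale so a unit sine gives 1.0
    return mags


# Hypothetical amplitudes: odd harmonics fall off as 1/k, even harmonics absent.
amps = [1.0 / k if k % 2 == 1 else 0.0 for k in range(1, 9)]
tone = synthesize(amps)
envelope = harmonic_magnitudes(tone, cycles=8, n_harmonics=8)

for k, m in enumerate(envelope, start=1):
    print(f"harmonic {k}: {m:.3f}")
```

The recovered list `envelope` is the profile of the Fourier series coefficients for this toy signal: the even-numbered entries are numerically zero and the odd-numbered entries follow the assumed roll-off.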

In the 1960s, computer analysis of sound gave rise to important new discoveries regarding timbre. The work of Risset and Mathews [6], and the little-known work of James Tenney at AT&T Bell Labs [7], showed that a major component of timbre had not previously been considered: the temporal evolution of the spectral envelope. This temporal complexity was reflected in new studies in the analysis and synthesis of sound, based on frequency 'tracks' and their temporal evolution.

How could an elegant theory of timbre be developed when the underlying spectral representation of sound had so many degrees of freedom? The seminal work of Grey, who used perceptual experiments to recover the geometry of perceptual sound space without regard to explicit parameterizations, gave rise to a new interest in timbre.

Grey interpreted the axes of his timbre space as having a number of psychophysical properties. The first axis had a psychophysical correlate in the spectral energy distribution of the sound. The second axis was found to be related to the temporal behaviour of groups of upper harmonics in the sound, especially in the first 100 ms. The third axis was related to the degree of spectral fluctuation in the sound, which indicates a strong dependence on the articulatory nature of the instrument. He also found that the instruments tended to group into families based on the underlying physics of the instrument. This indicates that physical similarity of the underlying sound-generating systems may be one of the principal features of timbre, along with spectral energy distribution and synchronicity of the upper harmonics.
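To make the first and third axes more concrete, the sketch below computes two commonly used signal descriptors: the spectral centroid, as a rough proxy for spectral energy distribution, and the frame-to-frame spectral flux, as a rough proxy for spectral fluctuation. These descriptors are assumptions chosen for illustration and are not claimed to be the measures Grey used; the function names and the three frames of harmonic magnitudes are hypothetical.

```python
import math


def spectral_centroid(magnitudes, freqs):
    """Magnitude-weighted mean frequency of one spectral frame (Hz)."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silent frame: centroid is undefined, report 0
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total


def spectral_flux(frames):
    """Euclidean distance between each pair of successive magnitude spectra."""
    return [
        math.sqrt(sum((b - a) ** 2 for a, b in zip(prev, cur)))
        for prev, cur in zip(frames, frames[1:])
    ]


# Hypothetical data: magnitudes of harmonics 1..4 of a 220 Hz tone over three
# successive frames, with upper harmonics entering one after another.
freqs = [220.0 * k for k in range(1, 5)]
frames = [
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.5, 0.0, 0.0],
    [1.0, 0.5, 0.5, 0.0],
]

centroids = [spectral_centroid(frame, freqs) for frame in frames]
flux = spectral_flux(frames)

print("centroid per frame (Hz):", [round(c, 1) for c in centroids])
print("flux between frames:", flux)
```

In this toy input the centroid rises as upper harmonics enter, and the flux is non-zero wherever the spectrum changes between frames. A descriptor for the second axis would additionally need per-harmonic onset times within the attack, which this sketch does not attempt.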

Grey found that there were exceptions to the clustering by instrument family. He hypothesized that these exceptions were accounted for by the method of articulation of the physical system. For example, the sound of an overblown flute has articulatory dynamics similar to those of the string family of sounds: a) it has low-amplitude, high-frequency inharmonic energy, b) there is synchronicity of the upper harmonics in the attack portion of the sound, and c) there is a high degree of spectral fluctuation during the sustain portion of the sound, due to the chaotic nature of flute jet streams and string/bow couplings. This indicates that the articulatory dynamics of sound-generating systems may also be a major constituent of the timbre percept.

Since these studies, little new work on the classification and characterization of sound has emerged. Our goal is to provide a framework for extending the notion of timbre to account for many of the findings in the experimental literature, through the use of physical models of sound and articulatory estimates of coarse physical parameters. Whereas the first two components of Grey's theory are psychophysical in nature, we propose that the remaining components of his theory are cognitive in nature, and we aim to show how they can be acquired via learning.

A theory of timbre must be constrained by issues of perceptual plausibility. We therefore propose the following goals for our research: a) The representation should be based on physical models of sound, with instruments of the same family being generated from the same model. b) Parameters should be estimated in order to represent the articulatory aspect of timbre perception. This corresponds to the motor theory of speech perception, in which the estimation of articulators is used for recognition. c) The models should be implicit, and the articulatory estimates should be measured on a non-metric scale relative to the extremes of the dynamics of the articulator. We present results from computational models that implement this theory.


Michael Casey
Fri Mar 22 15:49:22 EST 1996