Feature extraction is an integral part of understanding musical instrument signals. These signals contain a wealth of information and feature extraction is a method for obtaining specific characteristics through signal processing techniques. Hence, it is partly a process of reducing the overwhelming acoustical information and focusing on specific areas that may give clues for describing the signal under investigation. In a computer system, digital signal processing techniques are used for analysis. The techniques of data analysis are divided into frequency and time domain analyses. With these techniques, numerous approaches from different angles are employed to extract salient information, ultimately to help understand timbral characteristics.
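As a rough illustration of the split between time domain and frequency domain analysis described above, the following Java sketch computes two time domain features (RMS amplitude and zero-crossing rate) and one frequency domain feature (the spectral centroid, a common correlate of perceived brightness). The class name `FeatureSketch`, its method names, and the use of a naive discrete Fourier transform are assumptions made for this example only; they are not taken from the software described in this thesis, which may compute these quantities differently.

```java
// Illustrative sketch only: all names here are hypothetical and are not
// part of the thesis software.
final class FeatureSketch {

    /** Time domain: root-mean-square amplitude of a frame. */
    static double rms(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v * v;
        }
        return Math.sqrt(sum / x.length);
    }

    /** Time domain: fraction of adjacent sample pairs whose signs differ. */
    static double zeroCrossingRate(double[] x) {
        int crossings = 0;
        for (int n = 1; n < x.length; n++) {
            if ((x[n - 1] >= 0) != (x[n] >= 0)) {
                crossings++;
            }
        }
        return (double) crossings / (x.length - 1);
    }

    /** Frequency domain: magnitudes of bins 0..N/2 via a naive O(N^2) DFT. */
    static double[] magnitudeSpectrum(double[] x) {
        int n = x.length;
        double[] mag = new double[n / 2 + 1];
        for (int k = 0; k < mag.length; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double phi = 2.0 * Math.PI * k * t / n;
                re += x[t] * Math.cos(phi);
                im -= x[t] * Math.sin(phi);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    /** Frequency domain: magnitude-weighted mean frequency in Hz. */
    static double spectralCentroid(double[] x, double sampleRate) {
        double[] mag = magnitudeSpectrum(x);
        double num = 0.0;
        double den = 0.0;
        for (int k = 0; k < mag.length; k++) {
            double freq = k * sampleRate / x.length; // centre frequency of bin k
            num += freq * mag[k];
            den += mag[k];
        }
        return den == 0.0 ? 0.0 : num / den;
    }

    /** Helper for the demo: n samples of a unit-amplitude sine tone. */
    static double[] sine(double freq, double sampleRate, int n) {
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = Math.sin(2.0 * Math.PI * freq * i / sampleRate);
        }
        return x;
    }

    public static void main(String[] args) {
        double sampleRate = 8000.0;
        double[] frame = sine(1000.0, sampleRate, 256);
        System.out.println("RMS amplitude:      " + rms(frame));
        System.out.println("Zero-crossing rate: " + zeroCrossingRate(frame));
        System.out.println("Spectral centroid:  " + spectralCentroid(frame, sampleRate) + " Hz");
    }
}
```

The naive DFT is used here only to keep the example self-contained; for frames of realistic length an FFT would normally replace `magnitudeSpectrum`.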
Various
signal processing software systems exist for extracting specific acoustical
features. However, very few are tailored to the analysis and extraction
of timbral qualities of musical instrument signals.
In this thesis I have developed various algorithms for extracting salient
features and implemented them in a single software application that can be
readily used by musicians, composers, engineers or anyone interested in
analyzing musical signals from a signal processing point of view.
It
is also interesting to note that although numerous signal processing algorithms
have been devised to accomplish feature extraction tasks, it is still unclear
which aspects of timbre are essential and which are more meaningful than
others. To my knowledge, no theory or rule exists that unambiguously
defines a hierarchical description of timbral features.
It is my hope that this software system will give users the means to
explore, investigate and experiment with audio signals and help answer
some of the many open questions regarding timbre.
I also plan to continue research in timbre by adding a recognition
module that would take the extracted features and recognize
the sound source being analyzed.
The
software is written in Java, which was chosen for its platform independence
and graphical user interface (GUI) capabilities. The Java Swing GUI toolkit is
used to facilitate the interpretation of extracted features through graphical
displays and parametric controls of various signal processing coefficients.
Although the
Fourier transform has been known for quite some time, it was not widely
applied by the music community until after 1965, with the introduction
of the fast Fourier transform (FFT) (Cooley and Tukey 1965). The advent of the
FFT stimulated research in music, partly because it made computing the
discrete Fourier transform cost-effective. One such line of timbre research
used multidimensional scaling (MDS) methods (Grey 1976). The structure of
musical signals was mapped to a three-dimensional timbre space. Listeners
judged the similarity or dissimilarity between sounds as salient features
were changed. The three dimensions were brightness, spectral flux and
attack time. Instead of natural sounds, additive synthesis was employed so
that timbral parameters could be controlled easily during the experiments.
The noise content of musical signals, on the other hand, has not been
investigated in as much detail as the "periodic" aspects of musical sounds
(some work on modeling non-periodic signals has been done by Serra 1997).
Voice coding research, however, has adopted noise analysis techniques
enthusiastically, dividing speech into a periodic part and a noisy part.
Linear predictive coding (LPC) has been the backbone of current and past
speech analysis-by-synthesis (AbS) systems.
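To make the idea of linear prediction mentioned above more concrete, the following Java sketch estimates predictor coefficients for one frame using the textbook autocorrelation method with the Levinson-Durbin recursion. It is offered only as a minimal illustration under stated assumptions: the class name `LpcSketch` and its methods are hypothetical, no windowing or pre-emphasis is applied, and nothing here is claimed to match the method used in any particular speech coder or in the software described in this thesis.

```java
// Illustrative sketch only: all names here are hypothetical and are not
// part of the thesis software.
final class LpcSketch {

    /** Autocorrelation r[0..order] of a frame (no windowing applied). */
    static double[] autocorrelation(double[] x, int order) {
        double[] r = new double[order + 1];
        for (int lag = 0; lag <= order; lag++) {
            double sum = 0.0;
            for (int n = lag; n < x.length; n++) {
                sum += x[n] * x[n - lag];
            }
            r[lag] = sum;
        }
        return r;
    }

    /**
     * Levinson-Durbin recursion. Returns a[0..order] with a[0] = 1, so that
     * e[n] = x[n] + a[1] x[n-1] + ... + a[order] x[n-order]
     * is the prediction error (residual) signal.
     */
    static double[] lpc(double[] x, int order) {
        double[] r = autocorrelation(x, order);
        double[] a = new double[order + 1];
        a[0] = 1.0;
        double err = r[0];
        if (err == 0.0) {
            return a; // silent frame: nothing to predict
        }
        for (int i = 1; i <= order; i++) {
            // Reflection coefficient for stage i.
            double acc = r[i];
            for (int j = 1; j < i; j++) {
                acc += a[j] * r[i - j];
            }
            double k = -acc / err;

            // Update coefficients from the previous stage.
            double[] prev = a.clone();
            for (int j = 1; j < i; j++) {
                a[j] = prev[j] + k * prev[i - j];
            }
            a[i] = k;

            // Remaining prediction error energy after stage i.
            err *= (1.0 - k * k);
        }
        return a;
    }

    public static void main(String[] args) {
        // Demo input: a decaying exponential, i.e. the impulse response of a
        // one-pole filter, which a low-order predictor should model closely.
        double[] x = new double[200];
        for (int n = 0; n < x.length; n++) {
            x[n] = Math.pow(0.9, n);
        }
        double[] a = lpc(x, 2);
        for (int i = 0; i < a.length; i++) {
            System.out.println("a[" + i + "] = " + a[i]);
        }
    }
}
```

In a periodic-plus-noise view of speech, the residual left after filtering the frame with these coefficients is the part the predictor cannot explain; how that residual is then modeled is beyond the scope of this sketch.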
During the past decade a number of research topics in timbre have been pursued in the area of so-called Computational Auditory Scene Analysis (CASA). It may be thought of as a research area within the psychophysical disciplines that seeks to describe and explain how listeners perceive sounds. Sound in this context refers to a multiplexed signal, that is, an aggregate of a number of sound sources. The approach is to find the underlying reasons why we hear what we hear, and not merely to be content with a computer system that finds a matching answer to a stimulus. The proliferation of CASA can be largely attributed to Bregman, who published his book Auditory Scene Analysis in 1990 (Bregman 1990). The book describes in detail highly intuitive and clever experiments that attempt to explain psychoacoustic phenomena and to enable robust modeling of such features. However, as is the case with most if not all psychoacoustic experiments, the stimuli or test tones used in Bregman's book are static, synthesized sine tones or simply impractical sound examples that are often only remotely related to real-life sounds. Nevertheless, a significant and impressive amount of work has been done in this field. Work by Ellis (Ellis 1996) used a prediction-based model of the auditory system with good results in grouping sounds such as car horns, door slams and squeals in a noisy "city street environment". He used a re-synthesis approach to assess its robustness and performance. Another is a statistically based pattern-recognition approach (Martin 1999) in which the "listening" system classifies musical instruments as one of 25 possibilities, based on Ellis's PDCASA (Prediction Driven Computational Auditory Scene Analysis) architecture.