Model of effective signal processing in the auditory system
by Torsten Dau
Over the last few years a signal processing model has been developed in Oldenburg that is able to quantitatively reproduce a wide range of data from experiments in human hearing. The model incorporates physiological aspects of neural processing in the auditory system as well as the physical principles involved in signal detection. The ability of the model to simulate the processing of temporally fluctuating (modulated) sounds is particularly interesting. Many naturally occurring sounds, in particular speech, are characterised by such fluctuations in level. Using the model, it is possible to successfully predict speech intelligibility in quiet and in noisy conditions, as well as the speech transmission quality of various coding systems, such as those used in mobile telephones.
Our hearing is impressively adapted to the detection of acoustic signals from the environment, in particular to the understanding of speech. Acoustic speech signals contain many different frequency components and, above all, change strongly over time. In order to perceive speech, the auditory system must therefore be able to register, at every moment, the intensity of each frequency present in the sound. But how does it do this? What does it mean when we say that the auditory system "perceives" a sound or a certain sound characteristic? And above all: how can we model the processing of sound, from the fluctuations in air pressure at the outer ear, through the conversion of these vibrations into neural excitation patterns, to perception at the end of the processing chain, without getting lost in the complicated anatomical and physiological details of the individual stages? Our approach is to treat the "system" from the standpoint of physics and information engineering, and to model the signal processing in the auditory system with a series of functional elements. This is why this article deals with the "effective" signal processing in hearing.
What is psychoacoustics?
Basically, this type of modelling belongs to the field of psychoacoustics, which describes the relationships between acoustic stimuli and the sensations they evoke in humans. The specific measurement methods for recording these perceptual quantities can differ considerably, depending on whether the task is, for example, the detection of a signal in the presence of an interfering sound (masking), the identification of a sound, or the evaluation or scaling of a specific acoustic stimulus or stimulus attribute (such as perceived loudness, roughness or timbre). The quantitative functional relationship between stimulus and sensation that we are looking for ranges, in the literature, from simple descriptions to complex models of auditory signal processing that incorporate knowledge of the neural processing of sound in the nervous system as well as an evaluation stage (e.g. a "detector") at the end of this pre-processing.
In the following, we look at a model that was developed to describe the performance and limitations of the auditory system in so-called "differential" processing tasks. For example, how well can we distinguish the frequencies of two tones, or the pitches they produce? How well can we follow a sound over time? How fine are the spectral and temporal resolution limits of our auditory perception? Even these fundamental aspects of hearing are crucial for later applications such as the development of modern hearing aids.
From the ear to the model
Let us start with the first, so-called "peripheral" processing stages, the outer, middle and inner ear, and then work our way up to the more "central" stages of the auditory pathway in the brain stem and finally in the auditory cortex. The outer ear primarily provides a direction-dependent colouration (filtering) of the acoustic signal reaching the ear. This colouration, which varies with the direction of incidence, can already be exploited to localise sound sources. The sound is then transmitted through the middle ear to the fluid-filled inner ear. The middle ear is constructed so that it allows an almost loss-free transfer of energy between the acoustic wave propagation in air (in the outer ear) and the wave propagation in the fluid-filled chambers of the inner ear. In the inner ear, the sound is decomposed into its frequency components and a frequency-to-place transformation takes place: different frequencies are mapped to different locations. This organising principle is called tonotopy, and it is preserved in the further "stations" of the auditory pathway as the stimulus is transmitted towards the brain. From a physical point of view, this mapping in the inner ear corresponds to a filter bank: the sound is decomposed spectrally into different band-pass filtered signals, the so-called frequency groups. The mechanical vibrations within the individual frequency groups are then converted into nerve impulses by the hair cells. At low sound frequencies these nerve excitations can follow the exact waveform of the sound, whereas at high frequencies they can only follow it with a certain inertia. Physically, we can describe this transformation as a simple envelope extraction, realised by half-wave rectification followed by low-pass filtering. In the auditory nerve, the acoustic information is then encoded in the firing rates of the various nerve fibres, so that at each point in time the sound intensity is represented for the different frequencies.
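To make these stages concrete, here is a minimal sketch in Python of the peripheral pre-processing just described, assuming NumPy and SciPy are available. The actual model uses a gammatone filter bank for the frequency groups; a simple Butterworth band-pass stands in for it here, and all parameter values (bandwidths, low-pass cut-off, channel centre frequencies) are illustrative assumptions, not the published model parameters.

```python
import numpy as np
from scipy.signal import butter, lfilter

def peripheral_stage(signal, fs, center_freqs, lp_cutoff=1000.0):
    """Sketch of the peripheral stages: a band-pass filter bank
    (standing in for the cochlear frequency-to-place mapping),
    then half-wave rectification and low-pass filtering per channel
    as a simple envelope extraction (the hair-cell stage)."""
    b_lp, a_lp = butter(1, lp_cutoff, btype="low", fs=fs)
    channels = []
    for fc in center_freqs:
        # illustrative bandwidth: roughly a quarter of the centre frequency
        b, a = butter(2, [0.875 * fc, 1.125 * fc], btype="band", fs=fs)
        band = lfilter(b, a, signal)        # one "frequency group"
        rectified = np.maximum(band, 0.0)   # half-wave rectification
        channels.append(lfilter(b_lp, a_lp, rectified))  # sluggish envelope
    return np.array(channels)               # shape: (channels, samples)

# Example: a 1 kHz tone, fully amplitude-modulated at 4 Hz
fs = 16000
t = np.arange(0, 0.5, 1 / fs)
tone = (1 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
rep = peripheral_stage(tone, fs, center_freqs=[500, 1000, 2000])
```

In the example, the slow 4 Hz modulation survives the envelope extraction in the 1000 Hz channel, while the 1 kHz carrier fine structure is largely smoothed away, mirroring the loss of phase-locking at high frequencies described above.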
In addition, the response behaviour of auditory nerve fibres shows strongly non-linear, so-called adaptive effects. In the auditory system, sudden changes in the sound, such as onsets and offsets, are weighted more strongly neuronally than static, unchanging components of the signal. Such behaviour is typical for the processing of temporal information and can be found at all stages along the auditory pathway up to the cortex. In physical terms, we can model such adaptive behaviour by connecting so-called adaptation (feedback) loops with different "time constants" in series, in each of which the input signal is divided by the low-pass filtered output signal. This yields an adaptation to the mean value of the input signal, while fast changes are passed through almost unaffected. The response of the adaptation stage in the model closely resembles actually measured neuronal response patterns of auditory nerve fibres, so that, for example, signal onset and offset are particularly emphasised. However, unlike the first stages of the model, we cannot directly attribute these adaptation stages to individual anatomical structures.
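A minimal sketch of such a chain of adaptation loops follows, assuming a simple first-order low-pass in each feedback path and a deliberately simplified initialisation of the loop states; the five time constants follow values commonly quoted for this model family and are assumptions for this sketch.

```python
import numpy as np

def adaptation_loops(envelope, fs,
                     time_constants=(0.005, 0.05, 0.129, 0.253, 0.5)):
    """Chain of adaptation loops: each stage divides its input by a
    low-pass filtered copy of its own output. Steady-state portions of
    the signal are thereby compressed, while onsets and offsets pass
    through almost unchanged and appear strongly emphasised."""
    out = np.maximum(envelope, 1e-5)          # keep the division well-defined
    for tau in time_constants:
        alpha = np.exp(-1.0 / (fs * tau))     # one-pole low-pass coefficient
        state = 1.0                           # simplified initial loop state
        y = np.empty_like(out)
        for n in range(len(out)):
            y[n] = out[n] / state                         # divisive feedback
            state = alpha * state + (1.0 - alpha) * y[n]  # low-pass of output
        out = y
    return out
```

The divisive feedback is what produces the model's characteristic behaviour: a constant input is progressively normalised towards its mean, whereas a sudden step produces a large transient before the low-pass states catch up.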
From the auditory nerve, the neural signals are passed on to the brain stem, where complex features of the sound are already analysed. For example, a first interaural comparison takes place here, in the so-called superior olive: an evaluation of the time and intensity differences between the two ears, which is used to localise sound sources. Furthermore, modulation frequencies are analysed in the inferior colliculus, perhaps the most important relay station of the brain stem. Modulations are the fluctuations in the temporal envelope of a signal. All communication signals that matter to us, such as speech and music, exhibit such envelope fluctuations. It is therefore particularly interesting to understand how modulations are mapped and transmitted in our brain. In the range between 0 and about 10 Hz, modulations are perceived as fluctuations in loudness. At modulation frequencies between about 10 and 80 Hz, a perception of "roughness" arises. At even higher modulation frequencies, more complex changes in the sound are perceived, because the modulation then also colours the spectrum of the sound.
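As a small illustration of the interaural comparison, the following sketch estimates the interaural time difference by cross-correlating the two ear signals over the physiologically plausible lag range (about ±1 ms for a human head, an assumption made here); the actual circuitry of the superior olive is of course far more intricate.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd=0.001):
    """Estimate the interaural time difference (ITD) as the lag of
    maximum cross-correlation between the two ear signals, searched
    only within +/- max_itd (about 1 ms for a human head)."""
    max_lag = int(max_itd * fs)

    def corr(l):
        a = left[max(0, -l):len(left) - max(0, l)]
        b = right[max(0, l):len(right) - max(0, -l)]
        return float(np.dot(a, b))

    best = max(range(-max_lag, max_lag + 1), key=corr)
    return best / fs  # positive: right-ear signal lags, source on the left
```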
It has only recently become known that the temporal course of the nerve excitation in the inferior colliculus is decomposed into different modulation-frequency ranges. Neurones are found here that are tuned to particular modulation frequencies while not responding at all to others. In addition to the organising principle of tonotopy (frequency-to-place mapping) already established in the inner ear, the principle of periodotopy thus appears at this higher level: different modulation frequencies are mapped to different locations. Interestingly, the two "axes" of frequency and modulation frequency are mapped independently of each other in the brain. Physically, this corresponds to a modulation filter bank that decomposes each of the pre-processed channels into modulation frequency groups, so that a two-dimensional pattern (frequency × modulation frequency) results at the output of the model's pre-processing stages. This stage is fundamental for the entire modelling of signal processing, since it enables a realistic simulation of many different acoustic phenomena in which the temporal aspects of hearing play a role.
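In the same spirit, here is a hedged sketch of a modulation filter bank applied to the envelope channels produced by the peripheral stage above; the modulation centre frequencies and the roughly constant-Q bandwidths are illustrative assumptions, not the exact parameters of the published model.

```python
import numpy as np
from scipy.signal import butter, lfilter

def modulation_filterbank(env_channels, fs, mod_freqs=(2, 4, 8, 16, 32, 64)):
    """Decompose each envelope channel into modulation frequency groups,
    yielding a two-dimensional internal representation
    (audio frequency x modulation frequency) over time."""
    representation = []
    for env in env_channels:
        mod_bands = []
        for fm in mod_freqs:
            # roughly constant-Q band around each modulation frequency
            b, a = butter(1, [fm / np.sqrt(2.0), fm * np.sqrt(2.0)],
                          btype="band", fs=fs)
            mod_bands.append(lfilter(b, a, env))
        representation.append(mod_bands)
    return np.array(representation)  # shape: (channels, mod bands, samples)
```

Feeding the output of `peripheral_stage` from the earlier sketch into this function yields the two-dimensional pattern described in the text: one axis for the audio-frequency groups (tonotopy), one for the modulation-frequency groups (periodotopy).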
At the output of the modulation filter bank, the model assumes an "internal noise" of the neural system, which represents the limited precision of neural processing. The temporal pattern at the output of this pre-processing constitutes the so-called internal representation of the original acoustic input signal. Such an internal representation rests on the modelling assumption that the essential pre-processing steps of the auditory system can be reproduced effectively with technical circuits. We therefore assume that a kind of image of the "state of the brain" is created in this way, on which the various auditory functions are based. It serves, so to speak, as the input to the subsequent pattern recogniser (detector), which is used to detect or discriminate different signals. The pattern recogniser is built on the idea that a change in the input signal becomes perceptible when the corresponding change in the internal representation is just large enough to stand out from the internal noise.
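This decision stage can likewise be sketched as template matching against the internal representation in the presence of additive internal noise; the noise level and the detection criterion below are illustrative assumptions rather than the model's calibrated values.

```python
import numpy as np

def detect(internal_rep, template, noise_std=1.0, criterion=1.0, rng=None):
    """Pattern-recogniser sketch: add internal noise to the internal
    representation, correlate it with a stored template of the expected
    change, and report a detection when the normalised correlation
    stands out from the noise (exceeds the criterion)."""
    rng = rng or np.random.default_rng()
    noisy = internal_rep + rng.normal(0.0, noise_std, internal_rep.shape)
    score = float(np.sum(noisy * template)) / (noise_std *
                                               np.linalg.norm(template))
    return score > criterion, score
```

The internal noise is what makes small signal changes undetectable: only when the template correlation rises clearly above the noise floor does the model, like a listener, report a perceptible difference.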
What are the benefits of the model?
Based on the current model, speech intelligibility in quiet and in background noise can already be predicted very well for normal-hearing and hearing-impaired listeners. In addition, the model's pre-processing can be used successfully to predict the speech transmission quality of coding systems (used, for example, in mobile phones to reduce the amount of data to be transmitted) and for robust speech recognition in various background noises. Of course, our hearing has a number of properties that the current model cannot yet capture. For example, we know that there are many interconnections and interactions between the neural activities in the different frequency ranges of the brain. Such information processing across frequency groups plays an important role for perception in complex acoustic environments, such as typical "cocktail party" situations. In the near future we will find out to what extent the model can also prove itself in the digital coding of audio signals and in hearing aid technology.
The author
Dr Torsten Dau (33), research assistant at the Department of Physics, Medical Physics group. Studied mechanical engineering in Hanover 1987-1989 (intermediate diploma) and physics in Göttingen 1987-1992. Doctorate in physics in 1996 within the graduate programme "Psychoacoustics" in Oldenburg. Research stays in Cambridge (England) in 1994 and 1996. Member of the Collaborative Research Centre "Neurocognition" since 1996. Awarded the Lothar Cremer Prize for young scientists in acoustics in 1998. Research interests: psychoacoustics, digital signal processing, neural correlates of perceptual quantities measured with acoustically evoked potentials (EEG).