Object Perception
Klump (coordinating), Colonius, van Dijk, Feudel, Kollmeier, Stavenga/van Netten, Verhey
State of the art and own previous work
Natural stimulus ensembles are usually complex, and their analysis is a demanding task that our neural system, in contrast to computers, solves with ease. Our visual system is capable of identifying and separately analyzing different objects in the optical image that is projected onto the retina. It uses rules that form the core of the "Gestalt Theory". Similar to visual analysis, our auditory system is capable of separating objects (i.e., sound sources) in the acoustic scene. Bregman (1990) applied the principles of the "Gestalt Theory" to auditory processing and demonstrated that similar rules can explain performance in the visual and the auditory system. Griffiths and Warren (2004) have proposed a framework for models of auditory scene analysis. The neural basis of the mechanisms underlying auditory scene analysis, however, is not well understood. It was a major topic of research in the transregional collaborative research center "The Active Auditory System" (SFB/TRR 31) based at the Universities of Oldenburg and Magdeburg and the Leibniz Institute for Neurobiology at Magdeburg. The dissertation projects described below complemented the projects studying the mechanisms of auditory object formation within the SFB/TRR 31.
In comparison to the analysis of visual scenes, auditory scene analysis poses additional problems that need to be solved by the neural system. While spatial relations between objects in a visual scene are preserved by the physics of the projection mechanism in our eyes, in auditory scene analysis these spatial relations must be computed by the neural system from the differences in the spectrum and the temporal pattern of the sounds reaching the two ears. These computations can be used to separate auditory objects in space (e.g., Darwin & Hukin 1998). Jeffress (1948) proposed a now classical model to explain the processing of the interaural time difference (ITD) that the auditory system uses to compute source location in azimuth. This model assumes that for every frequency and ITD the auditory system has coincidence-detector neurons that fire maximally if the ITD is exactly compensated by the difference in the neural delay of the action potentials generated by the input at each ear. The Jeffress model has been validated by physiological data from the barn owl (see review by Konishi 2003), and temporal processing can also explain the physiological and psychophysical results in other bird species (e.g., see review by Klump 2000). In recent years, however, evidence has accumulated that the Jeffress model may not adequately describe the processing of ITDs in mammals and possibly also in humans (Grothe 2003, McAlpine & Grothe 2003, Palmer 2004). Brand et al. (2002) found that in the gerbil medial superior olive (MSO) the maximal firing rate of binaural neurons is elicited by ITDs that are mostly larger than the ITDs provided by the acoustics given the dimensions of the head. It was therefore proposed that mammals compute azimuth from ITDs by comparing the firing rates of the left and right MSO; this rate difference represents sound direction more accurately if the firing rate of the neurons changes strongly with the azimuth angle of the sound source (for a theoretical discussion see Harper & McAlpine 2004). Within an ongoing dissertation project of the InterGK, we study azimuthal sound localization in gerbils behaviorally (Maier & Klump submitted) and, in collaboration with Dr. David McAlpine (who joins the InterGK as a Humboldt laureate), in a neurophysiological study in the inferior colliculus (IC) of the gerbil that is combined with animal psychophysics. These studies, which allow a direct comparison of physiological and psychophysical data in the same species, will substantially improve our understanding of the processing of ITDs in sound localization by mammals. Interaural intensity differences (ILDs), which change with both azimuth and elevation of a sound source, provide an additional cue that is mainly useful at frequencies at which the head casts a sufficient sound shadow. The interaction of ITD and ILD determines the perceived position of the sound source in space (for a review see Yin 2002). This interaction of ILD and ITD cues has been studied with respect to the mechanisms of sound localization in humans within the InterGK by the group of Kollmeier (e.g., Riedel & Kollmeier 2003). This group was also interested in the mechanisms that allow the human auditory system to suppress the effect of echoes trailing the direct sound when computing the direction of the sound source (Damaschke et al. 2005). This phenomenon, called the precedence effect, was a topic of further studies within the last grant period.
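To make the coincidence-detection idea concrete, the following minimal sketch (an added illustration, not part of the project descriptions; all signal names and parameter values are hypothetical) treats a bank of Jeffress-type coincidence detectors as a cross-correlation of the two ear signals over a set of internal delays; the internal delay yielding the largest coincidence count is taken as the estimate of the external ITD.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd=0.7e-3):
    """Jeffress-type sketch: each internal delay corresponds to one
    coincidence detector; the detector whose delay compensates the
    external ITD responds maximally (max_itd is a hypothetical bound
    roughly matching a small head)."""
    max_lag = int(round(max_itd * fs))
    delays = np.arange(-max_lag, max_lag + 1)
    # coincidence count for each internal delay applied to the left ear signal
    coincidence = [np.sum(np.roll(left, d) * right) for d in delays]
    return delays[int(np.argmax(coincidence))] / fs  # delay of right re left, in s

# usage sketch: a 500-Hz tone reaching the right ear 300 microseconds later
fs = 44100
t = np.arange(0, 0.1, 1 / fs)
left = np.sin(2 * np.pi * 500 * t)
right = np.sin(2 * np.pi * 500 * (t - 300e-6))
print(estimate_itd(left, right, fs))  # close to 3e-4 s (limited by sample spacing)
```

In the rate-difference scheme discussed above for mammals, the readout would instead compare the summed activity of two broadly tuned hemispheric channels rather than locating a peak across an array of delay-tuned detectors.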
Furthermore, temporal patterns play an important role in mechanisms of auditory object formation and scene analysis that do not rely on interaural differences of the signal. The dimension of time is inherent in the generation of all acoustic signals because they consist of pressure changes over time on a scale of micro- to milliseconds (see review by Grothe & Klump 2000). Sounds from sources that are simultaneously active can be segregated on the basis of their spectral content and by analyzing the temporal patterns of transients and ongoing amplitude fluctuations, which allow spectral components belonging to sounds from one source to be grouped. Improved segregation of simultaneously active sources by exploiting temporal patterns is the hallmark of the psychophysical effects of comodulation masking release (CMR, see review by Verhey et al. 2003a) and the comodulation detection difference (CDD, an experimental paradigm demonstrating improved segregation of sounds based on envelope decorrelation, e.g., see McFadden & Wright 1990). Both CMR and CDD are topics of ongoing research in the groups of Verhey, Feudel and Klump (see profiles). A proposed neuronal modeling dissertation project described below sheds light on the possible mechanisms underlying CDD and CMR. Physiological studies have demonstrated neurons that are sensitive to spectro-temporal modulation patterns (Kowalski et al. 1996). The concept of spectro-temporal modulation filters has already been successfully incorporated into speech recognition systems to extract speech features (Kleinschmidt 2002). Recent psychoacoustical modulation-masking data obtained with complex maskers indicate that modulations are processed nonlinearly (Moore et al. 1999, Ewert et al. 2002, Verhey et al. 2003b). The auditory system has been shown to exploit second-order modulations in signal detection (i.e., periodic changes of a modulation pattern, see Lorenzi et al. 2001). The mechanisms underlying the extraction of second-order modulations and their interaction with first-order modulations are still not fully understood (Füllgräbe & Lorenzi 2003; however, see Ewert et al. 2002) and are a topic of one of the proposed projects. The concept of first- and second-order modulation is very common in visual studies (e.g., Chubb & Sperling 1988, Cavanagh & Mather 1989). Although similarities between these senses are apparent, it remains to be shown whether similar models can be used for the visual and the auditory system.
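As an illustration of what comodulation means computationally (an added sketch, not drawn from the projects themselves; the helper name, filter settings and stimulus parameters are hypothetical), the coherence of envelope fluctuations across frequency bands can be quantified by correlating the Hilbert envelopes obtained in two bands: comodulated bands yield a clearly positive correlation, whereas independently modulated bands yield a correlation near zero.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def envelope_correlation(sig, fs, band_a, band_b):
    """Correlation of the Hilbert envelopes in two frequency bands,
    a simple index of comodulation (illustrative helper only)."""
    def band_envelope(band):
        sos = butter(4, band, btype='bandpass', fs=fs, output='sos')
        return np.abs(hilbert(sosfiltfilt(sos, sig)))
    return np.corrcoef(band_envelope(band_a), band_envelope(band_b))[0, 1]

# usage sketch: broadband noise carrying a common slow (8-Hz) amplitude modulation
fs = 16000
t = np.arange(0, 1.0, 1 / fs)
rng = np.random.default_rng(0)
sig = (1 + 0.8 * np.sin(2 * np.pi * 8 * t)) * rng.standard_normal(t.size)
print(envelope_correlation(sig, fs, (900, 1100), (1900, 2100)))  # clearly above zero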
Temporal pattern processing of the waveform envelope also plays an important role for the neural representation and perception of the pitch that we associate with auditory objects (e.g., Yost 1996, Neuert, Verhey & Winter 2005). Due to the nature of the preprocessing algorithms currently used in cochlear implants, temporal information such as the phase of the waveform or rapid envelope modulations that can induce a pitch percept is lost. The project by the group of van Dijk at Groningen aimed at supplying these temporal cues to cochlear implant patients and studied the effect of providing this additional temporal information.
In contrast to simultaneous sounds from different sources, sequential sounds from the same source should not be segregated but must be integrated into one auditory stream forming one auditory object. We have evidence that the integration of sequential stimuli may be achieved by representing the sounds in each auditory stream by a specific population of neurons in the auditory pathway whose activity pattern differs from that of other neural populations representing different auditory streams (e.g., see Bee & Klump 2004, Bee & Klump 2005). Integration is also important for combining the representation of signals in different pathways of the auditory system. For example, separate pathways are involved in the analysis of spatial cues ("where-pathway") and of the spectro-temporal cues used to identify signals ("what-pathway", e.g., see Rauschecker & Tian 2000). To provide for a coherent perception of objects in auditory scene analysis without a dissociation of object characteristics, the information in both pathways needs to be bound together. Similar requirements apply to the combination of auditory and visual analysis in multimodal integration. Such multimodal integration was the focus of the projects of Klump and Colonius, which studied auditory-visual integration in a bird and in humans. The state of the research relevant to these projects is described below.
In addition to utilizing stimulus-driven bottom-up mechanisms for object formation, neural systems continuously evaluate the incoming signal to generate hypotheses about the organization of the outside world. These hypotheses are applied in a top-down manner to segregate sensory objects from the rest of the stimuli in the environment. This requires a memory for acquiring the history of sensory events that is used to build the hypotheses (e.g., see review by Baddeley 2003). The hypotheses can be applied to invoke attention mechanisms that support the separate analysis of stimuli from one source. The role of attention and the role of auditory memory were the focus of two dissertation projects.
Since the physical rules operating on animals and on us in the environment are fixed, evolution has shaped the animal and human sensory systems to exploit these rules. At the same time, the physics operating in the environment and the physiology of the neural sensors and processing units in the nervous system provide constraints under which the neurosensory systems evolved. In visual perception, the conditions of ambient light in natural scenes and the statistics of the lighting conditions determine which optical characteristics are important for the evolution of signals and of the associated perceptual mechanisms (e.g., see Endler 1992, Endler and Basolo 1998). In auditory perception or in mechanoreception by the fish lateral line organ, background noise and turbulences of the medium are a major factor in determining the limits to processing by the sensory organs and thus to perception (for a review of auditory perception see Klump 1996; for a review of 3-D signal source localization under water using the lateral line see Bleckmann 2004). The projects by the group of Stavenga/van Netten at Groningen focus on the constraints to signal analysis in the natural environment and on how they affect the perception of sensory objects. This is exemplified by the analysis of the optical and physiological characteristics of butterfly eyes, complemented by optical studies of butterfly wings as well as of natural scenes (e.g., see Stavenga 2002, 2005; project Stavenga), and by the constraints affecting hydrodynamic source localization by the fish lateral line (project van Netten/Stavenga).
The common aim of all projects in the subject area "objects" was to provide an understanding of the neurosensory basis of scene analysis. Integrating the results from the different sensory modalities supported the development of biomimetic applications that, for example, can lead to better hearing aids or man-machine interfaces. The third part of this proposal, "Applications", provided some examples of such a transfer.