State of the art and own previous work // University of Oldenburg

State of the art and own research

Sound and speech quality

Models of auditory perception have a long history and have primarily been used to understand how the acoustical input is processed into its "internal representation" in the brain (see, e.g., the models by Zwicker and Fastl, 1990; the Cambridge model (Patterson et al., 1995); the Boston model (Colburn, 1996); and the Oldenburg model of "effective" signal processing (Dau et al., 1997)). There is also a long history of using these models to predict human behaviour in more complex and applied situations, e.g., predicting the perceived quality of speech transmission (Beerends and Stemerdink, 1994; Hansen and Kollmeier, 2000), predicting audio quality for high-quality audio coding schemes (e.g., Beerends and Stemerdink, 1992; Huber and Kollmeier, 2004), and predicting certain aspects of perceived sounds such as roughness, sharpness and tonality (e.g., Zwicker and Fastl, 1990; Mummert, 1997). The idea behind the approach is that the perceptual distance between the original signal and the modified or distorted signal should be evaluated not on the acoustical audio signal itself, but on a perceptually more relevant transformation of the signal, i.e., the signal at the output of an appropriate auditory model. Even though some attempts have been made to include hearing impairment in the model (Derleth et al., 2001) and to predict the performance of hearing aids with appropriate models (Schmalfuß, 2004), the adaptation of existing models to sensorineural hearing impairment and their refinement for more sophisticated tasks with normal-hearing listeners (such as tonality perception) remain open issues.
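The principle of comparing signals at the output of a model rather than at its input can be illustrated with a minimal sketch. The "internal representation" below (rectification, envelope smoothing, logarithmic compression) is only a hypothetical stand-in and not any of the auditory models cited above; the function names, the smoothing constant and the test signals are likewise assumptions chosen for illustration.

```python
import math
import random

def internal_representation(signal, smooth=0.9):
    """Toy stand-in for an auditory model: rectify, smooth, log-compress."""
    out, state = [], 0.0
    for x in signal:
        state = smooth * state + (1.0 - smooth) * abs(x)  # simple envelope follower
        out.append(math.log10(1.0 + 100.0 * state))       # compressive nonlinearity
    return out

def correlation(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def quality_index(reference, test):
    """Similarity of the two signals after the (toy) perceptual transformation."""
    return correlation(internal_representation(reference),
                       internal_representation(test))

# Hypothetical test signals: an amplitude-modulated tone and two noisy copies.
FS = 8000
reference = [math.sin(2 * math.pi * 440 * n / FS)
             * (0.5 + 0.5 * math.sin(2 * math.pi * 4 * n / FS))
             for n in range(4000)]
rng = random.Random(0)

def add_noise(signal, amplitude):
    return [x + amplitude * rng.uniform(-1.0, 1.0) for x in signal]

lightly_distorted = add_noise(reference, 0.05)
heavily_distorted = add_noise(reference, 0.5)

print("light distortion:", quality_index(reference, lightly_distorted))
print("heavy distortion:", quality_index(reference, heavily_distorted))
```

In a real quality predictor the toy transformation would be replaced by a validated auditory model, and the resulting index would be mapped to subjective ratings; neither step is attempted here.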

Neurosensory model-based signal processing

A further application of neurosensory models (in our case, mostly auditory models) is the transfer of the "effective" processing principles into appropriate technical signal processing techniques, on the assumption that this copy will preserve some of the remarkable properties of natural neurosensory systems (e.g., robustness against noise, robustness against changes of the environment, operability even in complex environments). One application of particular interest is automatic speech recognition, which still performs far worse than the human auditory system (Lippmann, 1997; Hermansky, 1998). The discrepancy between the performance of artificial systems and human performance is partly due to the use of insufficient, non-human-like features for signal processing, so that the development of appropriate features for automatic speech recognition and for automatic noise reduction has to be investigated further (Kollmeier, 2003). On the other hand, the exploitation of these features by appropriate recognition algorithms ("artificial intelligence") has been very limited (cf. the overviews by Waibel et al., 1995; Wahlster, 2000). One possibility to overcome some of these problems is to adopt principles from biological object binding and object characterization and to transfer them into signal processing schemes. First attempts at such an approach have been applied successfully to blind sound source separation (cf. Anemüller and Kollmeier, 2000, 2003; Mei et al., 2005). Similar principles also hold for the visual domain, where a better representation of the human visual system in image coding might yield enormous advantages for the (lossy) coding and transmission of images (cf. Pappas and Safranek, 2000; Danyali and Mertins, 2005).

Augmented reality

Interactive systems for enhanced reality (using visual, acoustical or haptic stimuli to transmit additional information to the user) can be used either as a tool for solving difficult tasks (e.g., telerobotic or microsurgical applications, see below) or as a research environment to study human perception in controllable, super-normal environmental conditions (e.g., localization and spatial perception in virtual room acoustics). In both cases, exact knowledge of the underlying neurosensory principles is indispensable. In contrast, the perception of psychophysical quantities in real and simulated complex environments is not yet well understood and often relies on empirical experience (cf. Blauert, 2005; Székely and Satava, 1999; Vallino and Brown, 1999; Azuma et al., 2001). Hence, research in this area is directed both towards fundamental perception issues and towards the applicability of augmented and simulated reality systems.

Within the InterGK, previous work on an acoustical enhanced reality setup has concentrated on the so-called "Flight Simulator", which will allow the combined effect of sound and vibration on humans to be tested in a real cockpit and cabin with simulated environmental conditions (including sound, vibration and motion). Both the perceived travel comfort and human performance are tested in a passenger aircraft environment by independently controlling the sound and the body-worn vibration input to the subject in combination with other physical environmental and intrinsic physiological and psychological parameters (Mellert et al., 2005). In addition, the "Kommunikations-Akustik-Simulator" was recently installed at the "House of Hearing" in Oldenburg (Beerends, 2005; Kollmeier, 2005). This setup allows the room acoustics of a seminar room to be changed parametrically (i.e., the apparent room size and the reverberation time, the latter between 0.4 s and 4 s). Hence, the user is acoustically embedded in an environment that can be switched from nearly acoustically dry to the ambience of a large cathedral. This is achieved by a regenerative system (VRAS), which picks up the sound with a multi-microphone system. The delayed, filtered and mixed microphone inputs are amplified and fed to a large number of loudspeakers in the surrounding walls of the seminar room.
However, it is unclear whether this enhanced acoustical reality can produce psychophysical results (e.g., localization, speech reception, perception of echoes) similar to those obtained in a real room. Hence, a number of spatial psychophysical measures have to be obtained, similar to those obtained in other projects on binaural and spatial hearing from own previous work (e.g., dissertations Otten, 2001; Damaschke, 2004).
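To give a feeling for what a reverberation time range of 0.4 s to 4 s means for a regenerative system, the following sketch treats the system as a single recirculating delay loop. This is a strong simplification and not a description of the VRAS algorithm: the single loop, the 50 ms loop delay and the function names are assumptions made purely for illustration. The only relation used is that the level must fall by 60 dB within the reverberation time.

```python
import math

def loop_gain_for_rt60(rt60_s, loop_delay_s):
    """Gain per pass through the loop so that the level drops 60 dB in rt60_s."""
    # After rt60_s the sound has passed the loop rt60_s / loop_delay_s times.
    # 20 * log10(gain ** passes) = -60 dB  =>  gain = 10 ** (-3 * delay / rt60).
    return 10.0 ** (-3.0 * loop_delay_s / rt60_s)

def rt60_from_gain(gain, loop_delay_s):
    """Inverse relation: reverberation time produced by a given loop gain."""
    return -3.0 * loop_delay_s / math.log10(gain)

LOOP_DELAY_S = 0.05  # hypothetical loop delay of 50 ms

for rt60 in (0.4, 4.0):
    gain = loop_gain_for_rt60(rt60, LOOP_DELAY_S)
    print(f"RT60 = {rt60} s needs a loop gain of about {gain:.3f}")
```

Under these assumptions the long reverberation time requires a loop gain much closer to 1 than the short one, which illustrates why regenerative systems have to be controlled carefully to remain stable.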

With respect to the application to medical robotics and surgical navigation systems, the visualization of targeting aids via displays is the standard information channel for feeding back the deviation between the desired (preplanned) position of a surgical instrument and its actual position (Stealth Station by Medtronic, Vector Vision by Brainlab). The optical representation forces the surgeon to continuously refocus his or her eyes between the operation site and a display located elsewhere in the operating theatre. Especially during highly dynamic tasks such as milling or cutting, the surgeon does not have enough time to use the navigation information. To improve the quality of dynamic tasks, robotic systems have been introduced (RoboDoc by ISS, SurgiScope by Elekta, etc.). Telemanipulated systems (daVinci by Intuitive Surgical) or interactive systems (Hein and Lüth, 2001) enhance the feedback from the dynamic tasks by haptic feedback. The counterforce of the bone during milling, or the reaching of the boundary of the preplanned working area, can be displayed as a force applied to the user.

Vibrotactile feedback via tactors (actuators that transmit vibrations to the skin of an operator) is used as an additional information channel between a technical system and a user. Up to now, vibrotactile feedback systems have been developed for applications in biofeedback (Weissgerber et al., 2004), communication with the deaf and blind (Phillips et al., 1999), and balance enhancement for pilots and astronauts (Rupert, 2000). Van Erp (2005) has presented an approach for the communication of spatial information via tactors fixed to the torso of test persons. It has been shown that test persons are able to interpret a stimulus at the torso as a direction for steering a cursor. However, no systematic, broad evaluation has yet been made of the possibilities and the underlying psychophysics of applying vibration and tactor feedback in the telehandling of surgical instruments.

Functional MR Imaging

The area of functional brain imaging has experienced exponential growth over the past decade. This development is mainly related to technical advances and the rapid acquisition of MR images as well as to advances in post-processing software and in-scanner behavioural monitoring. Functional brain imaging has been conducted in Groningen using both fMRI and PET methodologies. The research in Groningen has concentrated on cortical correlates of auditory (including speech) processing and on methodological issues, with special attention given to the so far unsolved problems of machine noise and motion artefacts in fMRI.

Previous techniques in MR imaging, especially in auditory research, have been limited by the massive amount of noise present during fMRI recording sessions. This noise results from the interaction between the static magnetic field of the scanner and changing electrical currents in its gradient coils. Sound pressure levels of 110 dB are common, and levels over 130 dB have been measured in high-field scanners (McJury and Shellock, 2000; Price et al., 2001). The limited attenuation provided by earplugs or earphones, together with the fact that MR scanners emit noise mainly at frequencies below 3 kHz, limits the clinical usability of this passive approach (Casali and Berger, 1996). In addition, the exact reduction of noise is often not determined at the eardrum (Berger et al., 1998), and the effect of bone conduction of low-frequency noise is undetermined (Berger and Kerivan, 1983).

Other approaches for minimizing the noise intensity within the MR scanner are compensation mechanisms directly at the noise source and active noise cancellation. The emission of noise can be reduced by the minimization of the Lorentz force (Mansfield et al., 1995), active structural measures for the damping of the bone conduction (Kessels, 2001; Moelker et al., 2003), balancing the forces at the inner and outer gradient coil (Petropolous and Morich, 1995) and the avoidance of resonance frequencies based on the analysis of the transfer function describing the transformation of the gradient signal to vibrations of the gradient structure (Hennel et al., 1999; Hoiting, 2005). The principle of active noise cancellation comprises the detection of the noise by a microphone and the emission of a phase-inverted (anti-phase) signal by a headphone. Low frequencies in particular can be reduced by this approach (McJury et al., 1997). A combination of passive noise protection and active cancellation can reduce the noise level by up to 20 dB at 125 Hz (Casali and Berger, 1996). Prototypical systems are currently under development at the MRC Institute of Hearing Research in Nottingham (Foster et al., 2000) and by MR confon GmbH, Magdeburg (Baumgart and Kaulisch, 2004). Methods for active noise cancellation are under development (dissertation Hoiting, 2005, in Groningen; post-doc project Hoiting in Oldenburg; installed headphone system by MR confon in Oldenburg) and will be evaluated.
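Why active cancellation works best at low frequencies can be seen from a small calculation for a pure tone. The sketch below assumes an idealized situation in which an inverted copy of the tone is added with a small gain error and a small timing error; the function name, its parameters and the error values are hypothetical and do not describe any of the systems mentioned above.

```python
import math

def residual_attenuation_db(freq_hz, gain_error=0.0, delay_error_s=0.0):
    """Attenuation (dB) of a pure tone after adding an imperfect inverted copy.

    Noise:      sin(w t)
    Anti-noise: -(1 + gain_error) * sin(w (t - delay_error_s))
    The residual amplitude is |1 - a * exp(-j * phi)| with a = 1 + gain_error
    and phi = w * delay_error_s.
    """
    phi = 2.0 * math.pi * freq_hz * delay_error_s
    a = 1.0 + gain_error
    residual = math.hypot(1.0 - a * math.cos(phi), a * math.sin(phi))
    if residual == 0.0:
        return math.inf  # perfect cancellation
    return -20.0 * math.log10(residual)

# A 10 % gain error alone limits the attenuation to about 20 dB.
print(residual_attenuation_db(125.0, gain_error=0.1))

# The same timing error (assumed 0.1 ms) matters little at 125 Hz
# but prevents any cancellation at 2 kHz.
for freq in (125.0, 2000.0):
    print(freq, residual_attenuation_db(freq, delay_error_s=1e-4))
```

A fixed timing error corresponds to a phase error that grows with frequency, so the achievable attenuation falls as the frequency rises; negative values indicate that the "cancelling" signal actually increases the level.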

Image acquisition and data processing in fMRI are limited by geometric image distortions and a poor signal-to-noise ratio. As a result, the detection of brain activity and the reconstruction of the brain geometry are error-prone, so that statistical methods and calibration algorithms are required. The research group Scientific Visualization and Computer Graphics of the University of Groningen concentrates on data pre- and postprocessing and visualization aspects of functional neuroimaging. Wavelet-based methods for the analysis of fMRI time series are investigated for improved denoising. In addition, the actual distribution of the noise in fMRI data is studied, and new protocols are proposed for fMRI studies to guarantee that the residual noise has a normal distribution. Another area of interest is HRF extraction by Fourier-wavelet regularised deconvolution (ForWaRD), resulting in improved statistical significance in the analysis of fMRI time series.
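The general idea of wavelet-based denoising of a time series can be sketched as follows. The example uses a plain Haar transform with soft thresholding and a universal threshold; it is a generic textbook scheme, not the method of the Groningen group and not ForWaRD, and the synthetic block-shaped "activation" signal, the noise level and all names are assumptions for illustration only.

```python
import math
import random

def haar_forward(x):
    """One level of the orthonormal Haar transform; len(x) must be even."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend(((a + d) / s, (a - d) / s))
    return x

def soft_threshold(value, threshold):
    """Shrink a coefficient towards zero by the threshold."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)

def denoise(x, threshold, levels=3):
    """Multi-level Haar denoising; len(x) must be divisible by 2 ** levels."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append([soft_threshold(d, threshold) for d in detail])
    for detail in reversed(details):
        approx = haar_inverse(approx, detail)
    return approx

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Synthetic block design (8 samples off, 8 samples on) plus Gaussian noise.
N, SIGMA = 64, 0.2
clean = [1.0 if (n // 8) % 2 else 0.0 for n in range(N)]
rng = random.Random(1)
noisy = [c + rng.gauss(0.0, SIGMA) for c in clean]

threshold = SIGMA * math.sqrt(2.0 * math.log(N))  # universal threshold
denoised = denoise(noisy, threshold)

print("MSE noisy:   ", mse(noisy, clean))
print("MSE denoised:", mse(denoised, clean))
```

The block design was chosen so that the clean signal is represented entirely by the coarse Haar coefficients, which makes the effect of thresholding the detail coefficients easy to see; real fMRI time series are far less favourable.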

In a recently started project, the goal is to obtain a better understanding of a specific type of cognitive task, i.e., number processing, by simultaneous 32-channel EEG and fMRI registration and the Additive Factor Method (AFM). The unique combination of this method with different measurement techniques in an event-related design will increase our knowledge about methodological issues concerning experimental design and about the relation between results observed with different measurement techniques.

In addition, the data processing will be improved by applying methods of independent component analysis (ICA) and blind sound source separation techniques as well as wavelet techniques and interactive visualization techniques. Functional MRI will be used to investigate the effect of hearing loss on central auditory processing, and to test the possible involvement of plastic changes in tinnitus.
