Dissertation Projects
Topics of dissertation projects
Sound & speech quality
Based on psychoacoustical methods and processing models, a quantitative analysis of quality aspects in sound and speech perception for normal-hearing and hearing-impaired listeners is planned. Such an analysis is necessary to model and predict the subjective assessment of technical sounds (e.g., the tonality of tyre-road noise or the sound of kitchen appliances) and the transmission quality of hearing aids and other speech-processing systems. The models are based on features of the hypothetical "internal representation" of the acoustical signals and on more cognitive stages, such as contour processing.
Modelling sound quality assessment for normal and hearing-impaired listeners
Objective speech and sound quality assessment methods have until now been used primarily for telecommunications and audio coding purposes. Adapting these methods to the objective assessment of the overall quality of the hearing aid output and the perceived quality of, e.g., noise reduction algorithms has only recently begun, with limited success so far (Marzinzik & Kollmeier, 2003). The aim of the proposed dissertation project is to develop better objective quality measures and to test them against existing methods.
The objective quality measure for hearing aid algorithms can be subdivided into two domains (that might or might not eventually be covered by two different quality measures):
- Prediction of algorithmic degradation or improvement relative to a predefined reference signal ("small effect prediction"), e.g. for estimating the effect of noise reduction schemes or the influence of audible (linear and nonlinear) distortions at a given presentation level, frequency response, and stimulus presentation mode.
- Prediction of absolute audio quality perceived by hearing-impaired listeners with or without hearing aids (in relation to the audio quality perceived by normal listeners, "big effect prediction").
The work will first compile a database both from available data and from a series of perceived-quality measurements with a significant number of normal-hearing and hearing-impaired listeners. Subsequently, the available models will be tested as a reference. In addition, several new measures will be developed and tested that build on the work in sections a) and b) of the InterGK, from which a more refined model of sensorineural hearing loss is expected to emerge. The options for composite models that combine different model approaches will also be explored. The outcome of the project will be a validated objective quality measure for each of the two domains outlined above, which can serve as an input to further standardization procedures.
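As an illustration of the "small effect" domain, the following minimal sketch compares a reference and a processed signal via a strongly simplified "internal representation" (spectrogram, compressive nonlinearity, modulation low-pass) in the spirit of PEMO-based measures. The front-end, function names, and all parameters are illustrative assumptions, not the measure to be developed:

```python
# Hedged sketch of a "small effect" quality predictor: both signals are
# mapped to a crude stand-in for an auditory "internal representation"
# and compared by linear correlation. Signals are assumed time-aligned
# and of equal length; everything here is illustrative.
import numpy as np
from scipy.signal import stft, butter, lfilter

def internal_representation(x, fs, n_bands=30):
    """Spectrogram -> compressive nonlinearity -> modulation low-pass."""
    f, t, X = stft(x, fs=fs, nperseg=512)
    spec = np.abs(X[:n_bands, :])
    compressed = spec ** 0.4              # rough loudness compression
    b, a = butter(1, 0.3)                 # crude modulation low-pass
    return lfilter(b, a, compressed, axis=1)

def quality_index(reference, test, fs):
    """Correlation of internal representations; 1.0 = identical,
    lower values indicate audible degradation."""
    R = internal_representation(reference, fs).ravel()
    T = internal_representation(test, fs).ravel()
    return np.corrcoef(R, T)[0, 1]
```

A validated measure would replace this front-end with the refined model of sensorineural hearing loss expected from sections a) and b).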
Modelling "Tonhaltigkeit" (tonality) of artificial and technical sounds
Environmental and technical noise often has a distinct tonality ("Tonhaltigkeit"), which considerably decreases the acceptance of such sounds. Over the last years, many attempts have been made to introduce a standard for the calculation of tonality. However, current calculation methods are still unsatisfactory, since they only cover a subset of sounds, usually those with stationary tonal components. The aim of the present study is the development of a model of tonality on the basis of a spectro-temporal representation of the sound after a realistic auditory preprocessing stage. The model will build on pitch models, which may be either effective (e.g. Mummert, 1997) or based on neural data (e.g. Wiegrebe and Meddis, 2004). Psychoacoustical experiments will be performed with artificial sounds that have tonal components similar to those of technical sounds. Model predictions will be compared to the tonality ratings of these artificial and technical sounds.
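For orientation, the sketch below shows a hedged baseline of what a tonality indicator computes: the level of the strongest spectral component relative to the local noise floor, per short-time frame. The proposed model would replace the plain FFT with an auditory spectro-temporal representation and a pitch model; the window and bandwidth choices here are illustrative only:

```python
# Illustrative per-frame tonality indicator: peak-to-noise-floor ratio
# in dB. Not the model to be developed, only a simple reference point.
import numpy as np

def frame_tonality(frame, fs):
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    k = np.argmax(spec[1:]) + 1                    # strongest component
    lo, hi = max(1, k - 20), min(len(spec), k + 20)
    noise = np.median(spec[lo:hi])                 # robust local noise floor
    return 10 * np.log10(spec[k] / (noise + 1e-12))
```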
Neurosensory-model-based signal processing
The next application of neurosensory system knowledge is signal processing for both audio and video signals, such as perception-model-based audio and video coding, noise suppression for hearing aids, and coding strategies for cochlear implants or retina implants. Computational auditory scene analysis employing object-binding principles from area b) will be further developed to segregate acoustical streams and to suppress "undesired" noise objects. Finally, automatic speech recognition will be made more robust against noise and against speaker variations by employing appropriate neurosensory preprocessing methods and by normalizing the (effective) length of the speaker's vocal tract.
Speaker independent features for automatic speech recognition
Humans recognize speech with very high accuracy and, within certain limits, their performance is largely independent of speaker variations such as age, gender, size, or pitch, and of other intrinsic variabilities such as speaking rate and pronunciation variants. Automatic speech recognizers, on the other hand, not only perform significantly worse than humans; their performance also depends strongly on the match between training and test conditions. While current automatic speech recognizers tackle speaker variations in the back-end (typically hidden Markov models), we plan to investigate how such variations can be dealt with already at the feature level. The goal of this project is to combine the Oldenburg perception model (PEMO) according to Kollmeier (2003) with feature extraction techniques such as those proposed by Mertins and Rademacher (2005), which aim at minimizing the effect of vocal tract length variations, in order to obtain features that are highly independent of the actual speaker and give highly discriminative information on the uttered phonetic sequences.
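One established way to reduce speaker dependence at the feature level is vocal tract length normalization (VTLN) by frequency warping. The sketch below shows a common piecewise-linear warp of the frequency axis as a hedged illustration; the knee position and the warping factor alpha are illustrative, and the project targets warping-invariant features rather than this explicit warp:

```python
# Hedged sketch of piecewise-linear VTLN frequency warping: frequencies
# below a knee are scaled by alpha, then connected linearly so that the
# Nyquist frequency maps onto itself. Parameters are illustrative.
import numpy as np

def warp_frequency(f, alpha, f_nyq):
    knee = 0.8 * f_nyq                       # illustrative knee position
    f = np.asarray(f, dtype=float)
    return np.where(f <= knee,
                    alpha * f,
                    alpha * knee + (f_nyq - alpha * knee)
                                 * (f - knee) / (f_nyq - knee))
```

Applied to the centre frequencies of a filterbank, such a warp (with alpha typically close to 1) compensates for a shorter or longer vocal tract before feature extraction.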
Acoustical object separation for hearing aids and speech recognizers based on auditory features
Building on models of feature extraction and acoustical scene analysis in the auditory system, several prototypes of noise suppression algorithms and/or speaker separation methods shall be developed and demonstrated in prototypical applications. The basic idea is to use an auditory model as a front-end that transforms the acoustical input signal into an "internal representation", which includes a binaural display or a localization algorithm operating on interaural transfer functions as a function of frequency. To classify an acoustical object at a certain direction of incidence, a trained Bayes classifier will be used that integrates the internal representation across centre frequency bands and across certain ranges of amplitude modulation frequencies. Alternatively, a particle filter approach can be used (Meyer et al., 2005; Nix, 2005) to improve the performance of a statistical localization classifier. Even though this processing does not preserve the original speech or audio signal, the extracted information (i.e., the localization of "desired" versus "undesired" objects) is worth the extra effort.
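The following minimal sketch illustrates the intended chain under strong simplifications: time-averaged interaural level differences per frequency band serve as features for a Gaussian (naive) Bayes classifier over candidate directions. The STFT front-end, the Gaussian assumption, and all parameters are illustrative stand-ins for the auditory model and trained classifier described above:

```python
# Hedged sketch: per-band interaural level differences (ILDs) feed a
# naive Bayes classifier over directions. Class-conditional Gaussians
# (means, variances per direction and band) would be trained on
# labelled data; everything here is illustrative.
import numpy as np
from scipy.signal import stft

def ild_features(left, right, fs):
    """Time-averaged interaural level difference per STFT band (dB)."""
    _, _, L = stft(left, fs=fs, nperseg=512)
    _, _, R = stft(right, fs=fs, nperseg=512)
    ild = 20 * np.log10((np.abs(L) + 1e-12) / (np.abs(R) + 1e-12))
    return ild.mean(axis=1)

def classify_direction(features, means, variances, priors):
    """means, variances: (n_directions, n_bands); priors: (n_directions,)."""
    log_post = np.log(priors) - 0.5 * np.sum(
        np.log(2 * np.pi * variances)
        + (features - means) ** 2 / variances, axis=1)
    return np.argmax(log_post)               # most probable direction
```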
Modelling and recognition of complex auditory environments for hearing aid algorithms
In order to switch to the most appropriate hearing aid algorithm in a way that is sensible for the user, the hearing aid has to probe which acoustical environment it is currently located in. Since only the most relevant features of the "internal representation" of the external sound shall be used, the most appropriate features can be selected using the feature-finding neural net (FFNN; Gramß and Strube, 1990; Kleinschmidt et al., 2001). Hence, the dissertation project should first develop and probe appropriate auditory feature sets that are based on the project areas a) and b), and should also consider comparatively complex combined features (such as second-order features that occupy a certain combination of locations in the spectro-temporal domain, the modulation-frequency vs. time domain, the modulation spectrogram, or the interaural displacement vs. frequency domain). Using the FFNN, the feature sets will be selected that discriminate best across a wide selection of environmental sounds and acoustical environments. In the second part, the application of this optimized feature set will be tested both for predicting the subjective impression of psychoacoustic quantities (such as sharpness or roughness) and for classifying the respective acoustical situation.
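The FFNN substitution rule can be sketched as follows, as a simplification of Gramß and Strube (1990): a working feature set is iteratively improved by replacing the feature whose omission costs the least accuracy with a randomly drawn candidate. The scoring function and candidate pool below are placeholders for the auditory feature sets to be developed:

```python
# Hedged sketch of FFNN-style feature selection by substitution.
# score(X_subset, y) is any fast classifier evaluation, e.g.
# cross-validated accuracy; it is a placeholder here.
import numpy as np

def substitution_selection(X, y, candidates, score, n_keep=10,
                           n_iter=100, rng=np.random.default_rng(0)):
    """X: (n_samples, n_features); candidates: usable feature indices."""
    current = list(rng.choice(candidates, n_keep, replace=False))
    for _ in range(n_iter):
        base = score(X[:, current], y)
        # relevance of a feature = accuracy drop when it is left out
        drops = [base - score(X[:, [f for f in current if f != c]], y)
                 for c in current]
        weakest = int(np.argmin(drops))       # least relevant feature
        trial = current.copy()
        trial[weakest] = int(rng.choice(candidates))
        if score(X[:, trial], y) >= base:     # keep only improvements
            current = trial
    return current
```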
Use of object-binding principles for auditory stream segregation and blind source separation
The goal of blind source separation is the de-mixing of simultaneously measured sensor signals that contain contributions from different independent sources. Technical solutions to the separation problem for acoustic environments are usually based on the mathematical framework of independent component analysis (ICA), carried out in the frequency domain separately for each frequency. This automatically leads to a local permutation problem, where the frequency components of the signals are de-mixed in a random (different) order at different frequencies. As a remedy, Anemüller and Kollmeier (2000) have incorporated the principle of amplitude modulation decorrelation into the objective function for ICA. Other approaches use time-domain constraints (Parra and Spence, 2000; Mei et al., 2005) or dyadic sorting of frequency components (Rahbar and Reilly, 2003). The goal of this project is to investigate whether it is technically feasible to use known object-binding principles (co-modulation, common onsets, etc.) in a straightforward manner for segregating the various de-mixed frequency components into streams that belong to the different sources. Of particular interest is the underdetermined case with more sources than sensors, which can only be tackled with nonlinear methods (Bofill, 2003; Rickard and Yilmaz, 2002). The underdetermined case is the usual mode for human listeners, who can separate more than two sources despite having only two sensors.
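As a hedged illustration of the object-binding idea, the sketch below aligns the permutations of per-bin ICA outputs by maximizing the correlation of their amplitude envelopes with an already-aligned reference, exploiting co-modulation across frequency. The running-reference update and the exhaustive permutation search are illustrative simplifications, not the method of Anemüller and Kollmeier (2000):

```python
# Hedged sketch of permutation alignment across frequency bins using
# envelope co-modulation. Feasible for small source counts only, since
# all permutations are tried per bin.
import numpy as np
from itertools import permutations

def align_permutations(S):
    """S: (n_freqs, n_sources, n_frames) complex ICA outputs per bin."""
    env = np.abs(S)                          # amplitude envelopes
    ref = env[0].astype(float)               # first bin fixes the order
    aligned = S.copy()
    for k in range(1, S.shape[0]):
        best_perm, best_corr = None, -np.inf
        for perm in permutations(range(S.shape[1])):
            corr = sum(np.corrcoef(ref[i], env[k, p])[0, 1]
                       for i, p in enumerate(perm))
            if corr > best_corr:
                best_perm, best_corr = perm, corr
        aligned[k] = S[k, list(best_perm)]
        ref = 0.9 * ref + 0.1 * np.abs(aligned[k])   # running reference
    return aligned
```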
Perceptual image coding
Current image coding techniques try to minimize the distortion (measured as the mean square error (MSE) between the original and the encoded image) under a constraint on the maximum bitrate. The use of perceptual measures and properties of the human visual system is still in its infancy, but it offers the chance to achieve much better perceived image quality than the currently used MSE-based techniques at the same bitrate or, equivalently, much lower bitrates at the same quality (Pappas and Safranek, 2000; Miyahara et al., 1998). In the proposed project we plan to use visual masking models to prune the image data in such a way that irrelevant information is removed and only the perceptually significant parts are transmitted. The anticipated encoding strategy is based on those proposed by Raad, Mertins and Burnett (2003) for audio and by Danyali and Mertins (2005) for image coding. An image will then be encoded in a hierarchical manner, where the perceptually most important information is encoded first.
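A minimal sketch of the pruning step is given below, assuming PyWavelets for the decomposition: wavelet coefficients below a per-subband visibility threshold are zeroed. The constant thresholds are dummy placeholders; in the project they would come from a visual masking model, and the surviving coefficients would feed a hierarchical encoder as in Danyali and Mertins (2005):

```python
# Hedged sketch of perceptual pruning of a wavelet decomposition.
# Thresholds are placeholders for masking-model-derived values.
import numpy as np
import pywt

def perceptually_prune(image, wavelet="db4", level=3, thresholds=None):
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    if thresholds is None:           # dummy: coarser subbands pruned harder
        thresholds = [2.0 * (level - i + 1) for i in range(level + 1)]
    pruned = [coeffs[0]]             # keep the approximation untouched
    for i, detail in enumerate(coeffs[1:], start=1):
        t = thresholds[i]
        pruned.append(tuple(np.where(np.abs(d) >= t, d, 0.0)
                            for d in detail))
    return pywt.waverec2(pruned, wavelet)
```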
Augmented reality
Basic neurosensory processing principles will also be used to overlay the actual input to the respective sensory system with an artificial stimulus that conveys additional information to the user in a specially treated environment. The newly established "Flight Simulator" will allow testing of the combined effect of sound and vibration, together with other important environmental parameters, on the perceived travel comfort of passengers and the performance of crew in a passenger aircraft environment, by independently controlling the sound and vibration input to the subject in combination with, e.g., motion, lighting, seating, air quality, and workload. Similarly, the communication acoustics simulator (established in 2004) allows the reverberation and the acoustically perceived room size to be varied systematically by electronic means, so that the influence of the (augmented) sound field on sound quality perception and on speech intelligibility can be studied. Finally, augmented reality will be used to study and improve haptic man-machine interaction in the operating room using surgical robots that can either guide the surgeon's movement or signal unforeseen changes (e.g., a sudden decrease in resistance against the drill).
Localization, speech perception and reverberation assessment in simulated room acoustics using the KAS
The aim of this thesis is to study the limits and minimum requirements of room acoustical enhancement systems for generating an interactive, simulated acoustical environment, e.g. for hearing-aid fitting or for the evaluation of communication systems in rooms. To this end, several critical psychophysical measures of room acoustics will be obtained both in real rooms and in the room with simulated acoustics (KAS), using the same normal-hearing and hearing-impaired subjects. Specifically, the localization JND (just noticeable difference) for the frontal direction, the occurrence of front/back confusions, and speech intelligibility for one target sound source in combination with one to three interfering noise sources will be obtained as a function of the room acoustics and of the accuracy in realizing the "augmented reality". As an additional parameter, the subjective assessment of the room width/reflection impression will be considered. The results will indicate how well the KAS can simulate a different room acoustical environment and which further simplifications of the augmented acoustical reality can be used in certain situations without losing the desired effect. This will define the minimum requirements for the simulation of different acoustical environments as used, for example, in hearing aid fitting.
Combined impact of sound and vibration in passenger airplanes in a human response model utilizing a flight simulator
Previous investigations of environmental impact on passenger comfort and on performance of flight and cabin crew revealed significant effects of sound and vibration in combination with time of exposure and other non-acoustic parameters. The findings are based on measurements in real long-haul flights and selected measurements in flight-simulator facilities (performed in the European HEACE project [www.heace.org, 2001-2005, co-ordinator Oldenburg University]).
Experiments in simulators are unavoidable because the controlled variation of environmental conditions in real flights is obviously limited. In order to apply simulator results to the real world, a quantitative validation of simulator data against selected real-flight data is necessary. Simulators used to date provide only a limited virtual reality and a limited variation of the virtual environment. The new Oldenburg Flight Simulator (Mellert et al., 2005) is capable of creating the virtual reality for both the cabin and the cockpit environment and can establish the environmental conditions for the investigation of long-haul flights (up to 12 hours). It is possible to change the environmental conditions in the Flight Simulator (including motion) within a realistic range of parameters, with the exception of reduced pressure. It has not yet been investigated to what extent results from such an advanced simulator facility are transferable to real-flight situations. Based on a full factorial design with three levels of each factor (vibroacoustics, temperature, humidity; sketched below), experiments have been carried out in the HEACE project in simulators that provide only a preliminary virtual reality. These results provide evidence of a significant mutual interaction of the three parameters with respect to relevant indices of performance and comfort. The impact of sound and vibration, and in particular its dependence on the time of exposure, will be investigated in this thesis in relation to the synergetic impact of other environmental parameters in the aircraft environment. The portability of results gained in simulator experiments to real flights will be investigated on the basis of real-flight data measured in other projects. The objective is an improvement of the simulation environment and a contribution to a human-response model based on the first framework derived in HEACE (Nokas et al., 2005).
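The factorial design itself is straightforward to enumerate; the sketch below lists all 27 cells of a three-factor, three-level design. The level values are illustrative placeholders, not the HEACE settings:

```python
# Hedged sketch of a 3x3x3 full factorial design: every combination of
# three levels of each factor, 3**3 = 27 experimental conditions.
from itertools import product

levels = {"vibroacoustics": ["low", "medium", "high"],
          "temperature":    [20, 23, 26],    # degrees Celsius, illustrative
          "humidity":       [10, 25, 40]}    # percent r.h., illustrative

design = list(product(*levels.values()))     # 27 (vib, temp, hum) cells
```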
Spatial navigation assistance using vibrotactile display
In this thesis, vibrotactile feedback will be evaluated for its application in surgical navigation. The aim is to supply the user with a direct haptic input to the hand that provides directional information (e.g., the direction in which to steer an operation tool) and process information (e.g., the distinct vibrations that are generated during mechanical contact between mill head and bone). However, a number of fundamental questions (e.g., the directivity of perceived hand/arm vibrations, the spatial coordinate system of the hand in relation to the body and the environment) have to be answered before this goal can be reached.
In a first realization, suitable positions for tactors on one hand will be selected, and the hand will be equipped with a localizer that enables the determination of the spatial position and orientation of the tactor arrangement and thus of the hand. In addition, the deviation vector between the actual and the desired position of a surgical instrument will be determined, and the tactors on the hand will display the correction direction (see the sketch following the list). Based on this initial experiment, the following questions will be addressed:
- Threshold and usable dynamic range of vibrations applied to the hand and the arm as a function of stimulation site and vibration frequency
- Usable stimuli for conveying directional information to the user
- Accuracy and timing of the positioning of instruments in comparison to optical feedback systems
- Learning curve
- Optimal tactor distribution on the hand
- Deviations between the internal reference coordinate system of the hand and the external coordinate system
- Spectral composition and dynamic range of the recorded vibrations in "critical situations" (obtained with phantom tissue)
- Usability of parts of this signal for haptic guidance
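A minimal sketch of the directional display referred to above is given below: the deviation vector is projected onto the (assumed known) direction vectors of the tactors, and the tactors aligned with the required correction vibrate proportionally. The tactor geometry and the intensity mapping are illustrative assumptions:

```python
# Hedged sketch of mapping a correction vector to tactor intensities:
# cosine similarity between the normalized deviation and each tactor's
# placement direction, clipped so that only aligned tactors vibrate.
import numpy as np

def tactor_intensities(deviation, tactor_dirs, max_amp=1.0):
    """deviation: 3-D correction vector in the hand's coordinate frame;
    tactor_dirs: (n_tactors, 3) unit vectors of tactor placement."""
    direction = deviation / (np.linalg.norm(deviation) + 1e-9)
    proj = tactor_dirs @ direction           # cosine similarity per tactor
    return max_amp * np.clip(proj, 0.0, 1.0)
```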
Based on these findings, detailed recommendations and first implementations for the extension of an existing surgical robot system (Hein, 2004; Lenze and Hein, 2005) by haptic feedback will be developed.
Functional MR Imaging
A number of primarily technical research efforts will be bundled to improve the recording and the evaluation of functional imaging, primarily using MR scanners. This will go hand in hand with an improved use of the scanner within areas a) and b). In order to reduce the scanner noise and thus facilitate auditory fMRI, the noise field prevalent in the scanner will be assessed with different spatially resolved methods (e.g., an optical microphone, a microflown probe, and a laser vibrometer tracked by the positioning measurement system already installed in the MR scanner in Oldenburg). Methods for active noise cancellation will be developed and tested. In addition, the data processing will be improved by applying methods of independent component analysis (ICA) and blind source separation as well as wavelet and interactive visualization techniques. Functional MRI will be used to investigate the effect of hearing loss on central auditory processing and to test the possible involvement of plastic changes in tinnitus.
Independent component analysis of fMRI data
The analysis of fMRI data is usually based on techniques that require a priori knowledge of the time structure of all the processes that contribute to the measured data. In recent years, the method of independent component analysis (ICA) has been proposed as an alternative, since it does not require such a priori assumptions. When applied to fMRI data, it is capable of recovering the independent processes that contribute to the measurements; in the case of fMRI with auditory stimuli, the time course of exactly one of the extracted independent components should correlate highly with the presented stimuli (McKeown et al., 1998). In this project, we plan to make use of the ICA technique for analyzing fMRI data. This includes both the development of ICA for fMRI as a tool and the use of this technique for the interpretation of our measured data. Spatio-temporal dynamics will be taken into account as anticipated by Anemüller et al. (2005).
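A hedged sketch of this analysis with scikit-learn's FastICA is given below: voxel time series are decomposed into spatially independent maps, and the component whose time course correlates best with the stimulus protocol is selected, in the spirit of McKeown et al. (1998). Data shapes, the component count, and the stimulus regressor are illustrative:

```python
# Hedged sketch of spatial ICA on fMRI data. data: (n_timepoints,
# n_voxels); stimulus: (n_timepoints,) binary or convolved regressor.
import numpy as np
from sklearn.decomposition import FastICA

def stimulus_related_component(data, stimulus, n_components=20):
    ica = FastICA(n_components=n_components, random_state=0)
    maps = ica.fit_transform(data.T)      # (n_voxels, n_components) maps
    courses = ica.mixing_                 # (n_timepoints, n_components)
    corr = [abs(np.corrcoef(stimulus, courses[:, i])[0, 1])
            for i in range(courses.shape[1])]
    best = int(np.argmax(corr))           # most stimulus-related component
    return maps[:, best], courses[:, best]
```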
Active noise cancellation
The aim of this thesis is the minimization of the noise emission within an MR scanner during image acquisition. Based on the noise-field properties identified with the spatially resolved measurement methods described above, two approaches to active cancellation will be combined: the attenuation of the noise near the identified sources by phase-delayed sound sources fixed to the scanner housing, and a headphone system with an integrated microphone for the optimal reduction of the noise at the eardrum. In contrast to existing solutions, the anti-noise sequence will not be chosen statically; instead, adaptive filters will be implemented. Varying parameters of the adaptive filter will be the position of the eardrums and the actual sound intensity. Another main focus of the thesis will be the evaluation of the system with respect to the distortions of the MR images caused by the components of the noise cancellation system, the resulting noise reduction (measured and subjective), and the practicability for auditory fMRI.
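The adaptive core can be sketched as a normalized LMS (NLMS) filter, shown below under strong simplifications: a reference signal correlated with the scanner noise drives an FIR filter whose coefficients are adapted to minimize the residual measured at the eardrum. A practical system would additionally identify the secondary path (filtered-x LMS); all parameters and the error callback are illustrative:

```python
# Hedged sketch of an NLMS adaptive filter for active noise control.
# error_signal_fn(y_hat, n) stands in for the physical loop: it returns
# the residual measured at the eardrum after emitting anti-noise y_hat.
import numpy as np

def nlms_anc(reference, error_signal_fn, n_taps=128, mu=0.1, eps=1e-6):
    w = np.zeros(n_taps)
    out = np.zeros(len(reference))
    for n in range(n_taps, len(reference)):
        x = reference[n - n_taps:n][::-1]     # most recent samples first
        y_hat = w @ x                         # anti-noise estimate
        e = error_signal_fn(y_hat, n)         # residual at the ear
        w += mu * e * x / (x @ x + eps)       # normalized LMS update
        out[n] = y_hat
    return out, w
```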
Application of imaging techniques to central auditory processing and to tinnitus
Tinnitus is an auditory phantom sensation. In most patients who express tinnitus complaints, it is associated with hearing loss. Although hearing loss may trigger the tinnitus percept, most current neurophysiological theories of tinnitus assume that central auditory mechanisms play an important role. Animal research shows that conditions that lead to tinnitus in humans (e.g. sound overexposure) result in a reorganization of cortical maps and in changes in spontaneous activity (Eggermont and Roberts, 2004).
Reorganization of activity in the human central auditory system may be studied with functional MRI. The responses of both cortical and subcortical auditory centres in the brain may be assessed with this technique (Langers et al., 2005). The proposed research aims to further assess the objective representation of tinnitus by functional MRI. Since fMRI requires a comparison between stimulus conditions, the research will focus on cases where the tinnitus can be modified by an external stimulus. This includes residual inhibition, gaze-evoked tinnitus, and tinnitus that is modulated by somatosensory input.