Automatic Speech Recognition
Contact
For specific questions regarding one of our research topics, please contact the respective people directly (see staff list).
Questions that we ask ourselves:
- How can the content of spoken language be deduced from the acoustic waveform of a speech signal?
- How can it be determined in which environment an acoustic signal was recorded (cathedral, café...) and which objects are represented in it (speech, traffic noises, animal sounds, etc.)?
- How is it possible to extract a single speech signal from a jumble of several voices and background noises if it is not known where the individual signal sources are located?
- Is it possible to find the basic parts ("atoms") that make up a speech signal?
These problems share a common underlying question, which can be formulated as:
How can information be extracted from measured (acoustic) signals if no fixed rules exist?
Instead of a predefined rule (if ..., then ...), this type of problem provides either example data with corresponding target values (supervised learning), or abstract goals that a model of the signals should fulfil as well as possible, e.g. "different basic signal components should occur independently of each other" (unsupervised learning). We develop solutions to these problems for acoustic signals using methods from physics, signal processing, machine learning and neurobiology.
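The unsupervised goal quoted above, statistical independence of the signal components, is the criterion behind independent component analysis (ICA), one family of methods used for blind source separation. The following is a minimal numpy sketch of FastICA-style separation of two synthetic sources under an assumed instantaneous (non-convolutive) mixture; the sources, the mixing matrix and all parameters are made up for illustration, and real acoustic scenes with room echoes require convolutive approaches instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# Two synthetic, statistically independent sources (hypothetical stand-ins for signals)
s1 = np.sign(np.sin(2 * np.pi * 3 * np.linspace(0, 1, n)))  # square wave
s2 = rng.uniform(-1, 1, n)                                  # uniform noise
S = np.vstack([s1, s2])

A = np.array([[0.7, 0.3], [0.4, 0.6]])  # unknown mixing matrix
X = A @ S                               # observed mixtures

# Whitening: decorrelate the mixtures and scale them to unit variance
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(Xc @ Xc.T / n)
Z = E @ np.diag(d ** -0.5) @ E.T @ Xc

def fastica(Z, n_comp=2, iters=200):
    """One-unit FastICA with deflation, tanh nonlinearity."""
    W = np.zeros((n_comp, Z.shape[0]))
    for i in range(n_comp):
        w = rng.normal(size=Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(iters):
            wx = w @ Z
            g, g_prime = np.tanh(wx), 1 - np.tanh(wx) ** 2
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
            # Deflation: remove projections onto already-found components
            w_new -= W[:i].T @ (W[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1) < 1e-8
            w = w_new
            if converged:
                break
        W[i] = w
    return W

S_est = fastica(Z) @ Z  # recovered sources, up to permutation and sign
```

Note that ICA recovers the sources only up to permutation and sign, so any evaluation has to match each estimate against the best-correlated true source.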
Ongoing work includes the following topics:
- Phoneme recognition with support vector machines
- Recognition of acoustic scenes and events
- Feature extraction for speech and audio signals
- Blind source separation
- Modelling of neural signals
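One of the feature-extraction ideas above, the amplitude modulation spectrogram (cf. the Schmidt and Anemüller reference below), can be sketched as two stacked spectral analyses: a short-time spectrogram yields per-band amplitude envelopes, and a second Fourier transform along the time axis reveals how fast each band's envelope fluctuates. The following numpy sketch is illustrative only; window sizes, hop lengths and the synthetic test signal are assumptions, not the published feature set.

```python
import numpy as np

def amplitude_modulation_spectrogram(x, fs, win=256, hop=128, mod_win=32):
    """Toy AMS: rows = modulation frequency, columns = acoustic frequency."""
    # Stage 1: short-time magnitude spectrogram (time x acoustic frequency)
    frames = [x[i:i + win] * np.hanning(win)
              for i in range(0, len(x) - win, hop)]
    spec = np.abs(np.fft.rfft(frames, axis=1))
    # Stage 2: FFT along time of each band's envelope (DC removed per band)
    env = spec - spec.mean(axis=0)
    n_seg = env.shape[0] // mod_win
    segs = env[:n_seg * mod_win].reshape(n_seg, mod_win, -1)
    ams = np.abs(np.fft.rfft(segs * np.hanning(mod_win)[None, :, None], axis=1))
    return ams.mean(axis=0)

fs = 8000
t = np.arange(fs) / fs
# 1 kHz carrier, amplitude-modulated at 4 Hz -- a crude stand-in for the
# syllable-rate envelope fluctuations characteristic of speech
x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
ams = amplitude_modulation_spectrogram(x, fs)
```

With these parameters the envelope is sampled at fs/hop = 62.5 Hz, so the 4 Hz modulation of the 1 kHz band should dominate the resulting map near modulation bin 2 and acoustic bin 32.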
Ongoing research project:
EU project DIRAC ("Detection and Identification of Rare Audio-Visual Events").
[link: www.diracproject.org]
Selected references:
J. Anemüller. Maximisation of Component Disjointness: a Criterion for Blind Source Separation. 7th International Conference on Independent Component Analysis and Signal Separation, London, UK, 9 - 12 September 2007. Accepted for publication.
D. Schmidt and J. Anemüller. Acoustic Feature Selection for Speech Detection Based on Amplitude Modulation Spectrograms. Fortschritte der Akustik: DAGA 2007.
J. Anemüller, J.-R. Duann, T. J. Sejnowski and S. Makeig. Spatio-temporal dynamics in fMRI recordings revealed with complex independent component analysis. Neurocomputing, 69:1502-1512, 2006.
T. Wesker, B. Meyer, K. Wagener, J. Anemüller, A. Mertins and B. Kollmeier. Oldenburg logatome speech corpus (OLLO) for speech recognition experiments with humans and machines. Proceedings Interspeech 2005, pages 1273-1276. Lisbon, Portugal, September 2005.
J. Anemüller, T. J. Sejnowski, and S. Makeig. Complex independent component analysis of frequency-domain electroencephalographic data. Neural Networks, 16:1311-1323, 2003.
J. Anemüller and B. Kollmeier. Adaptive separation of acoustic sources for anechoic conditions: A constrained frequency domain approach. Speech Communication, 39(1-2):79-95, Jan 2003.
J. Anemüller and B. Kollmeier. Amplitude modulation decorrelation for convolutive blind source separation. In Petteri Pajunen and Juha Karhunen, editors, Proceedings of the second international workshop on independent component analysis and blind signal separation, pages 215-220, Helsinki, Finland, 2000.