Project B2 - Computational Auditory Scene Analysis algorithms for improving speech communication in complex acoustic environments

Contact

Principal investigators

Prof. Dr. Volker Hohmann
(Uni Oldenburg)

Prof. Dr. ir. Steven van de Par
(Uni Oldenburg)

Prof. Dr. Jörg Lücke
(Uni Oldenburg)

Prof. Dr. ir. Simon Doclo
(Uni Oldenburg)

The long-term goal of this project is to achieve a breakthrough in the theoretical foundation and realization of auditory-inspired algorithms for analysing and processing speech in complex acoustic conditions, in order to fundamentally improve speech communication in these conditions for people with hearing difficulties.

The main research aims are to determine the most promising auditory-inspired and technical processing principles, to identify ways of exploiting machine learning techniques, to integrate the different processing principles optimally, and to realize demonstrators that optimally support specific applications such as hearing aids, cochlear implants, and assistive listening devices.

Figure 1: Block diagram of the CASA processing framework

Publications

2025

Fejgin D, Doclo S (2025) Completing sets of prototype transfer functions for subspace-based direction of arrival estimation of multiple speakers. ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, pp. 1-5. DOI: 10.1109/ICASSP49660.2025.10889277
Preprint available at: 10.48550/arXiv.2501.07524
Tammen M, Doclo S (2025) Imposing correlation structures for deep binaural spatio-temporal Wiener filtering. IEEE Transactions on Audio, Speech and Language Processing 33: 1278-1292. DOI: 10.1109/TASLPRO.2025.3548454 [Open access]
Varzandeh R, Doclo S, Hohmann V (2025) Improving multi-talker binaural DOA estimation by combining periodicity and spatial features in convolutional neural networks. EURASIP Journal on Audio, Speech, and Music Processing 2025: 5. DOI: 10.1186/s13636-025-00392-8 [Open access]

2024

Boukun V, Drefs J, Lücke J (2024) Blind zero-shot audio restoration: A variational autoencoder approach for denoising and inpainting. Proc. Interspeech 2024, Kos, Greece, 1.-5.09.2024, 4823-4827. DOI: 10.21437/Interspeech.2024-314
Brümann K, Doclo S (2024) Steered response power-based direction-of-arrival estimation exploiting an auxiliary microphone. 32nd European Signal Processing Conference (EUSIPCO 2024), 26-30.08.2024, Lyon, France, pp. 917-921. https://eurasip.org/Proceedings/Eusipco/Eusipco2024/pdfs/0000917.pdf
Fejgin D, Hadad E, Gannot S, Koldovsky Z, Doclo S (2024) Comparison of frequency-fusion mechanisms for binaural direction-of-arrival estimation for multiple speakers. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, 14-19.04.2024, pp. 731-735. Preprint available at DOI: 10.48550/arXiv.2401.07849.
DOI: 10.1109/ICASSP48485.2024.10446394
Luberadzka J, Kayser H, Lücke J, Hohmann V (2024) Towards multidimensional attentive voice tracking - estimating voice state from auditory glimpses with regression neural networks and Monte Carlo sampling. EURASIP Journal on Audio, Speech, and Music Processing 2024: 27 (18 pages). DOI: 10.1186/s13636-024-00350-w
Salwig S, Drefs J, Lücke J (2024) Zero-shot denoising of microscopy images recorded at high-resolution limits. PLoS Comput Biol 20(6):e1012192. DOI: 10.1371/journal.pcbi.1012192
Varzandeh R, Doclo S, Hohmann V (2024) Speech-aware binaural DOA estimation utilizing periodicity and spatial features in convolutional neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing 32, 1198-1213. DOI: 10.1109/TASLP.2024.3356987

2023

Brümann K, Doclo S (2023) Exploiting an external microphone to improve time-difference-of-arrival estimates for Euclidean distance matrix-based source localization. Proc. ITG Conference on Speech Communication, Aachen, Germany, Sep. 2023, pp. 16-20. DOI: 10.30420/456164002
Drefs J*, Guiraud E*, Panagiotou F, Lücke J (2023) Direct evolutionary optimization of variational autoencoders with binary latents. Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2022, Grenoble, France, 19.–23.09.2022, Proceedings, Part III, Sep 2022, pp 357–372. DOI: 10.1007/978-3-031-26409-2_22
*joint first authorship
Fejgin D, Doclo S (2023) Assisted RTF-vector-based binaural direction of arrival estimation exploiting a calibrated external microphone array. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 04-10.06.2023, pp. 1-5. DOI: 10.1109/ICASSP49357.2023.10095634
Fejgin D, Doclo S (2023) Exploiting an external microphone for binaural RTF-vector-based direction of arrival estimation for multiple speakers. Forum Acusticum 2023, 11.-15.09.2023, Turin, Italy, pp. 1-7. DOI: 10.61782/fa.2023.1003
Fejgin D, Middelberg W, Doclo S (2023) BRUDEX Database: Binaural room impulse responses with uniformly distributed external microphones. Proc. ITG Conference on Speech Communication, Aachen, Germany, Sep. 2023, pp. 126-130. DOI: 10.30420/456164024
Middelberg W, Gode H, Doclo S (2023) Relative transfer function vector estimation for acoustic sensor networks exploiting covariance matrix structure. 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 2023, pp. 1-5. DOI: 10.1109/WASPAA58266.2023.10248188
Mousavi H, Drefs J, Hirschberger F, Lücke J (2023) Generic unsupervised optimization for a latent variable model with exponential family observables. Journal of Machine Learning Research 24, pp. 1-59. https://jmlr.org/papers/v24/22-0359.html
Sinha R, Scherer A-C, Doclo S, Rollwage C, Rennies J (2023) Subjective performance evaluation of single-channel speaker-conditioned target speaker extraction algorithms for complex acoustic scenes. Proc. ITG Conference on Speech Communication, Aachen, Germany, Sep. 2023, pp. 101-105. DOI: 10.30420/456164019
Varzandeh R, Doclo S, Hohmann V (2023) A two-stage CNN with feature reduction for speech-aware binaural DOA estimation. 31st European Signal Processing Conference (EUSIPCO 2023), 4-8.09.2023, Helsinki, Finland, pp. 1-5. https://eurasip.org/Proceedings/Eusipco/Eusipco2023/pdfs/0000241.pdf

2022

Brümann K, Doclo S (2022) 3D single source localization based on Euclidean distance matrices. Proc. International Workshop on Acoustic Signal Enhancement (IWAENC), Bamberg, Germany, Sep. 2022. DOI: 10.1109/IWAENC53105.2022.9914726
Drefs J, Guiraud E, Lücke J (2022) Evolutionary variational optimization of generative models. Journal of Machine Learning Research 23(21):1-51. Publication available at jmlr.org/papers/v23/20-233.html
Fejgin D, Doclo S (2022) Coherence-based frequency subset selection for binaural RTF-vector-based direction of arrival estimation for multiple speakers. Proc. International Workshop on Acoustic Signal Enhancement (IWAENC), Bamberg, Germany, Sep. 2022.
DOI: 10.1109/IWAENC53105.2022.9914768
Hirschberger F, Forster D, Lücke J (2022) A variational EM acceleration for efficient clustering at very large scales. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(12): 9787 - 9801. Published December 2022 (online since December 9, 2021). DOI: 10.1109/TPAMI.2021.3133763
Luberadzka J, Kayser H, Hohmann V (2022) Making sense of periodicity glimpses in a prediction-update-loop - a computational model of attentive voice tracking. JASA 151(2), 712-737. DOI: 10.1121/10.0009337. Data set and preprint available at DOI: 10.5281/zenodo.6674482
Middelberg W, Doclo S (2022) Bias analysis of spatial coherence-based RTF vector estimation for acoustic sensor networks in a diffuse sound field. Proc. International Workshop on Acoustic Signal Enhancement (IWAENC), Bamberg, Germany, Sep. 2022. DOI: 10.1109/IWAENC53105.2022.9914715
Sönnichsen R, Llorach To G, Hochmuth S, Hohmann V, Radeloff A (2022) How face masks interfere with speech understanding of normal-hearing individuals: vision makes the difference. Otol Neurotol 43: 282-288. DOI: 10.1097/MAO.0000000000003458
Sönnichsen R, Llorach To G, Hohmann V, Hochmuth S, Radeloff A (2022) Challenging times for cochlear implant users – effect of face masks on audiovisual speech understanding during the COVID-19 pandemic. Trends in Hearing 26, 9 pages. DOI: 10.1177/233121652211343788
Sutojo S, May T, van de Par S (2022) Segmentation of multitalker mixtures based on local feature contrasts and auditory glimpses. IEEE/ACM Transactions on Audio, Speech, and Language Processing 30, 1249-1262. DOI: 10.1109/TASLP.2022.3155285. Preprint available at DOI: 10.5281/zenodo.5599384

2021

Boos M, Lücke J, Rieger JW (2021) Generalizable dimensions of human cortical auditory processing of speech in natural soundscapes: A data-driven ultra high field fMRI approach. NeuroImage 237: 118106. DOI: 10.1016/j.neuroimage.2021.118106 (co-funded by Cluster of Excellence Hearing4all)
Brümann K, Fejgin D, Doclo S (2021) Data-dependent initialization for ECM-based blind geometry estimation of a microphone array using reverberant speech. ITG Fachbericht 298: Speech communication, 74-78. ieeexplore.ieee.org/document/9657510
Fejgin D, Doclo S (2021) Comparison of binaural RTF-vector-based direction of arrival estimation methods exploiting an external microphone. 29th European Signal Processing Conference (EUSIPCO), 23-27. DOI: 10.23919/EUSIPCO54536.2021.9616327
Gößling N, Marquardt D, Doclo S (2021) Performance analysis of the extended binaural MVDR beamformer with partial noise estimation. IEEE/ACM Trans Audio Speech Lang Proc 29: 462-476. DOI: 10.1109/TASLP.2020.3043674
Hohmann V (2021) The Period-Modulated Harmonic Locked Loop (PM-HLL): A low-effort algorithm for rapid time-domain periodicity estimation. Acta Acustica 5:56. [Open access]
DOI: 10.1051/aacus/2021050. Preprint available at DOI: 10.5281/zenodo.5727778. Data set available at DOI: 10.5281/zenodo.5727729
Mousavi H, Buhl M, Guiraud E, Drefs J, Lücke J (2021) Inference and learning in a latent variable model for beta distributed interval data. Entropy 23:552. DOI: 10.3390/e23050552
Tammen M, Gode H, Kayser H, Nustede EJ, Westhausen NL, Anemüller J, Doclo S (2021) Combining binaural LCMP beamforming and deep multi-frame filtering for joint dereverberation and interferer reduction in the Clarity-2021 Challenge. Technical report.

2020

Luberadzka J, Kayser H, Hohmann V (2020) Estimating fundamental frequency and formants based on periodicity glimpses: a deep learning approach. 2020 IEEE International Conference on Healthcare Informatics (ICHI), Oldenburg, Germany, 2020, pp. 1-6. DOI: 10.1109/ICHI48887.2020.9374386, 10.5281/zenodo.6674462
Sutojo S, Thiemann J, Kohlrausch A, van de Par S (2020) Auditory gestalt rules and their application. In: Blauert J., Braasch J. (eds) The Technology of binaural understanding. Modern acoustics and signal processing. Springer, Cham. pp 33-59. DOI: 10.1007/978-3-030-00386-9_2

2019

Lücke J, Forster D (2019) k-means as a variational EM approximation of Gaussian mixture models. Pattern Recognition Letters 125: 349-356. DOI: 10.1016/j.patrec.2019.04.001
Sheikh A-S, Harper NS, Drefs J, Singer Y, Dai Z, Turner RE, Lücke J (2019) STRFs in primary auditory cortex emerge from masking-based statistics of natural sounds. PLoS Comput Biol 15(1): e1006595, 1-23. DOI: 10.1371/journal.pcbi.1006595

SFB webmaster (Changed: 31 Mar 2025) | Shortlink: https://uol.de/p56368en
