Project B2 - Computational Auditory Scene Analysis algorithms for improving speech communication in complex acoustic environments

The long-term goal of this project is to achieve a breakthrough in the theoretical foundation and realization of auditory-inspired algorithms for analysing and processing speech in complex acoustic conditions, in order to fundamentally improve speech communication in these conditions for people with hearing difficulties.

The main research questions are to determine the most promising auditory-inspired and technical processing principles, to identify how machine learning techniques can be exploited, to integrate the different processing principles optimally, and to realize demonstrators that optimally support specific applications such as hearing aids, cochlear implants, and assistive listening devices.

Figure 1: Block diagram of the CASA processing framework

Publications

2024

  • Boukun V, Drefs J, Lücke J (2024) Blind zero-shot audio restoration: A variational autoencoder approach for denoising and inpainting. Proc. Interspeech 2024, Kos, Greece, 1-5.09.2024, pp. 4823-4827. DOI: 10.21437/Interspeech.2024-314
  • Brümann K, Doclo S (2024) Steered response power-based direction-of-arrival estimation exploiting an auxiliary microphone. 32nd European Signal Processing Conference (EUSIPCO 2024), 26-30.08.2024, Lyon, France, pp. 917-921. https://eurasip.org/Proceedings/Eusipco/Eusipco2024/pdfs/0000917.pdf
  • Fejgin D, Hadad E, Gannot S, Koldovsky Z, Doclo S (2024) Comparison of frequency-fusion mechanisms for binaural direction-of-arrival estimation for multiple speakers. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, 14-19.04.2024, pp. 731-735. DOI: 10.1109/ICASSP48485.2024.10446394. Preprint available at DOI: 10.48550/arXiv.2401.07849
  • Luberadzka J, Kayser H, Lücke J, Hohmann V (2024) Towards multidimensional attentive voice tracking - estimating voice state from auditory glimpses with regression neural networks and Monte Carlo sampling. EURASIP Journal on Audio, Speech, and Music Processing 2024: 27 (18 pages). DOI: 10.1186/s13636-024-00350-w
  • Salwig S, Drefs J, Lücke J (2024) Zero-shot denoising of microscopy images recorded at high-resolution limits. PLoS Comput Biol 20(6):e1012192. DOI: 10.1371/journal.pcbi.1012192
  • Varzandeh R, Doclo S, Hohmann V (2024) Speech-aware binaural DOA estimation utilizing periodicity and spatial features in convolutional neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing 32, 1198-1213. DOI: 10.1109/TASLP.2024.3356987

2023

  • Brümann K, Doclo S (2023) Exploiting an external microphone to improve time-difference-of-arrival estimates for Euclidean distance matrix-based source localization. Proc. ITG Conference on Speech Communication, Aachen, Germany, Sep. 2023, pp. 16-20. DOI: 10.30420/456164002
  • Drefs J*, Guiraud E*, Panagiotou F, Lücke J (2023) Direct evolutionary optimization of variational autoencoders with binary latents. Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2022, Grenoble, France, 19-23.09.2022, Proceedings, Part III, pp. 357-372. DOI: 10.1007/978-3-031-26409-2_22
    *joint first authorship
  • Fejgin D, Doclo S (2023) Assisted RTF-vector-based binaural direction of arrival estimation exploiting a calibrated external microphone array. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 04-10.06.2023, pp. 1-5. DOI: 10.1109/ICASSP49357.2023.10095634
  • Fejgin D, Doclo S (2023) Exploiting an external microphone for binaural RTF-vector-based direction of arrival estimation for multiple speakers. Forum Acusticum 2023, Turin, Italy, 11-15.09.2023, pp. 1-7. DOI: 10.61782/fa.2023.1003
  • Fejgin D, Middelberg W, Doclo S (2023) BRUDEX Database: Binaural room impulse responses with uniformly distributed external microphones. Proc. ITG Conference on Speech Communication, Aachen, Germany, Sep. 2023, pp. 126-130. DOI: 10.30420/456164024
  • Middelberg W, Gode H, Doclo S (2023) Relative transfer function vector estimation for acoustic sensor networks exploiting covariance matrix structure. 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 2023, pp. 1-5. DOI: 10.1109/WASPAA58266.2023.10248188
  • Mousavi H, Drefs J, Hirschberger F, Lücke J (2023) Generic unsupervised optimization for a latent variable model with exponential family observables. Journal of Machine Learning Research 24, pp. 1-59. https://jmlr.org/papers/v24/22-0359.html
  • Sinha R, Scherer A-C, Doclo S, Rollwage C, Rennies J (2023) Subjective performance evaluation of single-channel speaker-conditioned target speaker extraction algorithms for complex acoustic scenes. Proc. ITG Conference on Speech Communication, Aachen, Germany, Sep. 2023, pp. 101-105. DOI: 10.30420/456164019
  • Varzandeh R, Doclo S, Hohmann V (2023) A two-stage CNN with feature reduction for speech-aware binaural DOA estimation. 31st European Signal Processing Conference (EUSIPCO 2023), 4-8.09.2023, Helsinki, Finland, pp. 1-5. https://eurasip.org/Proceedings/Eusipco/Eusipco2023/pdfs/0000241.pdf

2022

  • Brümann K, Doclo S (2022) 3D single source localization based on Euclidean distance matrices. Proc. International Workshop on Acoustic Signal Enhancement (IWAENC), Bamberg, Germany, Sep. 2022. DOI: 10.1109/IWAENC53105.2022.9914726
  • Drefs J, Guiraud E, Lücke J (2022) Evolutionary variational optimization of generative models. Journal of Machine Learning Research 23(21):1-51. Publication available at jmlr.org/papers/v23/20-233.html
  • Fejgin D, Doclo S (2022) Coherence-based frequency subset selection for binaural RTF-vector-based direction of arrival estimation for multiple speakers. Proc. International Workshop on Acoustic Signal Enhancement (IWAENC), Bamberg, Germany, Sep. 2022. DOI: 10.1109/IWAENC53105.2022.9914768
  • Hirschberger F, Forster D, Lücke J (2022) A variational EM acceleration for efficient clustering at very large scales. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(12): 9787 - 9801. Published December 2022 (online since December 9, 2021). DOI: 10.1109/TPAMI.2021.3133763
  • Luberadzka J, Kayser H, Hohmann V (2022) Making sense of periodicity glimpses in a prediction-update-loop - a computational model of attentive voice tracking. JASA 151(2), 712-737. DOI: 10.1121/10.0009337. Data set and preprint available at DOI: 10.5281/zenodo.6674482
  • Middelberg W, Doclo S (2022) Bias analysis of spatial coherence-based RTF vector estimation for acoustic sensor networks in a diffuse sound field. Proc. International Workshop on Acoustic Signal Enhancement (IWAENC), Bamberg, Germany, Sep. 2022. DOI: 10.1109/IWAENC53105.2022.9914715
  • Sönnichsen R, Llorach To G, Hochmuth S, Hohmann V, Radeloff A (2022) How face masks interfere with speech understanding of normal-hearing individuals: vision makes the difference. Otol Neurotol 43: 282-288. DOI: 10.1097/MAO.0000000000003458
  • Sönnichsen R, Llorach To G, Hohmann V, Hochmuth S, Radeloff A (2022) Challenging times for cochlear implant users – effect of face masks on audiovisual speech understanding during the COVID-19 pandemic. Trends in Hearing 26, 9 pages. DOI: 10.1177/233121652211343788
  • Sutojo S, May T, van de Par S (2022) Segmentation of multitalker mixtures based on local feature contrasts and auditory glimpses. IEEE/ACM Transactions on Audio, Speech, and Language Processing 30, 1249-1262. DOI: 10.1109/TASLP.2022.3155285. Preprint available at DOI: 10.5281/zenodo.5599384

2021

  • Boos M, Lücke J, Rieger JW (2021) Generalizable dimensions of human cortical auditory processing of speech in natural soundscapes: A data-driven ultra high field fMRI approach. NeuroImage 237: 118106. DOI: 10.1016/j.neuroimage.2021.118106 (co-funded by Cluster of Excellence Hearing4all)
  • Brümann K, Fejgin D, Doclo S (2021) Data-dependent initialization for ECM-based blind geometry estimation of a microphone array using reverberant speech. ITG Fachbericht 298: Speech communication, 74-78. ieeexplore.ieee.org/document/9657510
  • Fejgin D, Doclo S (2021) Comparison of binaural RTF-vector-based direction of arrival estimation methods exploiting an external microphone. 29th European Signal Processing Conference (EUSIPCO), 23-27. DOI: 10.23919/EUSIPCO54536.2021.9616327
  • Gößling N, Marquardt D, Doclo S (2021) Performance analysis of the extended binaural MVDR beamformer with partial noise estimation. IEEE/ACM Trans Audio Speech Lang Proc 29: 462-476. DOI: 10.1109/TASLP.2020.3043674
  • Hohmann V (2021) The Period-Modulated Harmonic Locked Loop (PM-HLL): A low-effort algorithm for rapid time-domain periodicity estimation. Acta Acustica 5:56 (open access). DOI: 10.1051/aacus/2021050. Preprint available at DOI: 10.5281/zenodo.5727778. Data set available at DOI: 10.5281/zenodo.5727729
  • Mousavi H, Buhl M, Guiraud E, Drefs J, Lücke J (2021) Inference and learning in a latent variable model for beta distributed interval data. Entropy 23:552. DOI: 10.3390/e23050552
  • Tammen M, Gode H, Kayser H, Nustede EJ, Westhausen NL, Anemüller J, Doclo S (2021) Combining binaural LCMP beamforming and deep multi-frame filtering for joint dereverberation and interferer reduction in the Clarity-2021 Challenge. Technical report.

2020

  • Luberadzka J, Kayser H, Hohmann V (2020) Estimating fundamental frequency and formants based on periodicity glimpses: a deep learning approach. 2020 IEEE International Conference on Healthcare Informatics (ICHI), Oldenburg, Germany, 2020, pp. 1-6. DOI: 10.1109/ICHI48887.2020.9374386. Also available at DOI: 10.5281/zenodo.6674462
  • Sutojo S, Thiemann J, Kohlrausch A, van de Par S (2020) Auditory gestalt rules and their application. In: Blauert J, Braasch J (eds) The Technology of Binaural Understanding. Modern Acoustics and Signal Processing. Springer, Cham, pp. 33-59. DOI: 10.1007/978-3-030-00386-9_2

2019

  • Lücke J, Forster D (2019) k-means as a variational EM approximation of Gaussian mixture models. Pattern Recognition Letters 125: 349-356. DOI: 10.1016/j.patrec.2019.04.001
  • Sheikh A-S, Harper NS, Drefs J, Singer Y, Dai Z, Turner RE, Lücke J (2019) STRFs in primary auditory cortex emerge from masking-based statistics of natural sounds. PLoS Comput Biol 15(1): e1006595, 1-23. DOI: 10.1371/journal.pcbi.1006595
(Changed: 03 Dec 2024)