Imposing Correlation Structures for Deep Binaural Spatio-Temporal Wiener Filtering


Marvin Tammen, Simon Doclo

To improve speech quality and intelligibility in environments with noise and interfering sounds, binaural speech enhancement algorithms use the microphone signals from both the left and the right hearing device to generate an enhanced output signal for each ear. As a multi-frame extension of the binaural multi-channel Wiener filter, in this paper we consider the binaural spatio-temporal Wiener filter (STWF) in the short-time Fourier transform domain, which requires estimates of the highly time-varying spatio-temporal correlations of the speech and interference components. To this end, the binaural STWF is embedded into an end-to-end supervised learning framework, where temporal convolutional networks estimate the required quantities, i.e., the inverse spatio-temporal correlation matrices of the interference component and the spatio-temporal correlation vectors and power spectral densities of the speech component. In this paper, we investigate the potential of imposing spatio-temporal correlation structure on these quantities and relating these quantities between the left and the right hearing device. Assuming that the spatial correlation of the speech component is stationary over a small number of frames, we propose to decompose the spatio-temporal correlation vectors as the Kronecker product of a relative transfer function vector and a temporal correlation vector, either considering a global reference microphone or a reference microphone for each hearing device. In addition, we consider a deep bilateral STWF by neglecting the spatio-temporal correlations of the speech and interference components between both devices. The imposed spatio-temporal correlation structures greatly differ in the number of parameters that need to be estimated.
Simulation results based on simulated binaural room impulse responses and diverse speech and noise sources demonstrate that the proposed spatio-temporal correlation structures significantly reduce the computational complexity of the binaural STWF while yielding a similar speech enhancement performance compared to not imposing any spatio-temporal correlation structure. Furthermore, the results confirm that the deep binaural STWF outperforms the binaural Conv-TasNet algorithm as well as directly estimating the binaural multi-frame filter coefficients.
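
To make the Kronecker decomposition mentioned in the abstract more concrete, the following is a minimal NumPy sketch, not the implementation used in the paper. It builds a structured spatio-temporal correlation vector from a relative transfer function (RTF) vector and a temporal correlation vector, and compares the number of complex values needed with and without the structure. The function name `structured_stcv`, the sizes `M` (microphones) and `N` (frames), the random test data, and the stacking order of the Kronecker product are all assumptions made for illustration.

```python
import numpy as np


def structured_stcv(rtf, temporal_corr):
    """Kronecker product of an RTF vector and a temporal correlation vector.

    Hypothetical helper: the stacking order (microphone-major) is an
    assumption and may differ from the convention used in the paper.
    """
    return np.kron(rtf, temporal_corr)


rng = np.random.default_rng(0)
M, N = 4, 3  # assumed number of microphones and frames, for illustration only

# Random complex vectors standing in for network estimates.
rtf = rng.standard_normal(M) + 1j * rng.standard_normal(M)
rtf = rtf / rtf[0]  # normalize so the reference microphone entry equals 1

temporal_corr = rng.standard_normal(N) + 1j * rng.standard_normal(N)
temporal_corr = temporal_corr / temporal_corr[0]  # first frame entry equals 1

gamma = structured_stcv(rtf, temporal_corr)

# Complex values describing the vector in each case.
n_unstructured = M * N  # every entry of the vector is free
n_structured = M + N    # one RTF vector plus one temporal correlation vector

print(gamma.shape, n_unstructured, n_structured)
```

With these assumed sizes, the unstructured vector has `M * N` entries while the structured version is described by `M + N` values, and the gap widens as either dimension grows. Reshaping `gamma` to `(M, N)` gives the outer product of the two vectors, which is one way to see that the structure forces every microphone to share the same temporal correlation up to a complex scaling.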

 

 


Demos

Welcome to the Demos section, where you can experience the performance of the algorithms presented in the paper on a completely mismatched utterance including a moving target source in addition to a set of utterances from the evaluation datasets.

Moving Target

This interactive demo allows you to switch between algorithms applied to the same binaural hearing aid signals, synchronized with a video. The acoustic scenario captured in the video is entirely mismatched from the training dataset (featuring a moving target in an unseen room, with unseen quasi-diffuse noise coherence, and an unseen hearing aid configuration with unseen inter-microphone spacings).
How to Use the Interactive Demo:
1. Click the play button to start the demo with the default audio track (noisy input).
2. Use the dropdown menu below to switch between algorithms.
3. The video will continue playing from the same point after a brief interruption when switching algorithms.
4. The applied algorithm’s name is displayed as an overlay on each video for easy identification.
Thanks to Wiebke Middelberg (in the video) and Daniel Fejgin from the Signal Processing Group at the University of Oldenburg for preparing the recording.

Select Algorithm:

Audio Examples from Evaluation Datasets

Algorithm | Matched, Train, 5 dB | Matched, Music, 10 dB | Mismatched, Road, 4 dB
noisy
clean
binaural STWF, no STCM structure, no STCV structure
binaural STWF, common STCM, no STCV structure
binaural STWF, common STCM, global RTF
binaural STWF, common STCM, ipsilateral RTF
bilateral STWF, no STCV structure
bilateral STWF, ipsilateral RTF
binaural Deep Filter
binaural Conv-TasNet
non-causal BCCTN
Changed: 16 Jan 2025. Shortlink: https://uol.de/p108826en