Binaural Noise Reduction

Deep Multi-Frame MVDR Filtering for Binaural Noise Reduction

Marvin Tammen, Simon Doclo

To improve speech intelligibility and speech quality in noisy environments, binaural noise reduction algorithms for head-mounted assistive listening devices are of crucial importance. Several binaural noise reduction algorithms such as the well-known binaural minimum variance distortionless response (MVDR) beamformer have been proposed, which exploit spatial correlations of both the target speech and the noise components. Furthermore, for single-microphone scenarios, multi-frame algorithms such as the multi-frame MVDR (MFMVDR) filter have been proposed, which exploit temporal instead of spatial correlations. In this contribution, we propose a binaural extension of the MFMVDR filter, which exploits both spatial and temporal correlations. The binaural MFMVDR filters are embedded in an end-to-end deep learning framework, where the required parameters, i.e., the speech spatio-temporal correlation vectors as well as the (inverse) noise spatio-temporal covariance matrix, are estimated by temporal convolutional networks (TCNs) that are trained by minimizing the mean spectral absolute error loss function. Simulation results comprising measured binaural room impulses and diverse noise sources at signal-to-noise ratios from −5 dB to 20 dB demonstrate the advantage of utilizing the binaural MFMVDR filter structure over directly estimating the binaural multi-frame filter coefficients with TCNs.
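
For orientation, an MVDR-type filter has the standard closed form w = Φ⁻¹γ / (γᴴΦ⁻¹γ), where γ is the speech correlation vector and Φ the noise covariance matrix. The following is a minimal sketch in plain Python, not the authors' implementation: it assumes γ and Φ are already given for one time-frequency bin (in the paper they are estimated by TCNs, which the sketch does not cover), and the function names `solve`, `mvdr_weights` and `apply_filter` are hypothetical.

```python
def solve(a, b):
    """Solve a x = b for a small complex system (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    # Augmented matrix [a | b] with all entries converted to complex.
    m = [[complex(v) for v in row] + [complex(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mvdr_weights(phi_n, gamma):
    """Return w = phi_n^{-1} gamma / (gamma^H phi_n^{-1} gamma)."""
    gamma = [complex(g) for g in gamma]
    u = solve(phi_n, gamma)  # u = phi_n^{-1} gamma
    denom = sum(g.conjugate() * x for g, x in zip(gamma, u))
    return [x / denom for x in u]


def apply_filter(w, y):
    """Filter output w^H y for one stacked input vector y."""
    return sum(complex(wi).conjugate() * complex(yi) for wi, yi in zip(w, y))
```

In a binaural multi-frame setting, the stacked vector `y` would hold several recent STFT frames of the microphone signals, and one such filter would be computed per output ear. By construction, `apply_filter(w, gamma)` equals 1 up to rounding, which is the distortionless constraint.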

Audio Examples

Binaural LCMV beamforming with partial noise estimation

Nico Gößling, Elior Hadad, Sharon Gannot, Simon Doclo

Besides reducing undesired sources, i.e., interfering sources and background noise, another important objective of a binaural beamforming algorithm is to preserve the listener's spatial impression of the acoustic scene, which is achieved by preserving the binaural cues of all sound sources. While the binaural minimum variance distortionless response (BMVDR) beamformer provides good noise reduction performance and preserves the binaural cues of the desired source, it does not allow controlling the reduction of the interfering sources, and it distorts the binaural cues of the interfering sources and the background noise. Hence, several extensions have been proposed. First, the binaural linearly constrained minimum variance (BLCMV) beamformer uses additional constraints, which make it possible to control the reduction of the interfering sources while preserving their binaural cues. Second, the BMVDR with partial noise estimation (BMVDR-N) mixes the output signals of the BMVDR with the noisy reference microphone signals, which makes it possible to control the binaural cues of the background noise. Aiming at merging the advantages of both extensions, in this paper we propose the BLCMV with partial noise estimation (BLCMV-N). We show that the output signals of the BLCMV-N can be interpreted as a mixture between the noisy reference microphone signals and the output signals of a BLCMV using an adjusted interference scaling parameter. We provide a theoretical comparison between the BMVDR, the BLCMV, the BMVDR-N and the proposed BLCMV-N in terms of noise and interference reduction performance and binaural cue preservation. Experimental results using recorded signals as well as the results of a perceptual listening test show that the BLCMV-N is able to preserve the binaural cues of an interfering source (like the BLCMV), while allowing a trade-off between noise reduction performance and binaural cue preservation of background noise (like the BMVDR-N).
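
The mixing idea behind partial noise estimation can be illustrated with a short sketch. It assumes a mixing parameter `eta` between 0 and 1 (the name and the function `partial_noise_mix` are illustrative assumptions) and takes the beamformer output and the reference microphone signal of one ear as sequences of STFT coefficients. It does not compute the adjusted interference scaling parameter mentioned in the abstract.

```python
def partial_noise_mix(beamformer_out, reference_mic, eta):
    """Mix beamformer output with the noisy reference microphone signal.

    eta = 0 returns the pure beamformer output (maximum noise reduction),
    eta = 1 returns the unprocessed reference signal (all cues preserved).
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    if len(beamformer_out) != len(reference_mic):
        raise ValueError("signals must have the same length")
    return [(1.0 - eta) * z + eta * y
            for z, y in zip(beamformer_out, reference_mic)]
```

The same function would be applied once for the left ear and once for the right ear, each with its own reference microphone, so that a scaled copy of the original noise (and hence its binaural cues) remains in both outputs.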

Audio samples

RTF-steered binaural MVDR beamforming incorporating multiple external microphones

Nico Gößling, Wiebke Middelberg, Simon Doclo

Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, Oct. 2019

The binaural minimum-variance distortionless-response (BMVDR) beamformer is a well-known noise reduction algorithm that can be steered using the relative transfer function (RTF) vector of the desired speech source. Exploiting the availability of an external microphone that is spatially separated from the head-mounted microphones, an efficient method has been recently proposed to estimate the RTF vector in a diffuse noise field. When multiple external microphones are available, different RTF vector estimates can be obtained by using this method for each external microphone. In this paper, we propose several procedures to combine these RTF vector estimates, either by selecting the estimate corresponding to the highest input SNR, by averaging the estimates or by combining the estimates in order to maximize the output SNR of the BMVDR beamformer. Experimental results for a moving speaker and diffuse noise in a reverberant environment show that the output SNR-maximizing combination yields the largest binaural SNR improvement and also outperforms the state-of-the-art covariance whitening method.
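
Two of the combination procedures named above, selection by highest input SNR and averaging, are simple enough to sketch. The sketch assumes one RTF vector estimate per external microphone (a list of complex numbers for one frequency bin) and one input SNR value in dB per estimate; the function names are hypothetical, and the output SNR-maximizing combination is not covered.

```python
def select_by_input_snr(rtf_estimates, input_snrs_db):
    """Return the RTF estimate that belongs to the highest input SNR."""
    if not rtf_estimates or len(rtf_estimates) != len(input_snrs_db):
        raise ValueError("need one SNR value per RTF estimate")
    best = max(range(len(rtf_estimates)), key=lambda i: input_snrs_db[i])
    return rtf_estimates[best]


def average_estimates(rtf_estimates):
    """Return the element-wise mean of all RTF estimates."""
    if not rtf_estimates:
        raise ValueError("need at least one RTF estimate")
    count = len(rtf_estimates)
    return [sum(column) / count for column in zip(*rtf_estimates)]
```

If every estimate is normalized so that its reference element equals 1, the average keeps that normalization.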

Audio samples

RTF-steered binaural MVDR beamforming incorporating an external microphone for dynamic acoustic scenarios

Nico Gößling, Simon Doclo

Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, May 2019

A well-known binaural noise reduction algorithm is the binaural minimum variance distortionless response beamformer, which can be steered using the relative transfer function (RTF) vectors of the desired source. In this paper, we consider the recently proposed spatial coherence (SC) method to estimate the RTF vectors, requiring an additional external microphone that is spatially separated from the head-mounted microphones. Although the SC method provides a biased estimate of the RTF between the head-mounted microphones and the external microphone, we show that this bias is real-valued and only depends on the SNR in the external microphone. We propose to use the SC method to estimate the extended RTF vectors that also incorporate the external microphone, which makes it possible to filter the external microphone signal in conjunction with the head-mounted microphones. Evaluation results using recorded signals of a moving speaker in diffuse noise show that the SC method yields a slightly better performance than the widely used covariance whitening method at a much lower computational complexity.

Audio samples

Perceptual Evaluation of Binaural MVDR-based Algorithms to Preserve the Interaural Coherence of Diffuse Noise Fields

Nico Gößling, Daniel Marquardt, Simon Doclo

Trends in Hearing, vol. 24, pp. 1–18, Apr. 2020

Besides improving speech intelligibility in background noise, another important objective of noise reduction algorithms for binaural hearing devices is preserving the spatial impression for the listener. In this study, we evaluate the performance of several recently proposed noise reduction algorithms based on the binaural minimum-variance-distortionless-response (MVDR) beamformer, which trade off between noise reduction performance and preservation of the interaural coherence (IC) for diffuse noise fields. Aiming at a perceptually optimized result, this trade-off is determined based on the IC discrimination ability of the human auditory system. The algorithms are evaluated with normal-hearing participants for an anechoic scenario and a reverberant cafeteria scenario, both in terms of speech intelligibility using a matrix sentence test as well as spatial quality using a MUlti Stimulus test with Hidden Reference and Anchor (MUSHRA). The results show that all considered binaural noise reduction algorithms are able to improve speech intelligibility compared to the unprocessed microphone signals, where partially preserving the IC of the diffuse noise field leads to a significant improvement in perceived spatial quality compared to the binaural MVDR beamformer while hardly affecting speech intelligibility.
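
The interaural coherence referred to above is commonly defined as the normalized cross-correlation between the left and right signals. A minimal sketch, assuming the expectations are approximated by averaging STFT coefficients of one frequency bin over frames (the function name is hypothetical, and the exact estimator used in the study may differ):

```python
import math


def interaural_coherence(left, right):
    """Estimate IC = E{L R*} / sqrt(E{|L|^2} E{|R|^2}) from frame averages."""
    if not left or len(left) != len(right):
        raise ValueError("need two non-empty sequences of equal length")
    cross = sum(complex(l) * complex(r).conjugate() for l, r in zip(left, right))
    power_left = sum(abs(l) ** 2 for l in left)
    power_right = sum(abs(r) ** 2 for r in right)
    if power_left == 0.0 or power_right == 0.0:
        raise ValueError("signals must have non-zero power")
    return cross / math.sqrt(power_left * power_right)
```

A magnitude close to 1 indicates a coherent, point-like source; a diffuse noise field typically shows lower values, particularly at higher frequencies.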

Audio samples

(Changed: 19 Dec 2022)