Binaural Noise Reduction
Imposing Correlation Structures for Deep Binaural Spatio-Temporal Wiener Filtering
Marvin Tammen, Simon Doclo
To improve speech quality and intelligibility in environments with noise and interfering sounds, binaural speech enhancement algorithms use the microphone signals from both the left and the right hearing device to generate an enhanced output signal for each ear. As a multi-frame extension of the binaural multi-channel Wiener filter, in this paper we consider the binaural spatio-temporal Wiener filter (STWF) in the short-time Fourier transform domain, which requires estimates of the highly time-varying spatio-temporal correlations of the speech and interference components. To this end, the binaural STWF is embedded into an end-to-end supervised learning framework, where temporal convolutional networks estimate the required quantities, i.e., the inverse spatio-temporal correlation matrices of the interference component and the spatio-temporal correlation vectors and power spectral densities of the speech component. In this paper, we investigate the potential of imposing spatio-temporal correlation structure on these quantities and relating these quantities between the left and the right hearing device. Assuming that the spatial correlation of the speech component is stationary over a small number of frames, we propose to decompose the spatio-temporal correlation vectors as the Kronecker product of a relative transfer function vector and a temporal correlation vector, either considering a global reference microphone or a reference microphone for each hearing device. In addition, we consider a deep bilateral STWF by neglecting the spatio-temporal correlations of the speech and interference components between both devices. The imposed spatio-temporal correlation structures greatly differ in the number of parameters that need to be estimated. Simulation results based on simulated binaural room impulse responses and diverse speech and noise sources demonstrate that the proposed spatio-temporal correlation structures significantly reduce the computational complexity of the binaural STWF while yielding a similar speech enhancement performance compared to not imposing any spatio-temporal correlation structure. Furthermore, the results confirm that the deep binaural STWF outperforms the binaural Conv-TasNet algorithm as well as directly estimating the binaural multi-frame filter coefficients.
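The Kronecker decomposition mentioned in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the names `rtf`, `temporal`, `M` and `N`, the ordering of the two Kronecker factors, and the normalization of the first entry of each factor to 1 are all assumptions made here for illustration. The point is only that a length-M*N vector is described by two much shorter vectors.

```python
import numpy as np

def structured_correlation_vector(rtf, temporal):
    """Build a length M*N vector as the Kronecker product of a length-M
    spatial (RTF) vector and a length-N temporal correlation vector.
    The factor ordering is an assumption of this sketch."""
    rtf = np.asarray(rtf, dtype=complex)
    temporal = np.asarray(temporal, dtype=complex)
    return np.kron(rtf, temporal)

# Hypothetical sizes: M microphones, N frames.
M, N = 4, 3
rng = np.random.default_rng(0)

rtf = rng.standard_normal(M) + 1j * rng.standard_normal(M)
rtf = rtf / rtf[0]            # assumed: reference-microphone entry equals 1
temporal = rng.standard_normal(N) + 1j * rng.standard_normal(N)
temporal = temporal / temporal[0]  # assumed: current-frame entry equals 1

gamma = structured_correlation_vector(rtf, temporal)

# Entries to estimate without and with the imposed structure.
unstructured_entries = M * N
structured_entries = M + N
```

With these hypothetical sizes the unstructured vector has 12 complex entries, while the two factors together have 7, which gives a rough sense of why the imposed structures differ in the number of parameters to be estimated.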
Deep Multi-Frame MVDR Filtering for Binaural Noise Reduction
Marvin Tammen, Simon Doclo
To improve speech intelligibility and speech quality in noisy environments, binaural noise reduction algorithms for head-mounted assistive listening devices are of crucial importance. Several binaural noise reduction algorithms such as the well-known binaural minimum variance distortionless response (MVDR) beamformer have been proposed, which exploit spatial correlations of both the target speech and the noise components. Furthermore, for single-microphone scenarios, multi-frame algorithms such as the multi-frame MVDR (MFMVDR) filter have been proposed, which exploit temporal instead of spatial correlations. In this contribution, we propose a binaural extension of the MFMVDR filter, which exploits both spatial and temporal correlations. The binaural MFMVDR filters are embedded in an end-to-end deep learning framework, where the required parameters, i.e., the speech spatio-temporal correlation vectors as well as the (inverse) noise spatio-temporal covariance matrix, are estimated by temporal convolutional networks (TCNs) that are trained by minimizing the mean spectral absolute error loss function. Simulation results comprising measured binaural room impulse responses and diverse noise sources at signal-to-noise ratios from −5 dB to 20 dB demonstrate the advantage of utilizing the binaural MFMVDR filter structure over directly estimating the binaural multi-frame filter coefficients with TCNs.
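As a rough illustration of how the two estimated quantities enter an MVDR-type filter, the sketch below uses the textbook MVDR expression w = Φ⁻¹γ / (γᴴΦ⁻¹γ) for a single time-frequency bin. The function name `mvdr_filter`, the vector length `K`, and the random test data are assumptions of this sketch. In the paper the inverse covariance matrix and the correlation vector are estimated by TCNs, which is not reproduced here.

```python
import numpy as np

def mvdr_filter(noise_cov_inv, gamma):
    """Textbook MVDR filter for one time-frequency bin.

    noise_cov_inv: (K, K) inverse noise covariance matrix (Hermitian).
    gamma:         (K,) speech correlation vector.
    Returns w such that w^H gamma = 1 (distortionless constraint).
    """
    numerator = noise_cov_inv @ gamma
    denominator = np.vdot(gamma, numerator)  # gamma^H Phi^{-1} gamma
    return numerator / denominator

# Hypothetical stacked length K (microphones times frames).
K = 6
rng = np.random.default_rng(1)

# Random Hermitian positive-definite noise covariance matrix for the demo.
A = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
noise_cov = A @ A.conj().T + K * np.eye(K)
noise_cov_inv = np.linalg.inv(noise_cov)

gamma = rng.standard_normal(K) + 1j * rng.standard_normal(K)
gamma = gamma / gamma[0]

w = mvdr_filter(noise_cov_inv, gamma)

# Apply the filter to a stacked noisy vector y: output = w^H y.
y = rng.standard_normal(K) + 1j * rng.standard_normal(K)
output = np.vdot(w, y)
```

Because the filter only needs the inverse covariance matrix, estimating that inverse directly (as the abstract describes) avoids an explicit matrix inversion at run time.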
Binaural LCMV beamforming with partial noise estimation
Nico Gößling, Elior Hadad, Sharon Gannot, Simon Doclo
Besides reducing undesired sources, i.e., interfering sources and background noise, another important objective of a binaural beamforming algorithm is to preserve the listener's spatial impression of the acoustic scene, which is achieved by preserving the binaural cues of all sound sources. While the binaural minimum variance distortionless response (BMVDR) beamformer provides good noise reduction performance and preserves the binaural cues of the desired source, it does not allow the reduction of the interfering sources to be controlled, and it distorts the binaural cues of the interfering sources and the background noise. Hence, several extensions have been proposed. First, the binaural linearly constrained minimum variance (BLCMV) beamformer uses additional constraints, which make it possible to control the reduction of the interfering sources while preserving their binaural cues. Second, the BMVDR with partial noise estimation (BMVDR-N) mixes the output signals of the BMVDR with the noisy reference microphone signals, which makes it possible to control the binaural cues of the background noise. Aiming at merging the advantages of both extensions, in this paper we propose the BLCMV with partial noise estimation (BLCMV-N). We show that the output signals of the BLCMV-N can be interpreted as a mixture between the noisy reference microphone signals and the output signals of a BLCMV using an adjusted interference scaling parameter. We provide a theoretical comparison between the BMVDR, the BLCMV, the BMVDR-N and the proposed BLCMV-N in terms of noise and interference reduction performance and binaural cue preservation. Experimental results using recorded signals as well as the results of a perceptual listening test show that the BLCMV-N is able to preserve the binaural cues of an interfering source (like the BLCMV), while allowing a trade-off between noise reduction performance and binaural cue preservation of the background noise (like the BMVDR-N).
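The mixing idea behind partial noise estimation can be sketched in a few lines. The abstract only states that beamformer outputs are mixed with the noisy reference microphone signals. The convex form (1 − η)·z + η·y used below, the parameter name `eta`, and the function name `partial_noise_mix` are assumptions of this sketch, and the adjusted interference scaling of the BLCMV-N is not modeled.

```python
import numpy as np

def partial_noise_mix(beamformer_out, reference_mic, eta):
    """Mix a beamformer output with the noisy reference microphone signal.

    eta = 0 returns the beamformer output unchanged,
    eta = 1 returns the unprocessed reference microphone signal.
    The convex mixing form is an assumption of this sketch.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    beamformer_out = np.asarray(beamformer_out, dtype=complex)
    reference_mic = np.asarray(reference_mic, dtype=complex)
    return (1.0 - eta) * beamformer_out + eta * reference_mic

# Hypothetical STFT coefficients for the left hearing device.
z_left = np.array([0.2 + 0.1j, -0.4j])             # beamformer output
y_left = np.array([1.0 + 0.5j, 0.3 - 0.9j])        # noisy reference microphone
mixed_left = partial_noise_mix(z_left, y_left, eta=0.2)
```

The same function would be applied separately to the right device with its own reference microphone. A larger `eta` keeps more of the unprocessed signal, and with it more of the original binaural cues of the background noise, at the cost of less noise reduction.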
RTF-steered binaural MVDR beamforming incorporating multiple external microphones
Nico Gößling, Wiebke Middelberg, Simon Doclo
Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, USA, Oct. 2019
The binaural minimum-variance distortionless-response (BMVDR) beamformer is a well-known noise reduction algorithm that can be steered using the relative transfer function (RTF) vector of the desired speech source. Exploiting the availability of an external microphone that is spatially separated from the head-mounted microphones, an efficient method has been recently proposed to estimate the RTF vector in a diffuse noise field. When multiple external microphones are available, different RTF vector estimates can be obtained by using this method for each external microphone. In this paper, we propose several procedures to combine these RTF vector estimates, either by selecting the estimate corresponding to the highest input SNR, by averaging the estimates or by combining the estimates in order to maximize the output SNR of the BMVDR beamformer. Experimental results for a moving speaker and diffuse noise in a reverberant environment show that the output SNR-maximizing combination yields the largest binaural SNR improvement and also outperforms the state-of-the-art covariance whitening method.
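Two of the combination procedures named in this abstract, selection by highest input SNR and plain averaging, are simple enough to sketch. The function names, the array layout (one row per external microphone) and the example values are assumptions of this sketch. The output-SNR-maximizing combination is not shown, since its details are not given in the abstract.

```python
import numpy as np

def select_by_input_snr(rtf_estimates, input_snrs):
    """Return the RTF vector estimate whose external microphone has the
    highest input SNR. rtf_estimates has shape (E, M): one row per
    external microphone, M head-mounted microphones."""
    rtf_estimates = np.asarray(rtf_estimates, dtype=complex)
    best = int(np.argmax(input_snrs))
    return rtf_estimates[best]

def average_estimates(rtf_estimates):
    """Return the element-wise average of all RTF vector estimates."""
    return np.mean(np.asarray(rtf_estimates, dtype=complex), axis=0)

# Hypothetical estimates from E = 2 external microphones, M = 3 microphones.
rtf_estimates = np.array([
    [1.0 + 0.0j, 0.8 - 0.2j, 0.5 + 0.4j],
    [1.0 + 0.0j, 0.6 - 0.1j, 0.7 + 0.2j],
])
input_snrs_db = np.array([3.0, 9.0])  # hypothetical input SNRs in dB

selected = select_by_input_snr(rtf_estimates, input_snrs_db)
averaged = average_estimates(rtf_estimates)
```

Both functions would be called per frequency bin (and per frame for a moving speaker), with the result used to steer the BMVDR beamformer.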
RTF-steered binaural MVDR beamforming incorporating an external microphone for dynamic acoustic scenarios
Nico Gößling, Simon Doclo
Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, May 2019
A well-known binaural noise reduction algorithm is the binaural minimum variance distortionless response beamformer, which can be steered using the relative transfer function (RTF) vectors of the desired source. In this paper, we consider the recently proposed spatial coherence (SC) method to estimate the RTF vectors, requiring an additional external microphone that is spatially separated from the head-mounted microphones. Although the SC method provides a biased estimate of the RTF between the head-mounted microphones and the external microphone, we show that this bias is real-valued and only depends on the SNR in the external microphone. We propose to use the SC method to estimate the extended RTF vectors that also incorporate the external microphone, which makes it possible to filter the external microphone signal in conjunction with the head-mounted microphone signals. Evaluation results using recorded signals of a moving speaker in diffuse noise show that the SC method yields a slightly better performance than the widely used covariance whitening method at a much lower computational complexity.
Perceptual Evaluation of Binaural MVDR-based Algorithms to Preserve the Interaural Coherence of Diffuse Noise Fields
Nico Gößling, Daniel Marquardt, Simon Doclo
Trends in Hearing, vol. 24, pp. 1–18, Apr. 2020.
Besides improving speech intelligibility in background noise, another important objective of noise reduction algorithms for binaural hearing devices is preserving the spatial impression for the listener. In this study, we evaluate the performance of several recently proposed noise reduction algorithms based on the binaural minimum-variance-distortionless-response (MVDR) beamformer, which trade off between noise reduction performance and preservation of the interaural coherence (IC) for diffuse noise fields. Aiming at a perceptually optimized result, this trade-off is determined based on the IC discrimination ability of the human auditory system. The algorithms are evaluated with normal-hearing participants for an anechoic scenario and a reverberant cafeteria scenario, both in terms of speech intelligibility using a matrix sentence test as well as spatial quality using a MUlti Stimulus test with Hidden Reference and Anchor (MUSHRA). The results show that all considered binaural noise reduction algorithms are able to improve speech intelligibility compared to the unprocessed microphone signals, where partially preserving the IC of the diffuse noise field leads to a significant improvement in perceived spatial quality compared to the binaural MVDR beamformer while hardly affecting speech intelligibility.