Regularization for Partial Multichannel Equalization for Speech Dereverberation
Ina Kodrasi, Stefan Goetze, Simon Doclo
IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 9, pp. 1879-1890, Sep. 2013.
Acoustic multichannel equalization techniques such as the multiple-input/output inverse theorem (MINT), which aim to equalize the room impulse responses (RIRs) between the source and the microphone array, are known to be highly sensitive to RIR estimation errors. To increase robustness, it has been proposed to incorporate regularization in order to decrease the energy of the equalization filters. In addition, more robust partial multichannel equalization techniques such as relaxed multichannel least-squares (RMCLS) and channel shortening (CS) have recently been proposed.
In this paper, we propose a partial multichannel equalization technique based on MINT (PMINT) which aims to shorten the RIR. Furthermore, we investigate the effectiveness of incorporating regularization to further increase the robustness of PMINT and the aforementioned partial multichannel equalization techniques, i.e., RMCLS and CS. In addition, we introduce an automatic non-intrusive procedure for determining the regularization parameter based on the L-curve.
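As a rough illustration of the regularization idea described above, the following sketch designs equalization filters by regularized least squares, i.e., by minimizing ||Hg - d||^2 + delta ||g||^2, where H stacks the convolution matrices of the (estimated) RIRs and d is the target response. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the random toy RIRs, the impulse target, and the value of delta are all hypothetical, and the L-curve parameter selection is not shown.

```python
import numpy as np

def convolution_matrix(h, filter_len):
    """Matrix C of shape (len(h) + filter_len - 1, filter_len) with C @ g == np.convolve(h, g)."""
    h = np.asarray(h, dtype=float)
    C = np.zeros((len(h) + filter_len - 1, filter_len))
    for j in range(filter_len):
        C[j:j + len(h), j] = h  # column j holds h shifted down by j samples
    return C

def regularized_equalizer(rirs, filter_len, target, delta=0.0):
    """Minimize ||H g - target||^2 + delta * ||g||^2 (hypothetical helper).

    Solved as an ordinary least-squares problem on the augmented system
    [H; sqrt(delta) I] g = [target; 0], which avoids forming H^T H.
    """
    H = np.hstack([convolution_matrix(h, filter_len) for h in rirs])
    n = H.shape[1]
    A = np.vstack([H, np.sqrt(delta) * np.eye(n)])
    b = np.concatenate([target, np.zeros(n)])
    g, *_ = np.linalg.lstsq(A, b, rcond=None)
    return g, H

# Toy example (assumed values): 2 channels, RIR length 8, filter length 7.
rng = np.random.default_rng(0)
rirs = rng.standard_normal((2, 8))
filter_len = 7
target = np.zeros(8 + filter_len - 1)
target[0] = 1.0  # complete equalization: the target is a unit impulse

g_mint, H = regularized_equalizer(rirs, filter_len, target, delta=0.0)
g_reg, _ = regularized_equalizer(rirs, filter_len, target, delta=1e-2)

print("residual without regularization:", np.linalg.norm(H @ g_mint - target))
print("filter energy without / with regularization:",
      np.sum(g_mint ** 2), np.sum(g_reg ** 2))
```

Increasing delta trades equalization accuracy for lower filter energy, which is the mechanism the abstract relies on for robustness to RIR estimation errors.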
Instrumental and Perceptual Evaluation of Dereverberation Techniques based on Robust Acoustic Multi-Channel Equalization
Ina Kodrasi, Benjamin Cauchi, Stefan Goetze, Simon Doclo
Dereverberation techniques based on acoustic multi-channel equalization, such as the relaxed multi-channel least squares (RMCLS) technique and the partial multi-channel equalization technique based on the multiple-input/output inverse theorem (PMINT), are known to be sensitive to room impulse response (RIR) perturbations. RIR perturbations may lead to perceptually severe distortions in the output signal of these techniques. In order to increase their robustness, several methods have been proposed, e.g., using a shorter reshaping filter length, incorporating regularization, or incorporating a sparsity-promoting penalty function. This paper focuses on evaluating the performance of these methods both using instrumental performance measures as well as using subjective listening tests, with the aim of determining the most robust and perceptually advantageous equalization technique. While commonly used instrumental performance measures indicate that the regularized RMCLS technique yields the largest reverberant energy suppression, listening tests show that the regularized and sparsity-promoting PMINT techniques yield the best perceptual speech quality. By analyzing the correlation between the instrumental and the perceptual results, it is shown that signal-based performance measures are more advantageous than channel-based performance measures to evaluate the perceptual speech quality of signals dereverberated by equalization techniques. Furthermore, this analysis also demonstrates the need to develop more reliable instrumental
Multi-channel Late Reverberation Power Spectral Density Estimation Based on Nuclear Norm Minimization
Ina Kodrasi and Simon Doclo
Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New York, USA, Oct. 2017.
Multi-channel methods for estimating the late reverberation power spectral density (PSD) generally assume that the reverberant PSD matrix can be decomposed as the sum of a rank-1 matrix and a scaled diffuse coherence matrix. To account for modeling or estimation errors in the estimated reverberant PSD matrix, in this paper we propose to decompose this matrix as the sum of a low-rank (not necessarily rank-1) matrix and a scaled diffuse coherence matrix. Among all pairs of scalars and matrices that yield feasible decompositions, the late reverberation PSD can then be estimated as the scalar associated with the matrix of minimum rank. Since rank minimization is an intractable non-convex optimization problem, we propose to use a convex relaxation approach and estimate the late reverberation PSD based on nuclear norm minimization (NNM). Experimental results show the advantages of using the proposed NNM-based late reverberation PSD estimator in a multi-channel Wiener filter for speech dereverberation, significantly outperforming a state-of-the-art maximum likelihood-based PSD estimator and yielding a similar or better performance than a recently proposed eigenvalue decomposition-based PSD estimator.
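The convex relaxation described above can be illustrated on a toy problem: for each candidate scalar, subtract the scaled coherence matrix from the PSD matrix and measure the nuclear norm (sum of singular values) of the remainder, then keep the candidate with the smallest nuclear norm. This is only a sketch under assumptions. The abstract does not give the exact problem formulation or solver, so the grid search, the function names, and the synthetic 4-microphone data with a weakly coherent matrix Gamma are all hypothetical stand-ins.

```python
import numpy as np

def nuclear_norm(A):
    """Sum of the singular values of A."""
    return np.linalg.svd(A, compute_uv=False).sum()

def estimate_late_reverb_psd(Phi, Gamma, candidates):
    """Return the candidate scalar phi minimizing ||Phi - phi * Gamma||_* (hypothetical helper).

    The cost is convex in phi (a norm of an affine function), so a simple
    grid search is enough for illustration purposes.
    """
    costs = [nuclear_norm(Phi - phi * Gamma) for phi in candidates]
    return candidates[int(np.argmin(costs))]

# Synthetic single-frequency example (all values assumed).
M = 4
idx = np.arange(M)
Gamma = 0.3 ** np.abs(idx[:, None] - idx[None, :])  # toy coherence matrix, unit diagonal

rng = np.random.default_rng(1)
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # toy steering vector
phi_true = 0.5
Phi = 2.0 * np.outer(a, a.conj()) + phi_true * Gamma  # rank-1 part + scaled coherence

candidates = np.linspace(0.0, 2.0, 401)
phi_hat = estimate_late_reverb_psd(Phi, Gamma, candidates)
print("true PSD:", phi_true, "estimated PSD:", phi_hat)
```

In this noise-free toy case the remainder becomes exactly rank-1 at the true scalar, and the nuclear norm attains its minimum there; with estimation errors the remainder is only approximately low rank, which is the situation the abstract targets.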
Dereverberation in Acoustic Sensor Networks using Weighted Prediction Error with Microphone-dependent Prediction Delays
Anselm Lohmann, Toon van Waterschoot, Joerg Bitzer, Simon Doclo
In recent decades, several multi-microphone speech dereverberation algorithms have been proposed, including the weighted prediction error (WPE) algorithm. In the WPE algorithm, a prediction delay is required to reduce the correlation between the prediction signals and the direct component in the reference microphone signal. In compact arrays with closely-spaced microphones, the prediction delay is often chosen to be microphone-independent. In acoustic sensor networks with spatially distributed microphones, large time differences of arrival (TDOAs) of the speech source between the reference microphone and the other microphones typically occur. Hence, when using a microphone-independent prediction delay, the reference and prediction signals may still be significantly correlated, leading to distortion in the dereverberated output signal. In order to decorrelate the signals, in this paper we propose to apply TDOA compensation with respect to the reference microphone, resulting in microphone-dependent prediction delays for the WPE algorithm. We consider both optimal TDOA compensation using crossband filtering in the short-time Fourier transform domain as well as band-to-band and integer delay approximations. Simulation results for different reverberation times using oracle as well as estimated TDOAs clearly show the benefit of using microphone-dependent prediction delays.
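The idea of TDOA compensation with respect to a reference microphone can be sketched in its simplest form: shift every microphone signal by its TDOA, rounded to whole samples, so that the direct component arrives at the same time in all channels. This is a simplified time-domain illustration only. The paper works in the short-time Fourier transform domain, and the function name, sampling rate, and impulse test signals below are assumptions made for the example; the WPE filter estimation itself is not shown.

```python
import numpy as np

def compensate_tdoas(signals, tdoas_s, fs):
    """Align each microphone signal to the reference using integer-sample shifts.

    signals : array of shape (num_mics, num_samples)
    tdoas_s : TDOA of each microphone relative to the reference, in seconds
              (positive means the source arrives later than at the reference)
    fs      : sampling rate in Hz
    Samples shifted in from outside the signal are filled with zeros.
    """
    signals = np.asarray(signals, dtype=float)
    num_mics, num_samples = signals.shape
    aligned = np.zeros_like(signals)
    for m in range(num_mics):
        d = int(round(tdoas_s[m] * fs))  # integer delay approximation
        if abs(d) >= num_samples:
            continue  # shift exceeds signal length: channel stays all zeros
        if d >= 0:
            aligned[m, :num_samples - d] = signals[m, d:]  # advance late channel
        else:
            aligned[m, -d:] = signals[m, :num_samples + d]  # delay early channel
    return aligned

# Toy example (assumed values): direct-path impulses at different sample indices.
fs = 16000
x = np.zeros((3, 400))
x[0, 100] = 1.0  # reference microphone
x[1, 130] = 1.0  # source arrives 30 samples later
x[2, 90] = 1.0   # source arrives 10 samples earlier
tdoas = [0.0, 30 / fs, -10 / fs]

aligned = compensate_tdoas(x, tdoas, fs)
print("direct-path positions after alignment:", aligned.argmax(axis=1))
```

Applying a common prediction delay to the aligned signals is, in effect, the same as applying a different delay to each original microphone signal.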