AUDIO: MMSE-Optimal Spectral Amplitude Estimation Given the STFT-Phase
Employing Phase Information for an Improved Single Channel Speech Enhancement
Most single channel noise reduction algorithms work on spectral representations of noisy speech, such as the short-time discrete Fourier transform (STFT) domain. In the STFT domain, the speech spectral coefficients are represented by their amplitudes and phases. However, state-of-the-art single channel noise reduction algorithms are phase-blind: only the amplitudes of the noisy coefficients are improved, while the noisy phase is neither changed nor employed for amplitude estimation.
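As an illustration of what phase-blind processing means, the following is a minimal sketch in Python with NumPy. It is not code from [1] or [2]: the helper names `stft` and `phase_blind_enhance`, the square-root Hann window, and the frame and hop sizes are all assumptions chosen for the example. The `gain` argument stands in for the output of any amplitude estimator.

```python
import numpy as np

def stft(x, frame_len=512, hop=128):
    """Short-time DFT of a 1-D signal; returns an array of shape (frames, bins).

    Window type, frame length and hop size are illustrative assumptions.
    """
    window = np.sqrt(np.hanning(frame_len + 1)[:-1])  # periodic sqrt-Hann
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def phase_blind_enhance(Y, gain):
    """Scale the noisy amplitude and reuse the noisy phase unchanged."""
    amplitude = np.abs(Y)   # noisy spectral amplitude
    phase = np.angle(Y)     # noisy spectral phase, neither changed nor used
    return gain * amplitude * np.exp(1j * phase)
```

Because the phase is passed through untouched, the result equals `gain * Y`. The explicit split into amplitude and phase only makes visible which of the two quantities a phase-blind algorithm modifies.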
In [2] we show that the clean speech spectral phase can be efficiently reconstructed both on and between speech spectral harmonics of voiced speech from an estimate of the speech fundamental frequency.
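One ingredient of such a reconstruction can be sketched with a sinusoidal model: on a harmonic, the phase advances from one frame to the next by the harmonic's angular frequency times the hop size. The snippet below shows only this frame-to-frame step, under the assumption that the fundamental frequency is constant over the hop. The function name and parameters are hypothetical, and the snippet does not reproduce the full procedure of [2], which also covers the bands between harmonics.

```python
import numpy as np

def advance_harmonic_phase(prev_phase, f0_hz, harmonic, hop, fs):
    """Predict the phase of one harmonic in the next STFT frame.

    prev_phase: phase in the previous frame (radians)
    f0_hz:      estimated fundamental frequency (Hz), assumed constant over the hop
    harmonic:   harmonic number (1 = fundamental)
    hop:        frame shift in samples
    fs:         sampling rate (Hz)
    """
    omega = 2.0 * np.pi * harmonic * f0_hz / fs  # radians per sample
    phase = prev_phase + omega * hop             # linear phase advance
    return np.angle(np.exp(1j * phase))          # wrap to the principal value

# Example call: second harmonic of a 120 Hz voice, 16 kHz audio, hop of 128 samples.
next_phase = advance_harmonic_phase(0.0, 120.0, 2, 128, 16000)
```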
In [1] we show that the speech phase estimate can also be employed as additional information in speech amplitude estimation. For this, we derive an optimal Bayesian estimator of the clean speech spectral amplitudes when the clean speech phase is given. We show that the additional information of the phase provides new means to distinguish between noise outliers and speech. Thus, we conclude that incorporating phase processing can push the limits of single channel noise reduction algorithms beyond the limits of state-of-the-art phase-blind approaches.
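To give an intuition for why a known phase helps, the sketch below uses a deliberately simplified stand-in, not the MMSE estimator derived in [1]. It projects each noisy coefficient onto the direction of the given clean phase and clips negative values to zero, which is the maximum-likelihood amplitude under circular complex Gaussian noise. The function name is hypothetical. A coefficient whose phase agrees with the speech phase keeps most of its amplitude, whereas a noise outlier with an unrelated phase is attenuated.

```python
import numpy as np

def phase_aware_amplitude(Y, clean_phase):
    """Simplified phase-aware amplitude estimate (not the estimator of [1]).

    Y:           noisy STFT coefficients (complex array or scalar)
    clean_phase: clean speech phase estimate in radians, same shape as Y
    """
    # Rotate Y so that the clean-phase direction lies on the real axis,
    # then keep only the component along that direction.
    projected = np.real(Y * np.exp(-1j * clean_phase))
    # An amplitude cannot be negative.
    return np.maximum(projected, 0.0)

# Same noisy amplitude, different phase deviation from the clean phase:
aligned = phase_aware_amplitude(2.0 * np.exp(1j * 0.5), 0.5)            # phases agree
opposed = phase_aware_amplitude(2.0 * np.exp(1j * (0.5 + np.pi)), 0.5)  # phases oppose
```

A phase-blind estimator sees the same amplitude in both cases and cannot tell them apart; here the coefficient with opposing phase is suppressed entirely.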
See also [3] for a discussion of the role of phase in speech enhancement.
[1] Timo Gerkmann, Martin Krawczyk, "MMSE-Optimal Spectral Amplitude Estimation Given the STFT-Phase", IEEE Signal Processing Letters, Vol. 20, No. 2, pp. 129-132, Feb. 2013.
[2] Martin Krawczyk, Timo Gerkmann, "STFT Phase Improvement for Single Channel Speech Enhancement", Int. Workshop Acoust. Signal Enhancement (IWAENC), Aachen, Germany, Sep. 2012.
[3] Timo Gerkmann, Martin Krawczyk, Robert Rehr, "Phase estimation in speech enhancement - unimportant, important, or impossible?", IEEE Convention of Electrical and Electronics Engineers in Israel, Eilat, Israel, Nov. 2012.
Example
While state-of-the-art methods reduce the stationary part of the noise signal well, the proposed approach [1] additionally reduces noise outliers. In the given example, this is most audible in the word 'surely' within the first second.
Full sentence
[Audio samples: Clean Speech | Noisy Speech | No Phase Processing | Proposed (Fully blind estimation)]
Excerpt
[Audio samples: Clean Speech | Noisy Speech | No Phase Processing | Proposed (Fully blind estimation)]