Contact

Head

Prof. Dr. Dr. Birger Kollmeier

W30 3-313

Office

Katja Warnken

+49 (0)441 798-3902

W30 3-312

Kirsten Scheel

+49 (0)441 798-3813

+49 (0)441 798-3902

W30 3-312

Postal address

Medizinische Physik, Fakultät VI
Universität Oldenburg
26111 Oldenburg

Location / Directions

For specific questions about one of our research topics, please contact the relevant staff members directly (see the staff list).

Gabor filter bank features

Gabor filter bank (GBFB) features extract spectro-temporal information from speech signals with the aim of improving the robustness of automatic speech recognition (ASR). In [1], we proposed a Gabor filter bank in which 2D filters are arranged by their spectral and temporal modulation frequencies (Figure 1). The corresponding feature extraction code can be downloaded below.

Figure 1: 2D Gabor filters arranged by spectral and temporal modulation frequencies. The figure shows the real parts of the complex filters.
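As a rough illustration of how such a filter bank can be laid out, the following Python sketch builds complex 2D filters as the product of an envelope and a complex sinusoid and indexes them by modulation frequency pair. It is not the released feature extraction code: the Hann envelope, the fixed 9x9 filter size, the frequency grid, and the names `hann`, `gabor_2d`, and `build_filter_bank` are all assumptions made for this example.

```python
import cmath
import math


def hann(length):
    """Symmetric Hann window with `length` non-zero samples (assumed envelope)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (i + 1) / (length + 1))
            for i in range(length)]


def gabor_2d(omega_k, omega_n, size_k, size_n):
    """Complex 2D Gabor filter: envelope times a complex sinusoid.

    omega_k: spectral modulation frequency (radians per frequency channel)
    omega_n: temporal modulation frequency (radians per time frame)
    Returns a size_k x size_n nested list of complex values.
    """
    env_k, env_n = hann(size_k), hann(size_n)
    centre_k, centre_n = (size_k - 1) / 2, (size_n - 1) / 2
    return [[env_k[k] * env_n[n]
             * cmath.exp(1j * (omega_k * (k - centre_k)
                               + omega_n * (n - centre_n)))
             for n in range(size_n)]
            for k in range(size_k)]


def build_filter_bank(spectral_freqs, temporal_freqs, size_k, size_n):
    """Index one filter per (spectral, temporal) modulation frequency pair."""
    return {(wk, wn): gabor_2d(wk, wn, size_k, size_n)
            for wk in spectral_freqs
            for wn in temporal_freqs}


# Illustrative grid only; these values are not taken from [1].
bank = build_filter_bank([0.0, 0.25, 0.5],
                         [-0.5, -0.25, 0.0, 0.25, 0.5], 9, 9)

# Real part of one filter, the quantity a plot like Figure 1 would display.
real_part = [[value.real for value in row] for row in bank[(0.5, 0.25)]]
```

Keeping the filters in a dictionary keyed by modulation frequency pair mirrors the grid arrangement described above; a fixed filter size is a simplification chosen to keep the sketch short.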

Separable Gabor filter bank (SGBFB) features [2] extract spectro-temporal patterns with two separate 1D Gabor filter banks, a spectral one and a temporal one. The relation between GBFB and SGBFB filters is illustrated in Figure 2. This approach reduces the complexity of the spectro-temporal feature extraction and further improves the robustness of ASR. The corresponding feature extraction code is available in a public repository; a link is provided below.

Figure 2: Relation of inseparable 2D Gabor filters to separable spectro-temporal filters based on 1D Gabor filters. When a downward 2D Gabor filter (B) is added to (C) or subtracted from (D) the corresponding upward filter (A), the resulting filter is separable. Separable means that the 2D filter can be decomposed into independent spectral and temporal 1D filter components (E, R, I), which can be used to perform the spectro-temporal filtering.
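The relation in Figure 2 rests on the identities cos(a + b) + cos(a - b) = 2 cos(a) cos(b) and cos(a + b) - cos(a - b) = -2 sin(a) sin(b). The Python sketch below checks this numerically for the real parts of two 2D filters that differ only in the sign of the temporal modulation frequency. The Hann envelope, the 9x9 size, the frequency values, the helper names, and the choice of which sign counts as "upward" are assumptions made for this example and are not taken from the released code.

```python
import math


def hann(length):
    """Symmetric Hann window with `length` non-zero samples (assumed envelope)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (i + 1) / (length + 1))
            for i in range(length)]


def gabor_1d(omega, size):
    """Real and imaginary parts of a 1D Gabor filter."""
    centre = (size - 1) / 2
    env = hann(size)
    real = [e * math.cos(omega * (i - centre)) for i, e in enumerate(env)]
    imag = [e * math.sin(omega * (i - centre)) for i, e in enumerate(env)]
    return real, imag


def real_gabor_2d(omega_k, omega_n, size_k, size_n):
    """Real part of a 2D Gabor filter: envelope * cos(wk * k + wn * n)."""
    env_k, env_n = hann(size_k), hann(size_n)
    ck, cn = (size_k - 1) / 2, (size_n - 1) / 2
    return [[env_k[k] * env_n[n]
             * math.cos(omega_k * (k - ck) + omega_n * (n - cn))
             for n in range(size_n)]
            for k in range(size_k)]


def combine(a, b, sign):
    """Element-wise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def outer(u, v, scale=1.0):
    """Scaled outer product of two 1D filters, i.e. a separable 2D filter."""
    return [[scale * x * y for y in v] for x in u]


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def is_rank_one(m, tol=1e-9):
    """True if m is a non-zero outer product (all 2x2 minors through the pivot vanish)."""
    pivot, i0, j0 = max((abs(v), i, j)
                        for i, row in enumerate(m) for j, v in enumerate(row))
    if pivot == 0:
        return False
    p = m[i0][j0]
    return all(abs(m[i][j] * p - m[i][j0] * m[i0][j]) <= tol
               for i in range(len(m)) for j in range(len(m[0])))


wk, wn, size = 0.5, 0.25, 9                     # illustrative values
up = real_gabor_2d(wk, wn, size, size)          # "upward" by assumption
down = real_gabor_2d(wk, -wn, size, size)       # "downward" by assumption
real_k, imag_k = gabor_1d(wk, size)
real_n, imag_n = gabor_1d(wn, size)

# Each filter alone is not an outer product of two 1D filters.
print(is_rank_one(up), is_rank_one(down))
# Sum and difference match outer products of 1D components up to rounding.
print(max_abs_diff(combine(up, down, +1), outer(real_k, real_n, 2.0)))
print(max_abs_diff(combine(up, down, -1), outer(imag_k, imag_n, -2.0)))
```

Separability is what makes the approach cheaper: filtering with a 9x9 2D filter takes 81 multiplications per output value, whereas one 9-tap spectral pass followed by one 9-tap temporal pass takes 18.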

The SGBFB features were also used to model human (speech) perception in the Simulation Framework for Auditory Discrimination Experiments (FADE) [3].

References

  1. Schädler, M. R., Meyer, B. T., and Kollmeier, B., "Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition", Journal of the Acoustical Society of America, Volume 131, Issue 5, pp 4134-4151 (2012). [link|download]
  2. Schädler, M. R. and Kollmeier, B., "Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition", Journal of the Acoustical Society of America, Volume 137, Issue 4, pp 2047-2059 (2015). [link|download]
  3. Schädler, M. R., Warzybok, A., Ewert, S. D., and Kollmeier, B., "A simulation framework for auditory discrimination experiments: Revealing the importance of across-frequency processing in speech perception", Journal of the Acoustical Society of America, Volume 139, pp 2708-2722 (2016). [link|download]

Selected references using GBFB features

  • Meyer, B., Ravuri, S., Schädler, M.R., and Morgan, N. (2011). "Comparing different flavors of spectro-temporal features for ASR", in Proc. Interspeech, pp. 1269-1272.
  • Schädler, M. R. and Kollmeier, B. (2012). "Normalization of spectro-temporal Gabor filter bank features for improved robust automatic speech recognition systems", in Proc. Interspeech. [download from author's website]
  • Lei, H., Meyer, B., and Mirghafori, N. (2012). "Spectro-temporal Gabor features for speaker recognition," in Proc. ICASSP.
  • Moritz, N., Schädler, M.R., Adiloglu, K., Meyer, B., Jürgens, T., Gerkmann, T., and Goetze, S. (2013). "Noise robust distant automatic speech recognition utilizing NMF based source separation and auditory feature extraction," Workshop on Machine Listening in Multisource Environments (CHiME 2013).
  • Schröder, J., Cauchi, B., Schädler, M. R., Moritz, N., Adiloglu, K., Anemüller, K., Doclo, S., Kollmeier, B., and Goetze, S. (2013). "Acoustic Event Detection Using Signal Enhancement and Spectro-temporal Feature Extraction", in IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events, 2013.
  • Chang, S., Meyer, B., Morgan, N. (2013). "Spectro-temporal features for noise-robust speech recognition using power-law nonlinearity and power-bias subtraction," Proc. 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 7063-7067.

Selected references using SGBFB features

  •  

Licensing of the feature extraction code: Dual-License

The Gabor filter bank (GBFB) and the separable Gabor filter bank (SGBFB) feature extraction code is licensed under both the General Public License (GPL) version 3 and a proprietary license that can be arranged with us. In practical terms, this means:

  • If you are developing Open Source Software (OSS) based on the GBFB code, chances are you will be able to use it freely under the GPL. However, please double-check that your OSS license is compatible with the GPL.
  • Alternatively, if you are unable to release your application as Open Source Software, you may arrange alternative licensing with us. Just send your inquiry to marc.r.schaedler@uol.de to discuss this option.

Downloads

  • Original Gabor filter bank feature extraction implementation as used in [1] [download]
  • Matlab/Octave implementations of feature extraction algorithms as used in [2] [repository]
(Last updated: 09.02.2024)