J. Barker - Probabilistic models of auditory scene analysis for robust speech recognition
This talk will present research conducted at Sheffield under the EPSRC Computational Hearing in Multisource Environments (CHiME) project. One objective of this project has been to explore the potential of techniques inspired by auditory scene analysis (ASA) to deliver robust automatic speech recognition (ASR) in 'everyday' listening environments, that is, environments containing multiple competing sound sources in reverberant conditions. The talk will present an ASA-inspired approach to ASR that distinguishes between what Bregman would call 'primitive grouping' and 'schema-driven processes'. The approach treats the foreground and background asymmetrically, and will be contrasted with conventional 'model combination' techniques, which operate by symmetrically decomposing the scene into a superposition of individual sources. The talk will discuss the balance between primitive grouping and schema-driven processing. It will be argued that primitive grouping cues provide constraints that i/ allow the foreground to be reliably interpreted even when the background is unfamiliar, and ii/ are essential for disambiguating complex scenes in which sources have similar temporal and spectral characteristics. It will also be argued that although primitive cues may be largely redundant when dealing with highly familiar or extremely simple acoustic scenes, they may play a crucial role when learning source models in real environments.