Workshop on "Computational Audition"



P. Smaragdis - "Exemplary" Cocktail-Party Listening

A fair amount of work on understanding sound mixtures is based on learned low-rank decompositions of spectra. Such decompositions are used to analyze mixtures, extract the constituent sounds, and then perform subsequent tasks such as pitch tracking or speech recognition. These tasks, however, are almost always limited by the quality of the separation step, which is not guaranteed to facilitate them. If, instead of low-rank models, we explain mixtures using verbatim bits of training data, it is possible not only to improve separation performance, but also to carry semantic information from these bits over to the mixtures and automatically perform otherwise complex parameter estimation tasks. I will show how such a model is easily computed as a sparse decomposition, how it can be enhanced with various priors, and how it can be used in problems that involve mixtures.
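
To make the idea of explaining a mixture with verbatim training spectra concrete, here is a minimal sketch. It is not the method presented in the talk. It assumes a fixed dictionary whose columns are training spectra (exemplars) from two sources, and estimates nonnegative activations with a standard multiplicative update for the KL divergence plus an L1 sparsity penalty. The function name, the toy data, the `sparsity` value, and the per-exemplar labels are all hypothetical and chosen only for illustration.

```python
import numpy as np

def sparse_exemplar_decomposition(V, W, n_iter=200, sparsity=0.1, eps=1e-12):
    """Estimate nonnegative activations H with V ~ W @ H, keeping W fixed.

    V: (n_bins, n_frames) magnitude spectrogram of the mixture.
    W: (n_bins, n_exemplars) training spectra, columns summing to 1.
    Uses a multiplicative KL update with an L1 penalty (an assumed,
    generic choice, not necessarily the one used in the talk).
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        ratio = V / (W @ H + eps)
        H *= (W.T @ ratio) / (W.T @ ones + sparsity)
    return H

# Toy "training data": random spectra standing in for two sound sources.
rng = np.random.default_rng(1)
n_bins, n_a, n_b, n_frames = 64, 20, 20, 10
W_a = rng.random((n_bins, n_a))
W_b = rng.random((n_bins, n_b))
W_a /= W_a.sum(axis=0, keepdims=True)
W_b /= W_b.sum(axis=0, keepdims=True)
W = np.hstack([W_a, W_b])

# Toy mixture: each frame is one exemplar of source A plus one of source B.
frames = np.arange(n_frames)
H_true = np.zeros((n_a + n_b, n_frames))
H_true[rng.integers(0, n_a, n_frames), frames] = 1.0
H_true[n_a + rng.integers(0, n_b, n_frames), frames] = 0.5
V = W @ H_true

H = sparse_exemplar_decomposition(V, W)

# Separate by soft masking: split V in proportion to each source's model.
model = W @ H + 1e-12
S_a = V * (W_a @ H[:n_a]) / model
S_b = V * (W_b @ H[n_a:]) / model

# Hypothetical labels attached to the exemplars of source A (e.g. pitch
# classes); the label of the most active exemplar is carried to each frame.
labels_a = np.arange(n_a)
frame_labels_a = labels_a[np.argmax(H[:n_a], axis=0)]
```

Because the dictionary columns are actual training frames rather than learned bases, any annotation attached to a training frame can be read off the activations directly, which is the sense in which semantic information transfers to the mixture. Priors, such as temporal continuity of the activations, would enter as additional terms in the update and are omitted here.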

(Last updated: 16.03.2023)