Temporal integration in sound texture perception

Richard McWalter and Josh McDermott
Technical University of Denmark

Many aspects of auditory perception require integrating sound information over time, but the underlying mechanisms remain poorly understood. The perception of sound textures – relatively stationary signals resulting from large numbers of acoustic events, as in rain, fire, or swarms of insects – presents a potentially simple but largely unstudied example of integration. Prior research suggests that textures are represented with statistics that capture their average acoustic properties, raising the question of the window over which the averaging occurs. We probed the averaging process underlying texture perception using “texture gradients” – signals synthesized such that their statistical properties changed gradually over time. We aimed to measure how far back in time the stimulus history would exert biasing effects on texture judgments. The statistical synthesis method of McDermott and Simoncelli (2011) was extended to allow the synthesis of texture gradients that gradually morphed from one set of statistics to another. On each trial, subjects heard a texture gradient followed by a probe texture with constant statistical properties that varied across trials. Subjects were instructed to compare the end of the gradient stimulus to the probe, and to determine which was closer to a standard reference texture. We compared performance for gradients that started or ended at different points in the space of statistics, and for gradients that concluded with constant segments of different durations. Because subjects were instructed to base their judgments on the end of the stimulus, we expected the gradient to bias judgments only if it fell within the averaging window. For each type of gradient we obtained a psychometric function (performance vs. the position of the probe statistics).
Psychometric functions differed for gradients that ended at the same point in the space of statistics but that had different starting points, indicating that the stimulus history was included in subjects’ estimates of the texture statistics. The biasing effects were also present for gradients that concluded with a one-second segment of constant statistics, but were not evident when the concluding constant segment was extended to 2.5 seconds. The results suggest that texture perception is based on statistics that are averaged over windows greater than 1 second and less than 2.5 seconds in length. Future work will compare human performance to that of ideal observer models on the same task.
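
The logic of the constant-segment manipulation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' model or analysis: the texture statistics are reduced to a single scalar, the averaging window is a simple boxcar over the end of the stimulus, the window length of 2.0 seconds is a hypothetical value chosen inside the 1 to 2.5 second range reported above, and the function names `make_gradient` and `windowed_estimate` are invented here.

```python
import numpy as np

def make_gradient(start, end, ramp_s, tail_s, dt):
    """Scalar 'statistic' that ramps linearly from start to end,
    then stays at end for tail_s seconds (the constant segment)."""
    ramp = np.linspace(start, end, int(round(ramp_s / dt)), endpoint=False)
    tail = np.full(int(round(tail_s / dt)), end)
    return np.concatenate([ramp, tail])

def windowed_estimate(trajectory, dt, window_s):
    """Boxcar average over the final window_s seconds (assumed window shape)."""
    n = max(1, int(round(window_s / dt)))
    return float(np.mean(trajectory[-n:]))

dt = 0.01        # time step in seconds
window_s = 2.0   # hypothetical averaging window, not a measured value

biases = {}
for tail_s in (1.0, 2.5):
    # Two gradients that end at the same point (0.5) but start on opposite sides.
    from_below = make_gradient(0.0, 0.5, ramp_s=5.0, tail_s=tail_s, dt=dt)
    from_above = make_gradient(1.0, 0.5, ramp_s=5.0, tail_s=tail_s, dt=dt)
    # Difference between the two estimates: nonzero means the history leaks in.
    biases[tail_s] = (windowed_estimate(from_above, dt, window_s)
                      - windowed_estimate(from_below, dt, window_s))
    print(f"constant segment {tail_s} s: estimate difference = {biases[tail_s]:.4f}")
```

Under these assumptions, the two estimates differ when the constant segment lasts 1 second, because part of the ramp still falls inside the window, and they coincide when it lasts 2.5 seconds, because the window then covers only the constant segment. This mirrors the qualitative pattern described above, but says nothing about the true shape or length of the window.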
