Binaural multichannel blind speaker separation with a causal low-latency and low-complexity approach

Author:

Nils L. Westhausen, Bernd T. Meyer

Keyword:

Electrical Engineering and Systems Science, Audio and Speech Processing (eess.AS), Sound (cs.SD)

journal:

date:

2023-12-08

Abstract

In this paper, we introduce a causal, low-latency, low-complexity approach for binaural multichannel blind speaker separation in noisy reverberant conditions. The model, referred to as Group Communication Binaural Filter and Sum Network (GCBFSnet), predicts complex filters for filter-and-sum beamforming in the time-frequency domain. We apply Group Communication (GC), i.e., latent model variables are split into groups and processed with a shared sequence model, with the aim of reducing the complexity of a simple model containing only one convolutional and one recurrent module. With GC, we reduce the model size by up to 83% and the complexity by up to 73% compared to the model without GC, while largely retaining performance. Even in its smallest configuration, GCBFSnet matches the performance of a low-complexity TasNet baseline in most metrics, despite the baseline's larger size and higher number of required operations.
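The two mechanisms named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the GCBFSnet implementation. The first function applies causal multi-frame complex filters per microphone and sums over microphones for one output channel (e.g., one ear and one speaker). The array layout, the number of time taps K, and the choice not to conjugate the filters are assumptions. The second part uses a single GRU layer as a hypothetical stand-in for the recurrent module to show why sharing one sequence model across groups shrinks parameters more than operations. It ignores the inter-group communication module and the convolutional module. All function names (`filter_and_sum`, `gru_params`, `gru_macs_per_frame`, `grouped`) and the sizes used are hypothetical.

```python
import numpy as np


def filter_and_sum(X, W):
    """Causal filter-and-sum in the STFT domain (hypothetical layout).

    X: mixture STFT, complex, shape (M, T, F)  -> M mics, T frames, F bins
    W: predicted filters, complex, shape (M, T, F, K) -> K taps over past frames
    Returns Y, shape (T, F): Y[t, f] = sum_m sum_k W[m, t, f, k] * X[m, t - k, f]
    """
    M, T, F = X.shape
    K = W.shape[-1]
    # Zero-pad K-1 frames in front so only current and past frames are used (causal).
    Xp = np.concatenate([np.zeros((M, K - 1, F), dtype=X.dtype), X], axis=1)
    Y = np.zeros((T, F), dtype=np.result_type(X, W))
    for k in range(K):
        # Frame t - k of X sits at index t - k + K - 1 in the padded array.
        X_shift = Xp[:, K - 1 - k : K - 1 - k + T, :]
        Y += np.sum(W[..., k] * X_shift, axis=0)  # multiply per mic, then sum over mics
    return Y


def gru_params(input_size, hidden_size):
    # Single GRU layer with both bias vectors (3 gates): W_ih, W_hh, b_ih, b_hh.
    return 3 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)


def gru_macs_per_frame(input_size, hidden_size):
    # Multiply-accumulates of the matrix products only (biases/nonlinearities ignored).
    return 3 * (input_size * hidden_size + hidden_size * hidden_size)


def grouped(n_features, groups):
    # Split n_features into equal groups processed by ONE shared GRU (hypothetical).
    g = n_features // groups
    params = gru_params(g, g)                  # weights are stored once...
    macs = groups * gru_macs_per_frame(g, g)   # ...but the GRU runs once per group
    return params, macs
```

With 256 latent features and 4 groups (hypothetical sizes), the shared GRU has 24,960 parameters versus 394,752 for the ungrouped layer (about 6%), while per-frame multiply-accumulates fall only to 25% (98,304 versus 393,216). Parameters shrink roughly with the square of the group count, but operations shrink only linearly. This matches the direction of the abstract, which reports a larger reduction in size than in complexity. The reported GCBFSnet figures also include the convolutional and communication modules and are not reproduced by this sketch.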
