Interactive Dual Attention Conformer With Scene-Based Mask For Soft Sound Event Detection

Author:

Han Yin, Jisheng Bai, Mou Wang, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

Keyword:

Electrical Engineering and Systems Science, Audio and Speech Processing (eess.AS)

journal:

date:

2023-11-23

Abstract

The emergence of soft-labeled data for sound event detection (SED) effectively overcomes the lack of traditional strong-labeled data. However, the performance of current SED systems trained on such soft labels remains unsatisfactory. In this work, we introduce a dual-branch SED model designed to leverage the information within soft labels. Four variations of the interactive convolutional module are presented to investigate effective mechanisms for information interaction. Furthermore, we apply a scene-based mask, generated by an estimator, directly to the predictions of the SED model. Experimental results show that the mask estimator achieves performance comparable to, or even better than, a manually designed mask and significantly improves SED performance. The proposed approach achieved the top ranking in Task 4B of the DCASE 2023 Challenge.
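The abstract does not say what form the scene-based mask takes, so the following is only a minimal sketch of one plausible reading: a per-class multiplicative mask, weighted by the scene posterior from an estimator, applied to frame-wise soft predictions. The function name `apply_scene_mask`, the array shapes, and the scene-to-event plausibility table are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def apply_scene_mask(event_probs, scene_probs, scene_event_mask):
    """Scale frame-wise event predictions by a scene-conditioned mask.

    event_probs:      (T, C) soft predictions in [0, 1] for T frames, C classes.
    scene_probs:      (S,)   scene posterior from a hypothetical estimator.
    scene_event_mask: (S, C) assumed plausibility of each event in each scene.
    """
    event_probs = np.asarray(event_probs, dtype=float)
    scene_probs = np.asarray(scene_probs, dtype=float)
    scene_event_mask = np.asarray(scene_event_mask, dtype=float)

    # Expected per-class mask under the scene posterior, shape (C,).
    class_mask = scene_probs @ scene_event_mask

    # Broadcast the class mask over the time axis.
    return event_probs * class_mask[None, :]


# Toy example: two frames, two event classes, two scenes (all values made up).
event_probs = [[0.8, 0.6],
               [0.2, 0.9]]
scene_probs = [1.0, 0.0]          # estimator is certain about scene 0
scene_event_mask = [[1.0, 0.0],   # scene 0 allows only event 0
                    [1.0, 1.0]]   # scene 1 allows both events

masked = apply_scene_mask(event_probs, scene_probs, scene_event_mask)
```

In this toy case the second event class is suppressed in every frame because the assumed table marks it as implausible in the detected scene. A manually designed mask would correspond to fixing `scene_event_mask` by hand, whereas an estimator would supply the scene posterior (or the mask itself) from the audio.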

PDF: Interactive Dual Attention Conformer With Scene-Based Mask For Soft Sound Event Detection.pdf