
Learning Disentangled Speech Representations

Authors:
Yusuf Brima, Ulf Krumnack, Simone Pika, Gunther Heidemann
Keywords:
Audio and Speech Processing (eess.AS), Machine Learning (cs.LG), Sound (cs.SD)
Journal:
--
Date:
2023-11-03 16:00:00
Abstract
Disentangled representation learning from speech remains limited despite its importance in many application domains. A key challenge is the lack of speech datasets with known generative factors on which methods can be evaluated. This paper proposes SynSpeech, a novel synthetic speech dataset with ground-truth generative factors that enables research on disentangled speech representations. We plan to present a comprehensive study evaluating supervised techniques with established supervised disentanglement metrics. This benchmark dataset and framework will address the gap in the rigorous evaluation of state-of-the-art disentangled speech representation learning methods. Our findings will provide insights to advance this underexplored area and enable more robust speech representations.
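For context, supervised disentanglement metrics of the kind the abstract mentions score how exclusively each ground-truth generative factor is captured by a single latent dimension. One widely used example is the Mutual Information Gap (MIG). The sketch below is illustrative only, not the paper's evaluation code: the toy data, binning scheme, and function names are assumptions.

```python
# Minimal sketch of the Mutual Information Gap (MIG), a common supervised
# disentanglement metric. Synthetic toy data stands in for speech latents.
import numpy as np
from sklearn.metrics import mutual_info_score

def discretize(z, bins=20):
    """Bin a continuous latent dimension so mutual information can be estimated."""
    edges = np.histogram(z, bins)[1]
    return np.digitize(z, edges[:-1])

def mig(latents, factors):
    """latents: (N, D) continuous codes; factors: (N, K) discrete ground truth.

    For each factor, MIG is the gap between the two latent dimensions with the
    highest mutual information, normalized by the factor's entropy; the score
    is the mean gap over all factors (1.0 = each factor sits in one dimension).
    """
    d, k = latents.shape[1], factors.shape[1]
    mi = np.zeros((d, k))
    for i in range(d):
        zi = discretize(latents[:, i])
        for j in range(k):
            mi[i, j] = mutual_info_score(zi, factors[:, j])
    gaps = []
    for j in range(k):
        top = np.sort(mi[:, j])[::-1]
        # MI of a factor with itself equals its entropy (in nats)
        h = mutual_info_score(factors[:, j], factors[:, j])
        gaps.append((top[0] - top[1]) / max(h, 1e-12))
    return float(np.mean(gaps))

# Toy example: two factors, each encoded almost noiselessly by one latent
# dimension, plus a pure-noise dimension -> MIG should be high.
rng = np.random.default_rng(0)
f = rng.integers(0, 5, size=(2000, 2))
z = np.column_stack([f[:, 0] + 0.01 * rng.normal(size=2000),
                     f[:, 1] + 0.01 * rng.normal(size=2000),
                     rng.normal(size=2000)])
print(mig(z, f))
```

With a ground-truth dataset such as the proposed SynSpeech, metrics of this form let supervised disentanglement methods be compared quantitatively rather than by visual inspection of learned codes.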