
Learning Disentangled Speech Representations

Authors:
Yusuf Brima, Ulf Krumnack, Simone Pika, Gunther Heidemann
Keywords:
Audio and Speech Processing (eess.AS), Machine Learning (cs.LG), Sound (cs.SD)
Journal:
--
Date:
2023-11-03 16:00:00
Abstract
Disentangled representation learning from speech remains limited despite its importance in many application domains. A key challenge is the lack of speech datasets with known generative factors on which methods can be evaluated. This paper proposes SynSpeech, a novel synthetic speech dataset with ground-truth generative factors that enables research on disentangled speech representations. We plan to present a comprehensive study evaluating supervised techniques with established supervised disentanglement metrics. This benchmark dataset and framework will address the gap in the rigorous evaluation of state-of-the-art disentangled speech representation learning methods. Our findings will provide insights to advance this underexplored area and enable more robust speech representations.
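For context, supervised disentanglement metrics of the kind the abstract mentions score how exclusively each ground-truth generative factor is captured by a single latent dimension. One widely used example is the Mutual Information Gap (MIG). The sketch below is illustrative only, not the paper's evaluation code: the toy data, binning scheme, and function names are assumptions.

```python
# Minimal sketch of the Mutual Information Gap (MIG), a common supervised
# disentanglement metric. Synthetic toy data stands in for speech latents.
import numpy as np
from sklearn.metrics import mutual_info_score

def discretize(z, bins=20):
    """Bin a continuous latent dimension so mutual information can be estimated."""
    edges = np.histogram(z, bins)[1]
    return np.digitize(z, edges[:-1])

def mig(latents, factors):
    """latents: (N, D) continuous codes; factors: (N, K) discrete ground truth.

    For each factor, MIG is the gap between the two latent dimensions with the
    highest mutual information, normalized by the factor's entropy; the score
    is the mean gap over all factors (1.0 = each factor sits in one dimension).
    """
    d, k = latents.shape[1], factors.shape[1]
    mi = np.zeros((d, k))
    for i in range(d):
        zi = discretize(latents[:, i])
        for j in range(k):
            mi[i, j] = mutual_info_score(zi, factors[:, j])
    gaps = []
    for j in range(k):
        top = np.sort(mi[:, j])[::-1]
        # MI of a factor with itself equals its entropy (in nats)
        h = mutual_info_score(factors[:, j], factors[:, j])
        gaps.append((top[0] - top[1]) / max(h, 1e-12))
    return float(np.mean(gaps))

# Toy example: two factors, each encoded almost noiselessly by one latent
# dimension, plus a pure-noise dimension -> MIG should be high.
rng = np.random.default_rng(0)
f = rng.integers(0, 5, size=(2000, 2))
z = np.column_stack([f[:, 0] + 0.01 * rng.normal(size=2000),
                     f[:, 1] + 0.01 * rng.normal(size=2000),
                     rng.normal(size=2000)])
print(mig(z, f))
```

With a ground-truth dataset such as the proposed SynSpeech, metrics of this form let supervised disentanglement methods be compared quantitatively rather than by visual inspection of learned codes.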