Exploring transfer learning for pathological speech feature prediction: Impact of layer selection

Author:

Daniela A. Wiepert, Rene L. Utianski, Joseph R. Duffy, John L. Stricker, Leland R. Barnard, David T. Jones, Hugo Botha

Keyword:

Electrical Engineering and Systems Science, Audio and Speech Processing, Audio and Speech Processing (eess.AS), Computation and Language (cs.CL), Machine Learning (cs.LG)

journal:

date:

2024-02-02 00:00:00

Abstract

There is interest in leveraging AI to conduct automatic, objective assessments of clinical speech, in turn facilitating diagnosis and treatment of speech disorders. We explore transfer learning, focusing on the impact of layer selection, for the downstream task of predicting the presence of pathological speech. We find that selecting an optimal layer offers large performance improvements (12.4% average increase in balanced accuracy), though the best layer varies by predicted feature and does not always generalize well to unseen data. A learned weighted sum offers comparable performance to the average best layer in-distribution and has better generalization for out-of-distribution data.

PDF: Exploring transfer learning for pathological speech feature prediction: Impact of layer selection.pdf