Paper Title
Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation
Paper Authors
Paper Abstract
Unsupervised representation learning for speech audio has attained impressive performance on speech recognition tasks, particularly when annotated speech is limited. However, the unsupervised paradigm needs to be carefully designed, and little is known about what properties these representations acquire. There is no guarantee that the model learns representations that carry meaningful information for recognition. Moreover, the ability of the learned representations to adapt to other domains still needs to be assessed. In this work, we explore learning domain-invariant representations via a direct mapping of speech representations to their corresponding high-level linguistic information. Results show that the learned latents not only capture the articulatory features of each phoneme but also enhance adaptation ability, largely outperforming the baseline on accented benchmarks.