Paper title

Towards trustworthy phoneme boundary detection with autoregressive model and improved evaluation metric

Authors

Hyeongju Kim, Hyeong-Seok Choi

Abstract

Phoneme boundary detection has been studied because of its central role in various speech applications. In this work, we point out that the task needs to be addressed not only algorithmically but also in terms of its evaluation metric. To this end, we first propose a state-of-the-art phoneme boundary detector that operates in an autoregressive manner, dubbed SuperSeg. Experiments on the TIMIT and Buckeye corpora demonstrate that SuperSeg identifies phoneme boundaries with a significant margin over existing models. Furthermore, we note a limitation of the popular evaluation metric, R-value, and propose new evaluation metrics that prevent each boundary from contributing to the evaluation multiple times. The proposed metrics reveal the weaknesses of non-autoregressive baselines and establish a reliable criterion suited to evaluating phoneme boundary detection.
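
The abstract does not spell out the proposed metrics, so the following is only a minimal sketch of the issue it describes, not the authors' definition. It assumes boundaries are given in integer milliseconds and compared under a fixed tolerance. The function names `lenient_hits`, `one_to_one_hits`, and `r_value` are hypothetical. `lenient_hits` counts each side independently, so one predicted boundary can be credited for several reference boundaries. `one_to_one_hits` uses a greedy matching in which each boundary is used at most once. `r_value` follows the commonly cited R-value formula based on recall and over-segmentation.

```python
import math


def lenient_hits(refs, preds, tol):
    """Count hits per side independently; a boundary may be reused."""
    pred_hits = sum(any(abs(p - r) <= tol for r in refs) for p in preds)
    ref_hits = sum(any(abs(p - r) <= tol for p in preds) for r in refs)
    return pred_hits, ref_hits


def one_to_one_hits(refs, preds, tol):
    """Greedy one-to-one matching: each boundary contributes at most once."""
    refs, preds = sorted(refs), sorted(preds)
    i = j = hits = 0
    while i < len(refs) and j < len(preds):
        if abs(refs[i] - preds[j]) <= tol:
            hits += 1
            i += 1
            j += 1
        elif refs[i] < preds[j]:
            i += 1  # this reference has no usable prediction left
        else:
            j += 1  # this prediction has no usable reference left
    return hits


def r_value(precision, recall):
    """R-value computed from precision and recall (both in (0, 1])."""
    if precision <= 0:
        raise ValueError("precision must be positive")
    over_seg = recall / precision - 1.0
    r1 = math.sqrt((1.0 - recall) ** 2 + over_seg ** 2)
    r2 = (-over_seg + recall - 1.0) / math.sqrt(2.0)
    return 1.0 - (abs(r1) + abs(r2)) / 2.0


# Illustrative data (assumed): one prediction sits between two references.
refs = [100, 140, 300]   # reference boundaries in ms
preds = [120]            # predicted boundaries in ms
tol = 20                 # tolerance in ms

pred_hits, ref_hits = lenient_hits(refs, preds, tol)   # (1, 2)
strict_hits = one_to_one_hits(refs, preds, tol)        # 1

lenient_r = r_value(pred_hits / len(preds), ref_hits / len(refs))
strict_r = r_value(strict_hits / len(preds), strict_hits / len(refs))
print(lenient_r, strict_r)
```

In this toy case the single prediction is credited for two reference boundaries under lenient counting, which raises recall and therefore the R-value. Under one-to-one matching it is credited once.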
