Paper Title

CTCBERT: Advancing Hidden-unit BERT with CTC Objectives

Paper Authors

Ruchao Fan, Yiming Wang, Yashesh Gaur, Jinyu Li

Paper Abstract

In this work, we present a simple but effective method, CTCBERT, for advancing hidden-unit BERT (HuBERT). HuBERT applies a frame-level cross-entropy (CE) loss, which is similar to most acoustic model training. However, CTCBERT performs the model training with the Connectionist Temporal Classification (CTC) objective after removing duplicated IDs in each masked region. The idea stems from the observation that there can be significant errors in alignments when using clustered or aligned IDs. CTC learns alignments implicitly, indicating that learning with CTC can be more flexible when misalignment exists. We examine CTCBERT on IDs from HuBERT Iter1, HuBERT Iter2, and PBERT. The CTC training brings consistent improvements compared to the CE training. Furthermore, when loading blank-related parameters during finetuning, slight improvements are observed. Evaluated on the Librispeech 960-100h setting, the relative WER improvements of CTCBERT are 2%-11% over HuBERT and PBERT on test-other data.
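To make the difference between the two objectives concrete, below is a minimal PyTorch sketch of frame-level CE training on masked frames (HuBERT-style) versus CTC training over each masked region after removing consecutive duplicated IDs, following the description in the abstract. The function names, tensor shapes, and the convention of reserving class 0 for the CTC blank are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two pretraining objectives described in the abstract:
# HuBERT's frame-level cross-entropy on masked frames vs. CTCBERT's CTC loss
# over each masked region after removing consecutive duplicated target IDs.
# Names, shapes, and blank handling are assumptions for illustration only.
import torch
import torch.nn.functional as F


def contiguous_regions(mask):
    """Yield (start, end) pairs for contiguous True runs in a 1-D bool mask."""
    idx = torch.nonzero(mask, as_tuple=False).flatten().tolist()
    if not idx:
        return
    start = prev = idx[0]
    for i in idx[1:]:
        if i != prev + 1:
            yield start, prev + 1
            start = i
        prev = i
    yield start, prev + 1


def hubert_ce_loss(logits, ids, mask):
    """Frame-level CE on masked frames (HuBERT-style).
    logits: (T, C) scores over cluster IDs, ids: (T,) long, mask: (T,) bool."""
    return F.cross_entropy(logits[mask], ids[mask])


def ctcbert_loss(logits, ids, mask, blank=0):
    """CTC over each masked region with consecutive duplicates removed
    (CTCBERT-style as read from the abstract). Assumes cluster IDs start at 1
    so index 0 is free for the blank, i.e. the output layer has one extra class."""
    log_probs = F.log_softmax(logits, dim=-1)
    losses = []
    for start, end in contiguous_regions(mask):
        region_ids = ids[start:end]
        # Remove consecutive duplicates, e.g. [5, 5, 5, 9, 9] -> [5, 9].
        keep = torch.ones_like(region_ids, dtype=torch.bool)
        keep[1:] = region_ids[1:] != region_ids[:-1]
        target = region_ids[keep]
        lp = log_probs[start:end].unsqueeze(1)  # (T_region, 1, C)
        losses.append(F.ctc_loss(
            lp, target.unsqueeze(0),
            input_lengths=torch.tensor([end - start]),
            target_lengths=torch.tensor([target.numel()]),
            blank=blank, zero_infinity=True))
    if not losses:  # no masked region in this example
        return log_probs.new_zeros(())
    return torch.stack(losses).mean()
```

Because the CTC objective adds a blank class, the corresponding output-layer parameters exist after pretraining; the abstract notes that loading these blank-related parameters during finetuning yields a slight further improvement.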
