Paper Title

"Is Whole Word Masking Always Better for Chinese BERT?": Probing on Chinese Grammatical Error Correction

Paper Authors

Yong Dai, Linyang Li, Cong Zhou, Zhangyin Feng, Enbo Zhao, Xipeng Qiu, Piji Li, Duyu Tang

Paper Abstract

Whole word masking (WWM), which masks all subwords corresponding to a word at once, makes a better English BERT model. For the Chinese language, however, there is no subword because each token is an atomic character. The meaning of a word in Chinese is different in that a word is a compositional unit consisting of multiple characters. Such difference motivates us to investigate whether WWM leads to better context understanding ability for Chinese BERT. To achieve this, we introduce two probing tasks related to grammatical error correction and ask pretrained models to revise or insert tokens in a masked language modeling manner. We construct a dataset including labels for 19,075 tokens in 10,448 sentences. We train three Chinese BERT models with standard character-level masking (CLM), WWM, and a combination of CLM and WWM, respectively. Our major findings are as follows: First, when one character needs to be inserted or replaced, the model trained with CLM performs the best. Second, when more than one character needs to be handled, WWM is the key to better performance. Finally, when being fine-tuned on sentence-level downstream tasks, models trained with different masking strategies perform comparably.
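
The abstract contrasts character-level masking (CLM) with whole word masking (WWM) for Chinese pretraining. The minimal sketch below, in plain Python, illustrates the difference on a toy example: under CLM individual characters are masked regardless of word boundaries, while under WWM every character of a chosen word is masked at once. The sample sentence, its word segmentation, and the function names are illustrative assumptions, not code released with the paper.

```python
import random

# Toy Chinese sentence, pre-segmented into words (segmentation is assumed,
# e.g. from an off-the-shelf word segmenter; this example is hypothetical).
words = ["语言", "模型", "理解", "上下文"]   # each word spans multiple characters
chars = [c for w in words for c in w]        # character-level view used by Chinese BERT

MASK = "[MASK]"

def clm_mask(chars, n=1, seed=0):
    """Character-level masking (CLM): mask n individual characters,
    independently of word boundaries."""
    rng = random.Random(seed)
    out = list(chars)
    for i in rng.sample(range(len(out)), n):
        out[i] = MASK
    return out

def wwm_mask(words, n_words=1, seed=0):
    """Whole word masking (WWM): pick n_words words and mask every
    character belonging to each chosen word at once."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(words)), n_words))
    out = []
    for i, w in enumerate(words):
        out.extend([MASK] * len(w) if i in chosen else list(w))
    return out

print("CLM :", "".join(clm_mask(chars)))   # a single character replaced by [MASK]
print("WWM :", "".join(wwm_mask(words)))   # all characters of one word masked together
```

In the probing tasks described above, a pretrained model is then asked to predict the characters at the masked positions, either replacing an erroneous character or inserting a missing one; according to the reported findings, the CLM-trained model is stronger when only one character is involved, while WWM matters when several characters must be handled.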
