Paper title
Improving Chinese Spelling Check by Character Pronunciation Prediction: The Effects of Adaptivity and Granularity
Paper authors
Paper abstract
Chinese spelling check (CSC) is a fundamental NLP task that detects and corrects spelling errors in Chinese texts. As most of these spelling errors are caused by phonetic similarity, effectively modeling the pronunciation of Chinese characters is a key factor for CSC. In this paper, we introduce an auxiliary task of Chinese pronunciation prediction (CPP) to improve CSC and, for the first time, systematically discuss the adaptivity and granularity of this auxiliary task. We propose SCOPE, which builds two parallel decoders on top of a shared encoder: one for the primary CSC task and the other for a fine-grained auxiliary CPP task, with a novel adaptive weighting scheme to balance the two tasks. In addition, we design an iterative correction strategy for further improvements during inference. Empirical evaluation shows that SCOPE achieves new state-of-the-art results on three CSC benchmarks, demonstrating the effectiveness of the auxiliary CPP task. Comprehensive ablation studies further verify the positive effects of the adaptivity and granularity of the task. Code and data used in this paper are publicly available at https://github.com/jiahaozhenbang/SCOPE.
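
The joint objective described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the adaptive weighting scheme, so the rule used here (weighting each token's CPP loss by the CSC decoder's probability for the gold character) is an assumption for illustration only, as are the function names `softmax` and `joint_loss` and the use of a single pronunciation label per character.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_loss(csc_logits, cpp_logits, csc_gold, cpp_gold):
    """Average per-token loss: CSC cross-entropy plus weighted CPP cross-entropy.

    csc_logits / cpp_logits: one list of logits per token, standing in for the
    outputs of the two parallel decoders on top of the shared encoder.
    csc_gold / cpp_gold: gold character index and gold pronunciation index
    for each token.
    """
    total = 0.0
    for c_log, p_log, c_y, p_y in zip(csc_logits, cpp_logits, csc_gold, cpp_gold):
        p_csc = softmax(c_log)[c_y]  # CSC probability of the gold character
        p_cpp = softmax(p_log)[p_y]  # CPP probability of the gold pronunciation
        # ASSUMPTION: the adaptive weight is the CSC confidence on the gold
        # character. In a real training loop it would be treated as a constant
        # (no gradient flows through it).
        weight = p_csc
        total += -math.log(p_csc) + weight * -math.log(p_cpp)
    return total / len(csc_gold)
```

For example, `joint_loss([[2.0, 0.0]], [[0.0, 1.0]], [0], [1])` scores a one-token sentence with a two-character vocabulary and two pronunciation classes. Under the assumed rule, the auxiliary term contributes more when the CSC decoder is already confident in the gold character.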