Paper Title

Contextual Similarity is More Valuable than Character Similarity: An Empirical Study for Chinese Spell Checking

Authors

Ding Zhang, Yinghui Li, Qingyu Zhou, Shirong Ma, Yangning Li, Yunbo Cao, Hai-Tao Zheng

Abstract

The Chinese Spell Checking (CSC) task aims to detect and correct Chinese spelling errors. Recent studies have focused on introducing character similarity from confusion sets to enhance CSC models, while ignoring the context of characters, which contains richer information. To make better use of contextual information, we propose a simple yet effective Curriculum Learning (CL) framework for the CSC task. With the help of our model-agnostic CL framework, existing CSC models are trained from easy to difficult examples, much as humans learn Chinese characters, and achieve further performance improvements. Extensive experiments and detailed analyses on the widely used SIGHAN datasets show that our method outperforms previous state-of-the-art methods. More instructively, our study empirically suggests that contextual similarity is more valuable than character similarity for the CSC task.
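
The easy-to-difficult training order mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how difficulty is measured, so `count_errors` below is a hypothetical stand-in (it counts differing characters), and `curriculum_stages`, `SAMPLES`, and the staging scheme are likewise assumptions made only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A training pair: (sentence possibly containing errors, corrected sentence).
Sample = Tuple[str, str]

# Toy data, deliberately listed in a non-sorted order.
SAMPLES: List[Sample] = [
    ("他门在公圆完", "他们在公园玩"),  # three wrong characters
    ("我喜欢吃苹果", "我喜欢吃苹果"),  # no errors
    ("今天天汽很号", "今天天气很好"),  # two wrong characters
    ("我喜欢吃平果", "我喜欢吃苹果"),  # one wrong character
]


def count_errors(sample: Sample) -> int:
    """Hypothetical difficulty proxy: number of positions where the two
    sentences differ. The paper's own difficulty measure is not given in
    the abstract; this is only a placeholder."""
    source, target = sample
    return sum(1 for s, t in zip(source, target) if s != t)


def curriculum_stages(
    samples: Sequence[Sample],
    difficulty: Callable[[Sample], int],
    num_stages: int,
) -> List[List[Sample]]:
    """Return cumulative training subsets ordered from easy to difficult.

    Stage k contains the easiest k/num_stages fraction of the data
    (rounded up), so the final stage is the full sorted dataset."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(samples, key=difficulty)  # stable sort, easiest first
    stages = []
    for k in range(1, num_stages + 1):
        cutoff = (len(ordered) * k + num_stages - 1) // num_stages  # ceiling
        stages.append(ordered[:cutoff])
    return stages


if __name__ == "__main__":
    for i, stage in enumerate(curriculum_stages(SAMPLES, count_errors, 2), 1):
        print(f"stage {i}: {[src for src, _ in stage]}")
```

Because the staging logic only depends on the `difficulty` callable, any CSC model could be trained on the stages in turn, which is one way to read the "model-agnostic" claim; swapping in a context-based difficulty score would not change the rest of the sketch.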
