Paper Title

Mining Error Templates for Grammatical Error Correction

Authors

Yue Zhang, Haochen Jiang, Zuyi Bao, Bo Zhang, Chen Li, Zhenghua Li

Abstract

Some grammatical error correction (GEC) systems incorporate hand-crafted rules and achieve positive results. However, manually defining rules is time-consuming and laborious. In view of this, we propose a method to mine error templates for GEC automatically. An error template is a regular expression that aims to identify text errors. We use a web crawler to acquire such error templates from the Internet. For each template, we further select the corresponding corrective action by using language model perplexity as the criterion. Based on this method, we have accumulated 1,119 error templates for Chinese GEC. Experimental results on the newly proposed CTC-2021 Chinese GEC benchmark show that combining our error templates with a strong GEC system can effectively improve its performance, especially on two error types with very little training data. Our error templates are available at https://github.com/HillZhang1999/gec_error_template.
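
As a rough illustration of the idea in the abstract, the sketch below applies a single regular-expression template and keeps the candidate rewrite with the lowest perplexity. It is not the authors' implementation: the English template, the candidate list, the function names `make_bigram_perplexity` and `correct`, and the toy character-bigram scorer (a stand-in for a real language model) are all assumptions made for this example.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def make_bigram_perplexity(corpus: List[str]) -> Callable[[str], float]:
    """Build a toy character-bigram perplexity scorer with add-one smoothing.

    Hypothetical stand-in for a real language model.
    """
    bigrams: Counter = Counter()
    unigrams: Counter = Counter()
    vocab = set()
    for sent in corpus:
        chars = ["<s>"] + list(sent) + ["</s>"]
        vocab.update(chars)
        for a, b in zip(chars, chars[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    v = len(vocab) + 1  # one extra slot for unseen characters

    def perplexity(sentence: str) -> float:
        chars = ["<s>"] + list(sentence) + ["</s>"]
        log_prob = 0.0
        for a, b in zip(chars, chars[1:]):
            log_prob += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + v))
        # Normalize by the number of predicted symbols.
        return math.exp(-log_prob / (len(chars) - 1))

    return perplexity


def correct(sentence: str, pattern: str, candidates: List[str],
            perplexity: Callable[[str], float]) -> str:
    """Apply one error template.

    If the regex matches, substitute each candidate correction and return
    the rewritten sentence with the lowest perplexity; otherwise return
    the sentence unchanged.
    """
    if not re.search(pattern, sentence):
        return sentence
    # A callable replacement avoids backslash interpretation in candidates.
    rewrites = [re.sub(pattern, lambda m, c=cand: c, sentence)
                for cand in candidates]
    return min(rewrites, key=perplexity)


# Hypothetical template and candidates, for illustration only.
ppl = make_bigram_perplexity([
    "she could have gone home",
    "they could have won",
    "he should have known",
])
fixed = correct("she could of gone home", r"could of",
                ["could have", "could off"], ppl)
```

In this sketch the scorer is passed in as a plain callable, so a stronger language model could be swapped in without changing `correct`.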
