Paper Title


The VolcTrans System for WMT22 Multilingual Machine Translation Task

Authors

Xian Qian, Kai Hu, Jiaqiang Wang, Yifeng Liu, Xingyuan Pan, Jun Cao, Mingxuan Wang

Abstract


This report describes our VolcTrans system for the WMT22 shared task on large-scale multilingual machine translation. We participated in the unconstrained track, which allows the use of external resources. Our system is a transformer-based multilingual model trained on data from multiple sources, including the public training set from the data track, NLLB data provided by Meta AI, self-collected parallel corpora, and pseudo bitext from back-translation. A series of heuristic rules clean both bilingual and monolingual texts. On the official test set, our system achieves 17.3 BLEU, 21.9 spBLEU, and 41.9 chrF2++ on average over all language pairs. The average inference speed is 11.5 sentences per second using a single Nvidia Tesla V100 GPU. Our code and trained models are available at https://github.com/xian8/wmt22.
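The abstract mentions heuristic rules for cleaning bilingual text but does not spell them out. The sketch below illustrates what such rules commonly look like in practice (length bounds, source/target length ratio, and copy removal); the function name, thresholds, and specific filters are illustrative assumptions, not the paper's actual rules.

```python
# Hypothetical sketch of heuristic parallel-corpus cleaning filters.
# The thresholds (max_len, max_ratio) and the choice of rules are
# assumptions for illustration, not the filters used by VolcTrans.

def keep_pair(src: str, tgt: str,
              max_len: int = 250, max_ratio: float = 3.0) -> bool:
    """Return True if the sentence pair passes simple heuristic filters."""
    src_toks, tgt_toks = src.split(), tgt.split()
    # Drop empty or overly long sentences.
    if not (0 < len(src_toks) <= max_len and 0 < len(tgt_toks) <= max_len):
        return False
    # Drop pairs with an extreme source/target length ratio.
    ratio = len(src_toks) / len(tgt_toks)
    if ratio > max_ratio or ratio < 1.0 / max_ratio:
        return False
    # Drop copies where the source equals the target.
    if src.strip() == tgt.strip():
        return False
    return True

pairs = [
    ("Hello world", "Bonjour le monde"),  # kept
    ("Hello", "Hello"),                   # copy: dropped
    ("a " * 300, "b"),                    # too long / bad ratio: dropped
]
cleaned = [p for p in pairs if keep_pair(*p)]
```

Production pipelines typically combine such rules with language identification and deduplication across corpora; those steps are omitted here to keep the sketch self-contained.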
