Paper Title


The VolcTrans System for WMT22 Multilingual Machine Translation Task

Authors

Xian Qian, Kai Hu, Jiaqiang Wang, Yifeng Liu, Xingyuan Pan, Jun Cao, Mingxuan Wang

Abstract


This report describes our VolcTrans system for the WMT22 shared task on large-scale multilingual machine translation. We participated in the unconstrained track, which allows the use of external resources. Our system is a transformer-based multilingual model trained on data from multiple sources, including the public training set from the data track, NLLB data provided by Meta AI, self-collected parallel corpora, and pseudo bitext from back-translation. A series of heuristic rules clean both bilingual and monolingual texts. On the official test set, our system achieves 17.3 BLEU, 21.9 spBLEU, and 41.9 chrF2++ on average over all language pairs. The average inference speed is 11.5 sentences per second using a single Nvidia Tesla V100 GPU. Our code and trained models are available at https://github.com/xian8/wmt22.
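The abstract mentions heuristic rules for cleaning bilingual text but does not spell them out. The sketch below illustrates what such rules commonly look like in practice (length bounds, source/target length ratio, and copy removal); the function name, thresholds, and specific filters are illustrative assumptions, not the paper's actual rules.

```python
# Hypothetical sketch of heuristic parallel-corpus cleaning filters.
# The thresholds (max_len, max_ratio) and the choice of rules are
# assumptions for illustration, not the filters used by VolcTrans.

def keep_pair(src: str, tgt: str,
              max_len: int = 250, max_ratio: float = 3.0) -> bool:
    """Return True if the sentence pair passes simple heuristic filters."""
    src_toks, tgt_toks = src.split(), tgt.split()
    # Drop empty or overly long sentences.
    if not (0 < len(src_toks) <= max_len and 0 < len(tgt_toks) <= max_len):
        return False
    # Drop pairs with an extreme source/target length ratio.
    ratio = len(src_toks) / len(tgt_toks)
    if ratio > max_ratio or ratio < 1.0 / max_ratio:
        return False
    # Drop copies where the source equals the target.
    if src.strip() == tgt.strip():
        return False
    return True

pairs = [
    ("Hello world", "Bonjour le monde"),  # kept
    ("Hello", "Hello"),                   # copy: dropped
    ("a " * 300, "b"),                    # too long / bad ratio: dropped
]
cleaned = [p for p in pairs if keep_pair(*p)]
```

Production pipelines typically combine such rules with language identification and deduplication across corpora; those steps are omitted here to keep the sketch self-contained.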
