Paper Title

AppTek's Submission to the IWSLT 2022 Isometric Spoken Language Translation Task

Authors

Patrick Wilken, Evgeny Matusov

Abstract

To participate in the Isometric Spoken Language Translation Task of the IWSLT 2022 evaluation, constrained condition, AppTek developed neural Transformer-based systems for English-to-German with various mechanisms of length control, ranging from source-side and target-side pseudo-tokens to encoding of remaining length in characters that replaces positional encoding. We further increased translation length compliance by sentence-level selection of length-compliant hypotheses from different system variants, as well as rescoring of N-best candidates from a single system. Length-compliant back-translated and forward-translated synthetic data, as well as other parallel data variants derived from the original MuST-C training corpus were important for a good quality/desired length trade-off. Our experimental results show that length compliance levels above 90% can be reached while minimizing losses in MT quality as measured in BERT and BLEU scores.
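The sentence-level selection of length-compliant hypotheses mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a translation counts as length-compliant when its character count is within ±10% of the source character count, and that each N-best entry carries a model score where higher is better. The function names, the tolerance parameter, and the fallback rule are all hypothetical.

```python
from typing import List, Tuple


def is_length_compliant(source: str, target: str, tolerance: float = 0.10) -> bool:
    """Return True if the target length (in characters) deviates from the
    source length by at most `tolerance` (assumed here to be 10%)."""
    src_len = len(source)
    if src_len == 0:
        return len(target) == 0
    return abs(len(target) - src_len) <= tolerance * src_len


def select_hypothesis(source: str, nbest: List[Tuple[str, float]]) -> str:
    """Pick one translation from an N-best list of (hypothesis, score) pairs.

    Assumed policy: take the best-scoring length-compliant hypothesis;
    if none is compliant, fall back to the hypothesis whose length is
    closest to the source length.
    """
    if not nbest:
        raise ValueError("N-best list must not be empty")
    compliant = [(hyp, score) for hyp, score in nbest
                 if is_length_compliant(source, hyp)]
    if compliant:
        return max(compliant, key=lambda pair: pair[1])[0]
    # No compliant candidate: minimise the absolute length difference.
    return min(nbest, key=lambda pair: abs(len(pair[0]) - len(source)))[0]
```

The same selection function could be applied to hypotheses pooled from several system variants instead of a single N-best list, since it only needs candidate strings and comparable scores.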
