Paper Title
LIIR at SemEval-2020 Task 12: A Cross-Lingual Augmentation Approach for Multilingual Offensive Language Identification
Paper Authors
Abstract
This paper presents our system, entitled 'LIIR', for SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2). We participated in sub-task A for English, Danish, Greek, Arabic, and Turkish. We adapt and fine-tune the BERT and Multilingual BERT models released by Google AI, using BERT for English and Multilingual BERT for the non-English languages. For English, we use a combination of two fine-tuned BERT models. For the other languages, we propose a cross-lingual augmentation approach to enrich the training data, and we use Multilingual BERT to obtain sentence representations. LIIR ranked 14/38, 18/47, 24/86, 24/54, and 25/40 in Greek, Turkish, English, Arabic, and Danish, respectively.
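The abstract does not say how the two fine-tuned English BERT models are combined. One common option is to average their per-class probabilities and take the highest-scoring label. The sketch below illustrates that idea under stated assumptions. The averaging rule, the label order `("NOT", "OFF")`, and the helper names `average_probs` and `predict` are all hypothetical and are not taken from the paper. Each model is assumed to have already produced softmax probabilities for one tweet.

```python
from typing import List, Sequence

# Hypothetical label order for OffensEval sub-task A: index 0 = NOT, index 1 = OFF.
LABELS = ("NOT", "OFF")


def average_probs(p1: Sequence[float], p2: Sequence[float]) -> List[float]:
    """Average two models' class-probability vectors element-wise (assumed combination rule)."""
    if len(p1) != len(p2):
        raise ValueError("probability vectors must have the same length")
    return [(a + b) / 2 for a, b in zip(p1, p2)]


def predict(p1: Sequence[float], p2: Sequence[float]) -> str:
    """Return the label whose averaged probability is highest."""
    avg = average_probs(p1, p2)
    best = max(range(len(avg)), key=avg.__getitem__)
    return LABELS[best]


# Model A leans OFF and model B is undecided, so the average leans OFF.
print(average_probs([0.25, 0.75], [0.5, 0.5]))  # [0.375, 0.625]
print(predict([0.25, 0.75], [0.5, 0.5]))        # OFF
```

Probability averaging keeps each model's confidence in play, unlike a hard majority vote, which with only two models can end in an unresolved tie.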