Paper Title

Improving Sentiment Analysis over non-English Tweets using Multilingual Transformers and Automatic Translation for Data-Augmentation

Authors

Valentin Barriere, Alexandra Balahur

Abstract

Tweets are a specific kind of text data compared to general text. Although sentiment analysis over tweets has become very popular for English in the last decade, it is still difficult to find large annotated corpora for non-English languages. The recent rise of transformer models in Natural Language Processing makes it possible to achieve unparalleled performance in many tasks, but these models need a substantial quantity of text to adapt to the tweet domain. We propose using a multilingual transformer model, which we pre-train over English tweets, and applying data augmentation using automatic translation to adapt the model to non-English languages. Our experiments in French, Spanish, German and Italian suggest that the proposed technique is an efficient way to improve the results of transformers over small corpora of tweets in a non-English language.
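
The data-augmentation idea in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not confirm: that labeled English tweets are machine-translated into the target language, that sentiment labels carry over unchanged, and that the translated examples are simply appended to the small target-language training set. The names `augment_with_translation` and `toy_translate` are hypothetical, and `toy_translate` is only a stand-in for a real translation system.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (tweet text, sentiment label)


def augment_with_translation(
    target_corpus: List[Example],
    english_corpus: List[Example],
    translate: Callable[[str], str],
) -> List[Example]:
    """Return target_corpus plus translated copies of english_corpus.

    Assumption: translation preserves sentiment, so each label is
    carried over unchanged from the English example.
    """
    seen = {text for text, _ in target_corpus}
    augmented = list(target_corpus)
    for text, label in english_corpus:
        translated = translate(text)
        # Skip empty translations and texts already in the training set.
        if translated and translated not in seen:
            augmented.append((translated, label))
            seen.add(translated)
    return augmented


def toy_translate(text: str) -> str:
    """Hypothetical stand-in for a machine-translation system (English to French)."""
    lexicon = {"i love this": "j'adore ça", "so bad": "tellement mauvais"}
    return lexicon.get(text.lower(), "")


french = [("c'est génial", "positive")]
english = [
    ("I love this", "positive"),
    ("So bad", "negative"),
    ("unknown phrase", "neutral"),  # no translation available, so it is skipped
]
train_set = augment_with_translation(french, english, toy_translate)
```

In this sketch the augmented set would then be used to fine-tune the multilingual model on the target language. Passing the translator in as a function keeps the sketch independent of any particular translation service.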
