Paper Title

Effectiveness of French Language Models on Abstractive Dialogue Summarization Task

Authors

Yongxin Zhou, François Portet, Fabien Ringeval

Abstract

Pre-trained language models have established the state-of-the-art on various natural language processing tasks, including dialogue summarization, which allows readers to quickly access key information from long conversations in meetings, interviews, or phone calls. However, such dialogues remain difficult for current models to handle, because the spontaneity of the language involves expressions that are rarely present in the corpora used for pre-training language models. Moreover, the vast majority of work in this field has focused on English. In this work, we present a study on the summarization of spontaneous oral dialogues in French using several language-specific pre-trained models, BARThez and BelGPT-2, as well as multilingual pre-trained models: mBART, mBARThez, and mT5. Experiments were performed on the DECODA (Call Center) dialogue corpus, whose task is to generate abstractive synopses from call center conversations between a caller and one or several agents, depending on the situation. Results show that the BARThez models offer the best performance, far above the previous state-of-the-art on DECODA. We further discuss the limits of such pre-trained models and the challenges that must be addressed for summarizing spontaneous dialogues.
