Paper Title
An End-to-End Document-Level Neural Discourse Parser Exploiting Multi-Granularity Representations
Paper Authors
Paper Abstract
Document-level discourse parsing, in accordance with the Rhetorical Structure Theory (RST), remains notoriously challenging. Challenges include the deep structure of document-level discourse trees, the requirement of subtle semantic judgments, and the lack of large-scale training corpora. To address such challenges, we propose to exploit robust representations derived from multiple levels of granularity across syntax and semantics, and in turn incorporate such representations in an end-to-end encoder-decoder neural architecture for more resourceful discourse processing. In particular, we first use a pre-trained contextual language model that embodies high-order and long-range dependencies to enable finer-grained semantic, syntactic, and organizational representations. We further encode such representations with boundary and hierarchical information to obtain more refined modeling for document-level discourse processing. Experimental results show that our parser achieves state-of-the-art performance, approaching human-level performance on the benchmarked RST dataset.
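To make the pipeline described in the abstract more concrete, the sketch below shows one way such a system could be wired together: pooled EDU (elementary discourse unit) vectors from a pre-trained contextual language model are fused with boundary and hierarchy embeddings in an encoder, and a decoder scores candidate split points of a span in a top-down fashion. This is a minimal illustrative sketch under assumed design choices; all class names, dimensions, and the split-point scoring scheme are assumptions for illustration, not the authors' exact architecture.

```python
# Illustrative sketch only: a simplified encoder-decoder over EDU representations.
# The split-point decoding scheme, module names, and dimensions are assumptions,
# not the authors' exact implementation.
import torch
import torch.nn as nn


class MultiGranularityEncoder(nn.Module):
    """Fuses (assumed) LM-derived EDU vectors with boundary and hierarchy features."""

    def __init__(self, lm_dim=768, hidden=256, num_levels=3):
        super().__init__()
        # Boundary flags (e.g., sentence/paragraph start) and hierarchy-level embeddings.
        self.boundary_emb = nn.Embedding(2, 32)
        self.level_emb = nn.Embedding(num_levels, 32)
        self.bilstm = nn.LSTM(lm_dim + 64, hidden, batch_first=True, bidirectional=True)

    def forward(self, edu_vecs, boundary_flags, levels):
        # edu_vecs: (batch, n_edus, lm_dim), pooled from a pre-trained contextual LM.
        feats = torch.cat(
            [edu_vecs, self.boundary_emb(boundary_flags), self.level_emb(levels)], dim=-1
        )
        encoded, _ = self.bilstm(feats)  # (batch, n_edus, 2 * hidden)
        return encoded


class SplitPointDecoder(nn.Module):
    """Scores candidate split points of a span (one common top-down decoding scheme)."""

    def __init__(self, enc_dim=512):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * enc_dim, enc_dim), nn.ReLU(), nn.Linear(enc_dim, 1)
        )

    def forward(self, encoded, left, right):
        # Score every interior boundary of span [left, right) as a potential split.
        span_repr = encoded[:, left:right].mean(dim=1, keepdim=True)  # (batch, 1, enc_dim)
        candidates = encoded[:, left + 1:right]                       # (batch, k, enc_dim)
        pair = torch.cat([candidates, span_repr.expand_as(candidates)], dim=-1)
        return self.scorer(pair).squeeze(-1)                          # (batch, k)


if __name__ == "__main__":
    batch, n_edus, lm_dim = 1, 8, 768
    edu_vecs = torch.randn(batch, n_edus, lm_dim)       # stand-in for LM outputs
    boundaries = torch.randint(0, 2, (batch, n_edus))   # sentence/paragraph boundary flags
    levels = torch.randint(0, 3, (batch, n_edus))       # EDU / sentence / paragraph level
    enc = MultiGranularityEncoder(lm_dim)(edu_vecs, boundaries, levels)
    scores = SplitPointDecoder()(enc, left=0, right=n_edus)
    print(scores.shape)  # torch.Size([1, 7]): one score per candidate split of the document span
```

In a full parser, the highest-scoring split would be applied recursively to build the discourse tree, with nuclearity and relation labels predicted for each split; those components are omitted here to keep the sketch small.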