Paper Title

Discourse-Aware Unsupervised Summarization of Long Scientific Documents

Authors

Yue Dong, Andrei Mircea, Jackie C. K. Cheung

Abstract

We propose an unsupervised graph-based ranking model for extractive summarization of long scientific documents. Our method assumes a two-level hierarchical graph representation of the source document, and exploits asymmetrical positional cues to determine sentence importance. Results on the PubMed and arXiv datasets show that our approach outperforms strong unsupervised baselines by wide margins in automatic metrics and human evaluation. In addition, it achieves performance comparable to many state-of-the-art supervised approaches that are trained on hundreds of thousands of examples. These results suggest that patterns in the discourse structure are a strong signal for determining importance in scientific articles.
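The abstract names two ingredients: a two-level (section and sentence) graph, and position-dependent, asymmetric edge weights. The Python sketch below is one minimal way to illustrate that idea; it is not the authors' implementation. Everything in it is an assumption made for illustration: the bag-of-words cosine similarity, the function names (`rank_sentences`, `boundary_distance`, `directed_weight`), the parameters `boundary_weight`, `interior_weight`, and `section_mix`, and the rule that an edge counts more toward a sentence that lies at least as close to a boundary of its section (or document) as its neighbour.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(values):
    """Arithmetic mean, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def boundary_distance(index, length):
    """Distance from position `index` to the nearest end of a sequence."""
    return min(index, length - 1 - index)


def directed_weight(d_target, d_source, boundary_weight, interior_weight):
    """Hypothetical asymmetric weight: an edge contributes more to a target
    that is at least as close to a boundary as the source."""
    return boundary_weight if d_target <= d_source else interior_weight


def rank_sentences(sections, boundary_weight=1.0, interior_weight=0.5,
                   section_mix=0.5):
    """Rank sentences in a document given as a list of sections,
    each section being a list of sentence strings.

    Returns (section_index, sentence_index, score) tuples, best first.
    """
    bags = [[Counter(s.lower().split()) for s in sec] for sec in sections]
    # Section-level node: the merged bag of words of its sentences.
    section_bags = [sum(sec, Counter()) for sec in bags]
    n_sections = len(sections)

    scored = []
    for s, sec in enumerate(bags):
        d_sec = boundary_distance(s, n_sections)
        for i, bag in enumerate(sec):
            d_i = boundary_distance(i, len(sec))
            # Level 1: edges to the other sentences of the same section.
            intra = [
                directed_weight(d_i, boundary_distance(j, len(sec)),
                                boundary_weight, interior_weight)
                * cosine(bag, other)
                for j, other in enumerate(sec) if j != i
            ]
            # Level 2: edges from the sentence to every other section.
            inter = [
                directed_weight(d_sec, boundary_distance(t, n_sections),
                                boundary_weight, interior_weight)
                * cosine(bag, section_bags[t])
                for t in range(n_sections) if t != s
            ]
            scored.append((s, i, mean(intra) + section_mix * mean(inter)))

    return sorted(scored, key=lambda item: item[2], reverse=True)
```

Calling `rank_sentences` with a list of sections and keeping the first few tuples gives an extractive summary under these assumptions. A real system would replace the word-overlap similarity with a stronger sentence representation and tune the weights on held-out data.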
