Paper title
The Powered Dirichlet-Hawkes Process as a Flexible Prior for Temporal Text Clustering
Le Processus Powered Dirichlet-Hawkes comme A Priori Flexible pour Clustering Temporel de Textes
Paper authors
Paper abstract
The textual content of a document and its publication date are intertwined. For example, the publication of a news article on a topic is influenced by previous publications on similar issues, according to underlying temporal dynamics. However, it can be challenging to retrieve meaningful information when the text conveys little information. Furthermore, the textual content of a document is not always correlated with its temporal dynamics. We develop a method that clusters textual documents according to both their content and publication time, the Powered Dirichlet-Hawkes process (PDHP). PDHP yields significantly better results than state-of-the-art models when temporal information or textual content is weakly informative. PDHP also relaxes the assumption that textual content and temporal dynamics are perfectly correlated. We demonstrate that PDHP generalizes previous work, such as DHP and UP. Finally, we illustrate a possible application using a real-world dataset from Reddit.