Paper Title

Embedded Topics in the Stochastic Block Model

Authors

Rémi Boutin, Charles Bouveyron, Pierre Latouche

Abstract

Communication networks such as emails or social networks are now ubiquitous, and their analysis has become a strategic field. In many applications, the goal is to automatically extract relevant information by looking at the nodes and their connections. Unfortunately, most existing methods focus on analysing the presence or absence of edges, and textual data is often discarded. However, all communication networks actually come with textual data on the edges. To take this specificity into account, we consider in this paper networks in which two nodes are linked if and only if they share textual data. We introduce ETSBM, a deep latent variable model that handles embedded topics and simultaneously clusters the nodes while modelling the topics used between the different clusters. ETSBM extends both the stochastic block model (SBM) and the embedded topic model (ETM), which are core models for studying networks and corpora, respectively. Inference is performed with a variational-Bayes expectation-maximisation algorithm combined with stochastic gradient descent. The methodology is evaluated on synthetic data and on a real-world dataset.
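To make the data setting concrete, here is a minimal Python sketch of a toy network in which an edge exists exactly when the two nodes share text. It is an illustration under stated assumptions, not the ETSBM generative model or its inference procedure. The function name `sample_text_network`, the uniform cluster assignment, the single topic per cluster pair, and the uniform word draws are all hypothetical simplifications chosen for readability.

```python
import random


def sample_text_network(n_nodes, block_probs, pair_topics, vocab_by_topic,
                        n_words=5, seed=0):
    """Sample a toy directed network with a short document on every edge.

    Assumptions for illustration only (not taken from the paper):
    clusters are drawn uniformly, each ordered cluster pair uses a single
    topic, and words are drawn uniformly from that topic's vocabulary.
    """
    rng = random.Random(seed)
    n_clusters = len(block_probs)
    # Latent cluster of each node, as in a stochastic block model.
    clusters = [rng.randrange(n_clusters) for _ in range(n_nodes)]
    # Mapping (sender, receiver) -> list of words. A pair is a key of this
    # dict if and only if the two nodes share text, so presence of an edge
    # and presence of a document coincide by construction.
    edges = {}
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i == j:
                continue  # no self-loops
            q, r = clusters[i], clusters[j]
            # Connection probability depends only on the cluster pair.
            if rng.random() < block_probs[q][r]:
                topic = pair_topics[q][r]
                edges[(i, j)] = [rng.choice(vocab_by_topic[topic])
                                 for _ in range(n_words)]
    return clusters, edges


# Hypothetical toy parameters: two clusters, three topics.
block_probs = [[0.5, 0.05], [0.05, 0.5]]
pair_topics = [[0, 1], [1, 2]]
vocab_by_topic = [
    ["goal", "match", "team"],
    ["vote", "law", "senate"],
    ["gene", "cell", "protein"],
]
clusters, edges = sample_text_network(20, block_probs, pair_topics,
                                      vocab_by_topic)
```

In this sketch the text on an edge depends only on the clusters of its two endpoints, which is the kind of structure a joint clustering and topic model could try to recover from observed edges and documents.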
