Paper Title

SimLDA: A tool for topic model evaluation

Paper Authors

Taylor, Rebecca M. C.; du Preez, Johan A.

Paper Abstract

Variational Bayes (VB) applied to latent Dirichlet allocation (LDA) has become the most popular algorithm for aspect modeling. While sufficiently successful at extracting text topics from large corpora, VB is less successful at identifying aspects in the presence of limited data. We present a novel variational message passing algorithm for LDA and compare it with the gold-standard VB and collapsed Gibbs sampling. In situations where marginalisation leads to non-conjugate messages, we use ideas from sampling to derive approximate update equations. In cases where conjugacy holds, Loopy Belief Update (LBU), also known as Lauritzen-Spiegelhalter, is used. Our algorithm, ALBU (approximate LBU), has strong similarities with Variational Message Passing (VMP), the message passing variant of VB. To compare the performance of the algorithms in the presence of limited data, we use data sets consisting of tweets and newsgroup posts. Using coherence measures, we show that ALBU learns latent distributions more accurately than VB does, especially for smaller data sets.
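To make the evaluation setup concrete, the sketch below illustrates the kind of comparison the abstract describes: fit an LDA model and score its topics with a coherence measure. It is not code from the paper; the ALBU algorithm and the SimLDA tool are not reproduced here, and gensim's standard online-VB LdaModel is used purely as a stand-in on a toy corpus.

```python
# Minimal sketch (assumption: gensim installed): fit VB-based LDA and score
# topic quality with a coherence measure. The paper's ALBU and collapsed
# Gibbs sampling would be evaluated the same way and the scores compared.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

# Tiny toy corpus of tokenised documents; in the paper, tweets and
# newsgroup posts play this role.
texts = [
    ["topic", "model", "inference", "variational", "bayes"],
    ["gibbs", "sampling", "topic", "model", "posterior"],
    ["tweet", "short", "text", "limited", "data"],
    ["newsgroup", "post", "text", "corpus", "words"],
]

dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]

# Fit LDA with variational Bayes (gensim's default inference scheme).
lda_vb = LdaModel(corpus=corpus, id2word=dictionary,
                  num_topics=2, passes=20, random_state=0)

# Score the learned topics with the c_v coherence measure.
coherence = CoherenceModel(model=lda_vb, texts=texts,
                           dictionary=dictionary, coherence="c_v")
print("c_v coherence:", coherence.get_coherence())
```

In this style of evaluation, a higher coherence score on the same corpus is taken as evidence that one inference algorithm has recovered more interpretable latent topic distributions than another.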
