Title
Enhance Topics Analysis based on Keywords Properties
Authors
Abstract
Topic modelling is one of the most prevalent text analysis techniques used to explore and retrieve collections of documents. The evaluation of topic model algorithms is still a very challenging task due to the absence of a gold-standard list of topics to compare against for each corpus. In this work, we present a specificity score based on keyword properties that is able to select the most informative topics. This approach helps the user focus on the most informative topics. In the experiments, we show that we are able to compress state-of-the-art topic modelling results by different factors with an information loss that is much lower than that of a solution based on the recent coherence score presented in the literature.
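The abstract does not define the specificity score, so what follows is only a minimal sketch of the selection step it describes: score each topic from its keywords, rank the topics, and keep a fraction of them, so that a compression factor of k keeps roughly 1/k of the topics. The score used here, the mean inverse document frequency (IDF) of a topic's keywords, is a hypothetical stand-in chosen for illustration and is not the authors' specificity score. The function names `idf_table`, `specificity_proxy` and `select_topics` are likewise hypothetical.

```python
import math
from collections import Counter


def idf_table(docs):
    """Inverse document frequency log(N / df) for every token in a tokenized corpus."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each token at most once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}


def specificity_proxy(keywords, idf):
    """HYPOTHETICAL stand-in score: mean IDF of a topic's keywords.

    Keywords absent from the corpus contribute 0; an empty topic scores 0.
    """
    if not keywords:
        return 0.0
    return sum(idf.get(k, 0.0) for k in keywords) / len(keywords)


def select_topics(topics, docs, factor):
    """Keep the top len(topics) // factor topics by the proxy score (at least one).

    Ties keep their original order because Python's sort is stable.
    """
    idf = idf_table(docs)
    ranked = sorted(topics, key=lambda kws: specificity_proxy(kws, idf), reverse=True)
    keep = max(1, len(topics) // factor)
    return ranked[:keep]


if __name__ == "__main__":
    docs = [
        ["topic", "model", "lda"],
        ["topic", "coherence", "score"],
        ["keyword", "specificity", "topic"],
        ["lda", "gibbs", "sampling"],
    ]
    topics = [["topic", "model"], ["gibbs", "sampling"], ["specificity", "keyword"], ["topic", "lda"]]
    # Compression factor 2: keep the 2 highest-scoring of the 4 topics.
    print(select_topics(topics, docs, 2))
```

In this toy corpus, topics built on the ubiquitous keyword "topic" receive a low proxy score and are dropped first, which mirrors the intuition of keeping specific rather than generic topics. A real evaluation would also need a measure of the information lost by discarding topics, which this sketch does not attempt.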