Is Neural Topic Modelling Better than Clustering? An Empirical Study on Clustering with Contextual Embeddings for Topics

Paper title

Is Neural Topic Modelling Better than Clustering? An Empirical Study on Clustering with Contextual Embeddings for Topics

Authors

Zihan Zhang, Meng Fang, Ling Chen, Mohammad-Reza Namazi-Rad

Abstract

Recent work incorporates pre-trained word embeddings such as BERT embeddings into Neural Topic Models (NTMs), generating highly coherent topics. However, with high-quality contextualized document representations, do we really need sophisticated neural models to obtain coherent and interpretable topics? In this paper, we conduct thorough experiments showing that directly clustering high-quality sentence embeddings with an appropriate word selecting method can generate more coherent and diverse topics than NTMs, achieving also higher efficiency and simplicity.
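
The pipeline named in the abstract (cluster document embeddings, then select representative words per cluster) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the 2-D vectors stand in for real sentence embeddings, the small k-means routine and the cluster-level TF-IDF-style score are generic choices, and the abstract does not state which clustering algorithm or word selection method the authors actually use.

```python
import math
from collections import Counter

def kmeans(points, k, iters=20):
    """Tiny deterministic k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels

def top_words(docs, labels, n=3):
    """Rank words per cluster by an assumed score: tf * log(k / cluster_freq)."""
    k = max(labels) + 1
    counts = [Counter() for _ in range(k)]
    for doc, label in zip(docs, labels):
        counts[label].update(doc.lower().split())
    # Number of clusters in which each word occurs at least once.
    cluster_freq = Counter(w for c in counts for w in c)
    topics = []
    for c in counts:
        scored = {w: tf * math.log(k / cluster_freq[w]) for w, tf in c.items()}
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scored, key=lambda w: (-scored[w], w))
        topics.append(ranked[:n])
    return topics

# Toy corpus; the vectors are hypothetical stand-ins for sentence embeddings.
docs = [
    "the team won the football match",
    "the bank raised interest rates",
    "the striker scored in the football final",
    "the market fell as rates rose",
]
embeddings = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]

labels = kmeans(embeddings, k=2)
topics = top_words(docs, labels, n=3)
for i, words in enumerate(topics):
    print(f"topic {i}: {words}")
```

Words that occur in every cluster (such as "the") receive a score of zero under this weighting, so they drop out of the topic descriptions without a stop-word list. In practice the toy vectors would be replaced by embeddings from a sentence encoder.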

Paper title