Identifying Possible Rumor Spreaders on Twitter: A Weak Supervised Learning Approach

Paper Title

Identifying Possible Rumor Spreaders on Twitter: A Weak Supervised Learning Approach

Authors

Shakshi Sharma, Rajesh Sharma

Abstract

Online Social Media (OSM) platforms such as Twitter and Facebook are extensively exploited by their users to spread (mis)information to a large audience effortlessly and rapidly. Misinformation has been observed to cause panic, fear, and financial loss to society. Thus, it is important to detect and control misinformation on such platforms before it spreads to the masses. In this work, we focus on rumors, which are one type of misinformation (other types include fake news and hoaxes). One way to control the spread of rumors is to identify users who are possibly rumor spreaders, that is, users who are often involved in spreading rumors. Because labeled rumor-spreader datasets are not available (labeling is an expensive task), we use the publicly available PHEME dataset, which contains rumor and non-rumor tweet information, and apply a weak supervised learning approach to transform it into a rumor spreaders dataset. We utilize three types of features, namely user, text, and ego-network features, before applying various supervised learning approaches. In particular, to exploit the inherent network property of this dataset (a user-user reply graph), we explore the Graph Convolutional Network (GCN), a type of Graph Neural Network (GNN). We compare GCN results with those of other approaches: SVM, RF, and LSTM. Extensive experiments on the rumor spreaders dataset, where we achieve up to 0.864 for F1-score and 0.720 for AUC-ROC, show the effectiveness of our methodology for identifying possible rumor spreaders using the GCN technique.
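
To make the GCN idea in the abstract concrete, the following is a minimal sketch of a single graph-convolution layer applied to a user-user reply graph. It is not the authors' implementation: it assumes the widely used symmetric-normalization propagation rule, ReLU(D^{-1/2}(A + I)D^{-1/2} H W), and the four-user graph, feature matrix, weight matrix, and function names are all made up for illustration.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a symmetric adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops so each user keeps its own features
    deg = a_hat.sum(axis=1)                 # node degrees, self-loop included
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, features, weights):
    """One graph-convolution step: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(a_norm @ features @ weights, 0.0)


# Hypothetical reply graph with 4 users and reply edges 0-1, 1-2, 2-3.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))   # placeholder per-user features (e.g. user, text, ego-network)
weights = rng.normal(size=(3, 2))    # placeholder weights; in practice these are learned

a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, weights)   # shape (4, 2): one embedding per user
```

Each row of `hidden` mixes a user's own features with those of the users they exchanged replies with, which is how a GCN can exploit the network property that feature-only classifiers such as SVM or RF do not see directly.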
