Title
Estimating Exposure to Information on Social Networks
Authors
Abstract
This paper considers the problem of estimating exposure to information in a social network. Given a piece of information (e.g., the URL of a news article on Facebook, a hashtag on Twitter), our aim is to find the fraction of people on the network who have been exposed to it. The exact value of exposure to a piece of information is determined by two features: the structure of the underlying social network and the set of people who shared the piece of information. Often, neither feature is publicly available (i.e., access to them is limited to the internal administrators of the platform), and both are difficult to estimate from data. As a solution, we propose two methods to estimate the exposure to a piece of information in an unbiased manner: a vanilla method based on sampling the network uniformly, and a method, motivated by the Friendship Paradox, that samples the network non-uniformly. We provide theoretical results that characterize the conditions (in terms of properties of the network and the piece of information) under which one method outperforms the other. Further, we outline extensions of the proposed methods to dynamic information cascades (where the exposure needs to be tracked in real time). We demonstrate the practical feasibility of the proposed methods via experiments on multiple synthetic and real-world datasets.
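To make the contrast between the two sampling schemes concrete, the following is a minimal Python sketch, not the paper's actual estimators. It rests on several assumptions that the abstract does not state: a node counts as "exposed" when at least one of its neighbors shared the information; the non-uniform scheme draws a random end of a random link (so nodes are sampled in proportion to their degree) and corrects for this with a weight of mean degree over node degree; and the mean degree is assumed to be known. All function names (`exposed_nodes`, `uniform_estimate`, `friend_estimate`) are hypothetical.

```python
import random


def exposed_nodes(adj, sharers):
    """Nodes with at least one neighbor in `sharers` (assumed definition of exposure)."""
    return {v for v, nbrs in adj.items() if any(u in sharers for u in nbrs)}


def true_exposure(adj, sharers):
    """Exact fraction of exposed nodes; needs the full graph and the full sharer set."""
    return len(exposed_nodes(adj, sharers)) / len(adj)


def uniform_estimate(adj, sharers, n, rng):
    """Vanilla scheme: poll n uniformly sampled nodes (with replacement)."""
    nodes = list(adj)
    exposed = exposed_nodes(adj, sharers)  # stands in for asking each polled node
    sample = [rng.choice(nodes) for _ in range(n)]
    return sum(v in exposed for v in sample) / n


def friend_estimate(adj, sharers, n, rng):
    """Friendship-paradox-style scheme: poll random ends of random links.

    A node of degree d is drawn with probability proportional to d, so each
    response is reweighted by mean_degree / d to undo the sampling bias.
    Assumes the mean degree is known (hypothetical simplification).
    """
    ends = [v for v, nbrs in adj.items() for _ in nbrs]  # each node repeated degree times
    mean_degree = len(ends) / len(adj)
    exposed = exposed_nodes(adj, sharers)
    sample = [rng.choice(ends) for _ in range(n)]
    return mean_degree * sum((v in exposed) / len(adj[v]) for v in sample) / n


if __name__ == "__main__":
    # Toy star graph: node 0 is the hub and the only sharer.
    star = {0: list(range(1, 10)), **{i: [0] for i in range(1, 10)}}
    rng = random.Random(0)
    print("true   :", true_exposure(star, {0}))
    print("uniform:", uniform_estimate(star, {0}, 2000, rng))
    print("friend :", friend_estimate(star, {0}, 2000, rng))
```

In this sketch both estimators target the same quantity; they differ only in which nodes are polled and how responses are weighted, and it is the variance of each estimator that decides which scheme is preferable on a given network.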