Title
Misspecification Tests on Models of Random Graphs
Authors
Abstract
One widely used class of network models is exponential random graph (ERG) models, a comprehensive family that includes independent-edge and dyadic models, Markov random graphs, and many other graph distributions, and that also allows the inclusion of covariates, which can lead to a better model fit. Another increasingly popular class of models in statistical network analysis is the stochastic block model (SBM). SBMs can be used to group nodes into communities or to discover and analyze the latent structure of a network. The SBM is a generative model for random graphs that tends to produce graphs containing subsets of nodes that are densely connected to each other, called communities. Many researchers from various fields have used computational tools to fit these models without, however, analyzing whether they suit the network data under study. The complexity of the estimation procedures and of the goodness-of-fit methodologies for these models can make it difficult to assess their adequacy and, when warranted, to discard one model in favor of another. Clearly, results obtained from an inappropriate model can lead researchers to seriously wrong conclusions about the phenomenon under study. The purpose of this work is to present a simple methodology, based on hypothesis tests, to check for model misspecification in two model classes widely used in the literature to represent complex networks: the ERGM and the SBM. We believe this tool can be very useful for those who want to use these models more carefully, verifying beforehand whether the models are suitable for the data under study.
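The abstract does not describe the proposed tests themselves. As a hedged illustration only, the sketch below shows what an SBM generates and one generic way to probe misspecification: a parametric-bootstrap (Monte Carlo) goodness-of-fit check that fits an SBM, simulates graphs from the fitted model, and compares the observed triangle count with the simulated ones. It assumes the block labels are known (in practice they would have to be estimated), and all function names (`sample_sbm`, `fit_block_probs`, `triangle_count`, `monte_carlo_pvalue`) are hypothetical. This is not the methodology of this work.

```python
import random
from itertools import combinations


def sample_sbm(blocks, P, rng):
    """Draw one undirected SBM graph.

    blocks[i] is the block (community) label of node i, in 0..K-1;
    P[a][b] is the edge probability between a node in block a and a node
    in block b. Edges are returned as (i, j) pairs with i < j.
    """
    edges = set()
    for i, j in combinations(range(len(blocks)), 2):
        if rng.random() < P[blocks[i]][blocks[j]]:
            edges.add((i, j))
    return edges


def fit_block_probs(edges, blocks, K):
    """Maximum-likelihood SBM probabilities when the labels are known
    (an assumption): observed edges / possible node pairs per block pair."""
    counts = [[0] * K for _ in range(K)]
    pairs = [[0] * K for _ in range(K)]
    for i, j in combinations(range(len(blocks)), 2):
        a, b = sorted((blocks[i], blocks[j]))
        pairs[a][b] += 1
        if (i, j) in edges:
            counts[a][b] += 1
    P = [[0.0] * K for _ in range(K)]
    for a in range(K):
        for b in range(a, K):
            p = counts[a][b] / pairs[a][b] if pairs[a][b] else 0.0
            P[a][b] = P[b][a] = p
    return P


def triangle_count(edges, n):
    """Number of triangles; edges must be (i, j) with i < j, so each
    triangle i < j < k is counted exactly once."""
    neighbours = [set() for _ in range(n)]
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    return sum(1 for i, j in edges
               for k in neighbours[i] & neighbours[j] if k > j)


def monte_carlo_pvalue(edges, blocks, K, n_sim=200, seed=0):
    """Upper-tail Monte Carlo p-value for the triangle count under the
    SBM fitted to the observed graph (a parametric-bootstrap check)."""
    rng = random.Random(seed)
    n = len(blocks)
    P_hat = fit_block_probs(edges, blocks, K)
    observed = triangle_count(edges, n)
    simulated = [triangle_count(sample_sbm(blocks, P_hat, rng), n)
                 for _ in range(n_sim)]
    extreme = sum(1 for s in simulated if s >= observed)
    # The +1 terms count the observed graph itself, so p is never exactly 0.
    return (extreme + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Graph 1: drawn from a two-block SBM, then checked against an SBM.
    rng = random.Random(1)
    blocks = [0] * 15 + [1] * 15
    P_true = [[0.4, 0.05], [0.05, 0.4]]
    sbm_graph = sample_sbm(blocks, P_true, rng)
    print("SBM data p-value:", monte_carlo_pvalue(sbm_graph, blocks, K=2))

    # Graph 2: ten disjoint triangles, checked against a one-block SBM
    # (an Erdos-Renyi graph), which cannot reproduce this much clustering.
    tri_graph = {e for t in range(10)
                 for e in combinations((3 * t, 3 * t + 1, 3 * t + 2), 2)}
    print("Triangle data p-value:",
          monte_carlo_pvalue(tri_graph, [0] * 30, K=1))
```

A small p-value indicates that the fitted model rarely reproduces a feature of the observed network, which is the kind of evidence of misspecification a hypothesis test can formalize. The triangle count is chosen here only because a plain SBM does not model transitivity directly; other statistics could be used instead.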