Paper title
Copula-based statistical dependence visualizations
Paper authors
Paper abstract
A frequent task in exploratory data analysis is examining pairwise dependencies between data variables. Popular approaches include visualizing correlation matrices or scatter plot matrices. However, both methods can be misleading. The former is limited primarily because it reports a single value for each pair of random variables. Furthermore, scatter plots can fail to convey the dependency structure between variables properly. In this paper, we discuss these shortcomings and present alternative, richer visualizations based on copula functions, which fully determine the dependency between continuous random variables. Since copulas seldom appear in the data visualization literature, we first review the essential theory and then propose alternative scatter plots and several heatmaps for assessing the statistical association between two continuous random variables. These visualizations allow users to detect not only independence but also increasing and/or decreasing trends in the data through color coding, which can also be applied in other methods such as parallel coordinates.
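As a rough illustration of the idea behind a copula-based scatter plot, the sketch below maps each variable to the unit interval through its ranks (often called pseudo-observations). This is one common way to obtain a sample from the empirical copula; it is not necessarily the construction used in the paper. The function name `pseudo_observations`, the synthetic data, and the scaling by `n + 1` are assumptions made for this example.

```python
import math
import random


def pseudo_observations(values):
    """Map a sample to (0, 1) via ranks: u_i = rank_i / (n + 1).

    Hypothetical helper for illustration. Assumes no ties, which holds
    almost surely for continuous data.
    """
    n = len(values)
    # Indices of the sample sorted by value, smallest first
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


# Synthetic, positively dependent pair (assumed data, not from the paper)
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(200)]
y = [xi + 0.5 * random.gauss(0.0, 1.0) for xi in x]

# Points (u_i, v_i) are what a rank-based scatter plot would display
u = pseudo_observations(x)
v = pseudo_observations(y)

# A strictly increasing transform of a marginal leaves the ranks unchanged,
# so the rank-based plot depends only on the dependence structure
u_exp = pseudo_observations([math.exp(xi) for xi in x])
```

Plotting `u` against `v` in place of `x` against `y` removes the influence of the marginal distributions, which is why such a plot can show the dependence structure more directly than an ordinary scatter plot.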