Paper Title

Sampling with replacement vs Poisson sampling: a comparative study in optimal subsampling

Authors

Jing Wang, Jiahui Zou, HaiYing Wang

Abstract

Faced with massive data, subsampling is a commonly used technique to improve computational efficiency, and using nonuniform subsampling probabilities is an effective approach to improve estimation efficiency. For computational efficiency, subsampling is often implemented with replacement or through Poisson subsampling. However, no rigorous investigation has been performed to study the differences between the two subsampling procedures, such as in their estimation efficiency and computational convenience. This paper performs a comparative study of these two sampling procedures. In the context of maximizing a general target function, we first derive asymptotic distributions for estimators obtained from the two sampling procedures. The results show that Poisson subsampling may have higher estimation efficiency. Based on the asymptotic distributions for both subsampling with replacement and Poisson subsampling, we derive optimal subsampling probabilities that minimize the variance functions of the subsampling estimators. These subsampling probabilities further reveal the similarities and differences between subsampling with replacement and Poisson subsampling. The theoretical characterizations and comparisons of the two subsampling procedures provide guidance for selecting a more appropriate subsampling approach in practice. Furthermore, practically implementable algorithms are proposed based on the optimal structural results, and these are evaluated through both theoretical and empirical analyses.
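
To make the distinction between the two procedures concrete, the following is a minimal Python sketch, not the authors' algorithms. It assumes the textbook definitions: subsampling with replacement draws exactly r indices i.i.d. from the given probabilities, while Poisson subsampling includes each data point independently with probability min(1, r * p_i), so the realized subsample size is random. The function names and the inverse-probability-weighted mean estimators are hypothetical illustrations, and the probabilities used are arbitrary rather than the optimal ones derived in the paper.

```python
import random


def sample_with_replacement(probs, r, rng):
    """Draw exactly r indices i.i.d. according to probs (assumed to sum to 1).

    The same index may appear more than once.
    """
    return rng.choices(range(len(probs)), weights=probs, k=r)


def poisson_sample(probs, r, rng):
    """Include index i independently with probability min(1, r * probs[i]).

    The subsample size is random (about r in expectation when no inclusion
    probability is capped at 1), and no index can appear twice.
    """
    return [i for i, p in enumerate(probs) if rng.random() < min(1.0, r * p)]


def mean_with_replacement(x, probs, idx):
    """Inverse-probability-weighted estimate of the full-data mean of x
    from a with-replacement subsample idx (illustrative estimator)."""
    n, r = len(x), len(idx)
    return sum(x[i] / probs[i] for i in idx) / (n * r)


def mean_poisson(x, probs, r, idx):
    """Inverse-probability-weighted estimate of the full-data mean of x
    from a Poisson subsample idx (illustrative estimator)."""
    n = len(x)
    return sum(x[i] / min(1.0, r * probs[i]) for i in idx) / n


# Small demonstration on synthetic data (all values are made up).
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Arbitrary nonuniform probabilities proportional to |x_i| + 1.
weights = [abs(v) + 1.0 for v in x]
total = sum(weights)
probs = [w / total for w in weights]

r = 500
idx_wr = sample_with_replacement(probs, r, rng)
idx_po = poisson_sample(probs, r, rng)

est_wr = mean_with_replacement(x, probs, idx_wr)
est_po = mean_poisson(x, probs, r, idx_po)
```

In this sketch, with-replacement sampling needs all probabilities up front in order to draw from them, whereas the Poisson version decides on each data point separately in a single pass, which is one way to read the computational-convenience comparison raised in the abstract.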
