Paper Title

Nearly optimal capture-recapture sampling and empirical likelihood weighting estimation for M-estimation with big data

Authors

Fan, Yan; Liu, Yang; Liu, Yukun; Qin, Jing

Abstract

Subsampling techniques can reduce the computational costs of processing big data. Practical subsampling plans typically involve initial uniform sampling followed by refined sampling. With a subsample, big data inferences are generally built on inverse probability weighting (IPW), which becomes unstable when the probability weights are close to zero and cannot incorporate auxiliary information. First, we consider capture-recapture sampling, which combines an initial uniform sampling with a second Poisson sampling. Under this sampling plan, we propose an empirical likelihood weighting (ELW) approach for estimating an M-estimation parameter. Second, based on the ELW method, we construct a nearly optimal capture-recapture sampling plan that balances estimation efficiency and computational costs. Third, we derive methods for determining the smallest sample sizes with which the proposed sampling-and-estimation method produces estimators of guaranteed precision. Our ELW method overcomes the instability of IPW by circumventing the use of inverse probabilities, and it utilizes auxiliary information, including the size and certain sample moments of the big data. We show that the proposed ELW method produces more efficient estimators than IPW, leading to more efficient optimal sampling plans and more economical sample sizes for a prespecified estimation precision. These advantages are confirmed through simulation studies and real data analyses.
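
The abstract contrasts IPW with empirical likelihood weighting but does not spell out the weights. The sketch below is a minimal illustration under simplifying assumptions, not the authors' method: a single Poisson subsample with inclusion probabilities known for all N units (the paper's capture-recapture plan adds an initial uniform sample), the full-data mean as the M-estimation target, and EL weights derived here by maximizing the likelihood of the n sampled units plus the (N - n) unsampled ones subject to sum(p) = 1 and sum(p * (pi - a)) = 0. The function names, the simulated data, and the probability model are hypothetical. It shows where IPW divides by pi_i, and how the full-data size N enters the EL weights as auxiliary information.

```python
import numpy as np


def poisson_subsample(pi, rng):
    """Poisson sampling: keep unit i independently with probability pi[i]."""
    return rng.random(pi.shape[0]) < pi


def ipw_mean(y_s, pi_s, N):
    """IPW (Horvitz-Thompson) estimate of the full-data mean of y."""
    return np.sum(y_s / pi_s) / N


def elw_weights(pi_s, N, rel_tol=1e-13, max_iter=200):
    """Illustrative EL weights for single-stage Poisson sampling (assumption).

    The constrained EL maximization gives
        p_i = (1 - a) / (n - N * a + (N - n) * pi_i),
    where the overall inclusion rate a solves sum(p) = 1. The weights are
    positive and sum to one, so none exceeds 1 even when pi_i is near zero.
    """
    n = pi_s.shape[0]

    def excess(a):
        # sum(p(a)) - 1; strictly increasing in a on [0, upper).
        return np.sum((1.0 - a) / (n - N * a + (N - n) * pi_s)) - 1.0

    lo = 0.0  # excess(0) < 0 whenever N > n and all pi_s > 0
    upper = (n + (N - n) * pi_s.min()) / N  # smallest denominator reaches 0 here
    hi = upper
    for _ in range(max_iter):  # bisection for the unique root of excess(a)
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= rel_tol * upper:
            break
    a = lo
    p = (1.0 - a) / (n - N * a + (N - n) * pi_s)
    return p / p.sum()  # absorb the tiny bisection error


def demo(seed=0, N=100_000):
    rng = np.random.default_rng(seed)
    x = rng.exponential(size=N)  # covariate available for every unit
    y = 1.0 + 2.0 * x + rng.normal(size=N)
    # Hypothetical inclusion probabilities: about 2% on average, some tiny.
    pi = np.clip(0.02 * x / x.mean(), 1e-4, 1.0)
    keep = poisson_subsample(pi, rng)
    y_s, pi_s = y[keep], pi[keep]
    p = elw_weights(pi_s, N)
    # With psi(y, theta) = y - theta, the weighted M-estimate is a weighted mean.
    return float(y.mean()), float(ipw_mean(y_s, pi_s, N)), float(np.sum(p * y_s)), p


if __name__ == "__main__":
    truth, ipw, elw, _ = demo()
    print(f"full-data mean {truth:.4f}  IPW {ipw:.4f}  EL-weighted {elw:.4f}")
```

Setting a = n/N in the weight formula recovers the IPW weights 1/(N * pi_i); the EL-style weights differ only through the fitted a, which forces them to be positive and to sum to one.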
