Title
Distributed Estimation and Inference for Semi-parametric Binary Response Models
Authors
Abstract
The development of modern technology has enabled data collection of unprecedented size, which poses new challenges to many statistical estimation and inference problems. This paper studies the maximum score estimator of a semi-parametric binary choice model under a distributed computing environment without pre-specifying the noise distribution. An intuitive divide-and-conquer estimator is computationally expensive and restricted by a non-regular constraint on the number of machines, due to the highly non-smooth nature of the objective function. We propose (1) a one-shot divide-and-conquer estimator after smoothing the objective to relax the constraint, and (2) a multi-round estimator to completely remove the constraint via iterative smoothing. We specify an adaptive choice of kernel smoother with a sequentially shrinking bandwidth to achieve a superlinear improvement in the optimization error over multiple iterations. The improved statistical accuracy per iteration is derived, and a quadratic convergence up to the optimal statistical error rate is established. We further provide two generalizations to handle the heterogeneity of datasets and high-dimensional problems where the parameter of interest is sparse.
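The abstract describes three ingredients: smoothing the maximum score objective with a kernel, a one-shot divide-and-conquer average of local estimates, and a multi-round refinement that takes pooled Newton-type steps while the bandwidth shrinks. The NumPy sketch below illustrates these ideas under assumptions of our own, not the paper's: the logistic CDF as the smoothing kernel, the coefficient on the first regressor normalized to 1, equal-sized simulated machines, a hand-picked bandwidth schedule with a floor `h_min`, and a backtracking safeguard on each step. All function names (`kernel_stats`, `pooled_stats`, `ascent_step`, `local_fit`, `one_shot`, `multi_round`) are hypothetical.

```python
import numpy as np


def kernel_stats(theta, h, x1, Z, y):
    """Smoothed maximum score objective, gradient and Hessian on one machine.

    Assumed model: y = 1{x1 + Z @ theta + eps >= 0} with median(eps | x) = 0,
    so the coefficient on x1 is normalised to 1. The indicator 1{v >= 0} is
    replaced by the logistic CDF K(v / h); this kernel choice is our assumption.
    """
    u = (x1 + Z @ theta) / h
    t = np.tanh(u / 2.0)
    K = 0.5 * (1.0 + t)            # logistic CDF in a numerically stable form
    k = 0.25 * (1.0 - t ** 2)      # K', the logistic density
    dk = -k * t                    # K'', derivative of the density
    s = 2.0 * y - 1.0              # map labels {0, 1} to {-1, +1}
    n = len(y)
    obj = np.mean(s * K)
    grad = Z.T @ (s * k) / (n * h)
    hess = (Z * (s * dk)[:, None]).T @ Z / (n * h ** 2)
    return obj, grad, hess


def pooled_stats(theta, h, parts):
    """Size-weighted average of per-machine statistics (one communication round)."""
    sizes = np.array([len(p[2]) for p in parts], dtype=float)
    weights = sizes / sizes.sum()
    stats = [kernel_stats(theta, h, *p) for p in parts]
    return tuple(sum(w * st[j] for w, st in zip(weights, stats)) for j in range(3))


def ascent_step(theta, h, parts):
    """One Newton ascent step with a backtracking safeguard (our addition)."""
    obj, grad, hess = pooled_stats(theta, h, parts)
    if np.all(np.linalg.eigvalsh(hess) < 0):
        direction = -np.linalg.solve(hess, grad)           # Newton step toward a maximum
    else:
        direction = grad / (np.linalg.norm(grad) + 1e-12)  # fall back to a unit gradient step
    step = 1.0
    while step > 1e-8 and pooled_stats(theta + step * direction, h, parts)[0] < obj:
        step /= 2.0
    return theta + step * direction


def local_fit(part, h, iters=30, tol=1e-8):
    """Maximise the smoothed objective on a single machine, starting from zero."""
    theta = np.zeros(part[1].shape[1])
    for _ in range(iters):
        new = ascent_step(theta, h, [part])
        if np.linalg.norm(new - theta) < tol:
            return new
        theta = new
    return theta


def one_shot(parts, h):
    """One-shot divide-and-conquer: fit every machine locally, then average."""
    return np.mean([local_fit(p, h) for p in parts], axis=0)


def multi_round(parts, theta_init, bandwidths):
    """One pooled Newton-type step per round, with the bandwidth shrinking each round."""
    theta = theta_init.copy()
    for h in bandwidths:
        theta = ascent_step(theta, h, parts)
    return theta


# Simulated example: noise is Gaussian here, but the estimator never uses that fact.
rng = np.random.default_rng(0)
n, m = 40_000, 8
theta_true = np.array([1.0, -1.0])
x1 = rng.normal(size=n)
Z = rng.normal(size=(n, 2))
y = (x1 + Z @ theta_true + rng.normal(size=n) >= 0).astype(float)
parts = [(x1[i::m], Z[i::m], y[i::m]) for i in range(m)]

h_local, h_min = 0.4, 0.15                      # hand-picked, not the paper's rule
theta_os = one_shot(parts, h_local)
bandwidths = [max(h_local * 0.7 ** t, h_min) for t in range(1, 7)]
theta_mr = multi_round(parts, theta_os, bandwidths)
print("one-shot:", theta_os, "multi-round:", theta_mr)
```

Each call to `pooled_stats` stands in for one communication round: machines return local averages and the center combines them. The floor `h_min` keeps the pooled Hessian well conditioned at this simulated sample size; the paper derives its own bandwidth sequence, which this sketch does not attempt to reproduce.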