Probability and Non-Probability Samples: Improving Regression Modeling by Using Data from Different Sources

Paper title

Probability and Non-Probability Samples: Improving Regression Modeling by Using Data from Different Sources

Author

Tutz, Gerhard

Abstract

Non-probability sampling, for example in the form of online panels, has become a fast and cheap method of collecting data. While reliable inference tools are available for classical probability samples, non-probability samples can yield strongly biased estimates because the selection mechanism is typically unknown. We propose a general method for improving statistical inference when, in addition to a probability sample, data from other sources are available that have to be considered non-probability samples. The method uses specifically tailored regression residuals to enlarge the original data set with observations from other sources that can be considered as stemming from the target population. Measures of the accuracy of estimates are obtained by adapted bootstrap techniques. The method is demonstrated to improve estimates in a wide range of scenarios. For illustrative purposes, the proposed method is applied to two data sets.
