Paper title
Adaptive greedy forward variable selection for linear regression models with incomplete data using multiple imputation
Authors
Abstract
Variable selection is crucial for sparse modeling in the age of big data. Missing values are common in data and make variable selection more complicated. Multiple imputation (MI) produces multiply imputed datasets in which the missing values are filled in, and it has been widely applied in various variable selection procedures. However, directly performing variable selection on the whole MI data or on bootstrapped MI data may not be worth the computational cost. To quickly identify the active variables in a linear regression model, we propose an adaptive grafting procedure with three pooling rules on MI data. The proposed methods proceed iteratively: they start by finding active variables based on the complete-case subset and then expand the working data matrix in both the number of active variables and the number of available observations. A comprehensive simulation study evaluates the selection accuracy from different aspects and the computational efficiency of the proposed methods. Two real-life examples illustrate the strength of the proposed methods.
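To make the iterative idea in the abstract concrete, the following is a minimal sketch only, not the authors' procedure. It assumes a gradient-based greedy forward step (in the spirit of grafting) on a design matrix with NaN entries, and it leaves out multiple imputation and the three pooling rules entirely, since the abstract does not specify them. The function name `greedy_forward_with_missing` and the parameters `max_vars` and `tol` are hypothetical names introduced for illustration.

```python
import numpy as np

def greedy_forward_with_missing(X, y, max_vars=5, tol=0.1):
    """Illustrative greedy forward selection on data with NaN entries.

    Assumption: this is a simplified stand-in for the adaptive procedure
    described in the abstract. It uses no imputation and no pooling rule.
    """
    n, p = X.shape
    observed = ~np.isnan(X)
    active = []
    # Step 0: the working rows are the complete cases only.
    rows = observed.all(axis=1)
    for _ in range(max_vars):
        # Ordinary least squares fit (with intercept) on the active set.
        design = np.column_stack([np.ones(rows.sum()), X[rows][:, active]])
        beta, *_ = np.linalg.lstsq(design, y[rows], rcond=None)
        resid = np.full(n, np.nan)
        resid[rows] = y[rows] - design @ beta
        # Score each inactive variable by the magnitude of the
        # squared-loss gradient, |mean(x_j * residual)|, computed on
        # working rows where x_j is observed.
        best_j, best_score = None, tol
        for j in range(p):
            if j in active:
                continue
            use = rows & observed[:, j]
            if use.sum() < 2:
                continue
            score = abs(np.mean(X[use, j] * resid[use]))
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None:
            break  # no remaining variable exceeds the threshold
        active.append(best_j)
        # Expand the working rows: only the active variables must be
        # observed, so more observations usually become available.
        rows = observed[:, active].all(axis=1)
    return active
```

In this sketch the working data grows in both directions described in the abstract: one more active variable per step, and typically more usable rows once completeness is required only on the active set. A faithful implementation would additionally run the selection across the multiply imputed datasets and combine the results with a pooling rule.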