Linear regression with unmatched data: a deconvolution perspective

Paper title

Linear regression with unmatched data: a deconvolution perspective

Authors

Mona Azadkia, Fadoua Balabdaoui

Abstract

Consider the regression problem where the response $Y\in\mathbb{R}$ and the covariate $X\in\mathbb{R}^d$ for $d\geq 1$ are \textit{unmatched}. Under this scenario, we do not have access to pairs of observations from the distribution of $(X, Y)$, but instead, we have separate datasets $\{Y_i\}_{i=1}^n$ and $\{X_j\}_{j=1}^m$, possibly collected from different sources. We study this problem assuming that the regression function is linear and the noise distribution is known or can be estimated. We introduce an estimator of the regression vector based on deconvolution and demonstrate its consistency and asymptotic normality under an identifiability assumption. In the general case, we show that our estimator (DLSE: Deconvolution Least Squared Estimator) is consistent in terms of an extended $\ell_2$ norm. Using this observation, we devise a method for semi-supervised learning, i.e., when we have access to a small sample of matched pairs $(X_k, Y_k)$. Several applications with synthetic and real datasets are considered to illustrate the theory.
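
To make the unmatched setting concrete, the following is a minimal sketch of a deconvolution-style fit. It is not the paper's exact DLSE definition. It assumes a single covariate ($d = 1$), logistic noise with a known scale, and a plain grid search. The function names (`logistic_cdf`, `deconvolution_criterion`, `fit_unmatched`) are hypothetical. The idea illustrated is that the distribution of $Y$ should equal the distribution of $\beta X$ convolved with the noise, so a candidate $b$ is scored by the squared distance between the empirical CDF of the responses and the noise CDF averaged over the covariate sample.

```python
import numpy as np


def logistic_cdf(z, scale):
    # CDF of a centred logistic distribution, written with tanh for stability.
    return 0.5 * (1.0 + np.tanh(z / (2.0 * scale)))


def deconvolution_criterion(b, y, x, scale):
    # Empirical CDF of the responses, evaluated at the responses themselves.
    ecdf_y = (np.argsort(np.argsort(y)) + 1) / y.size
    # Model-implied CDF of b*X + noise: average the (assumed known) noise CDF
    # over the separate covariate sample. No (x, y) pairing is used anywhere.
    model_cdf = logistic_cdf(y[:, None] - b * x[None, :], scale).mean(axis=1)
    return float(np.mean((ecdf_y - model_cdf) ** 2))


def fit_unmatched(y, x, scale, grid):
    # Illustrative grid search over candidate slopes (an assumption of this sketch).
    values = np.array([deconvolution_criterion(b, y, x, scale) for b in grid])
    return float(grid[np.argmin(values)])


rng = np.random.default_rng(0)
beta_true, noise_scale = 2.0, 0.5

# The responses are generated from covariates that are then discarded ...
x_hidden = rng.exponential(1.0, size=1000)
y = beta_true * x_hidden + rng.logistic(0.0, noise_scale, size=1000)
# ... and the covariates we actually observe come from an independent draw.
x = rng.exponential(1.0, size=1000)

beta_hat = fit_unmatched(y, x, noise_scale, np.linspace(-4.0, 4.0, 81))
print(beta_hat)
```

The covariate here is drawn from an asymmetric (exponential) distribution on purpose: with a symmetric covariate, $b$ and $-b$ would imply the same distribution for $Y$, which is the kind of ambiguity an identifiability assumption has to rule out.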
