Paper Title
Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects
Paper Authors
Paper Abstract
Estimating heterogeneous treatment effects is an important problem across many domains. To estimate such treatment effects accurately, one typically relies on data from observational studies or randomized experiments. Currently, most existing works rely exclusively on observational data, which is often confounded and, hence, yields biased estimates. While observational data is confounded, randomized data is unconfounded, but its sample size is usually too small to learn heterogeneous treatment effects. In this paper, we propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data via representation learning. In particular, we introduce a two-step framework: first, we use observational data to learn a shared structure (in the form of a representation); and then, we use randomized data to learn the data-specific structures. We analyze the finite sample properties of our framework and compare them to several natural baselines. As such, we derive conditions for when combining observational and randomized data is beneficial, and for when it is not. Based on this, we introduce a sample-efficient algorithm, called CorNet. We use extensive simulation studies to verify the theoretical properties of CorNet and multiple real-world datasets to demonstrate our method's superiority compared to existing methods.
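
The two-step idea described in the abstract can be illustrated with a minimal linear sketch. This is not the paper's CorNet algorithm: the abstract does not specify the architecture or loss, so everything below is an assumption made for illustration. Here the "representation" is simply the pair of per-arm outcome predictions fitted on observational data, the "data-specific structure" is a small linear head per arm fitted on randomized data, and the names `learn_representation`, `fit_heads`, and `simulate`, as well as the simulated data-generating process with a hidden confounder `u`, are all hypothetical.

```python
import numpy as np

def add_intercept(x):
    """Prepend a column of ones so linear models can learn an offset."""
    return np.column_stack([np.ones(len(x)), x])

def ridge_fit(features, targets, penalty=1e-3):
    """Closed-form ridge regression (the small penalty also touches the intercept)."""
    gram = features.T @ features + penalty * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)

def learn_representation(x_obs, t_obs, y_obs):
    """Step 1 (assumed form): learn shared structure from confounded observational data."""
    w0 = ridge_fit(add_intercept(x_obs[t_obs == 0]), y_obs[t_obs == 0])
    w1 = ridge_fit(add_intercept(x_obs[t_obs == 1]), y_obs[t_obs == 1])
    weights = np.column_stack([w0, w1])  # shape (d + 1, 2)
    return lambda x: add_intercept(x) @ weights  # maps (n, d) -> (n, 2)

def fit_heads(representation, x_rct, t_rct, y_rct):
    """Step 2 (assumed form): fit one small head per arm on randomized data."""
    z = add_intercept(representation(x_rct))
    h0 = ridge_fit(z[t_rct == 0], y_rct[t_rct == 0])
    h1 = ridge_fit(z[t_rct == 1], y_rct[t_rct == 1])
    # The effect estimate is the difference between the two heads' predictions.
    return lambda x: add_intercept(representation(x)) @ (h1 - h0)

def simulate(rng, n, randomized, beta, gamma):
    """Hypothetical data: hidden confounder u drives treatment only when not randomized."""
    x = rng.normal(size=(n, len(beta)))
    u = rng.normal(size=n)
    if randomized:
        t = rng.integers(0, 2, size=n)
    else:
        t = (rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * u))).astype(int)
    y = x @ beta + t * (1.0 + x @ gamma) + 2.0 * u + 0.5 * rng.normal(size=n)
    return x, t, y

rng = np.random.default_rng(0)
beta = np.array([1.0, -1.0, 0.5])
gamma = np.array([0.5, 0.0, -0.5])
x_obs, t_obs, y_obs = simulate(rng, 5000, False, beta, gamma)  # large, confounded
x_rct, t_rct, y_rct = simulate(rng, 500, True, beta, gamma)    # small, unconfounded
x_test = rng.normal(size=(1000, 3))
true_cate = 1.0 + x_test @ gamma

representation = learn_representation(x_obs, t_obs, y_obs)
z_test = representation(x_test)
naive_cate = z_test[:, 1] - z_test[:, 0]  # observational data only
two_step_cate = fit_heads(representation, x_rct, t_rct, y_rct)(x_test)

rmse_naive = np.sqrt(np.mean((naive_cate - true_cate) ** 2))
rmse_two_step = np.sqrt(np.mean((two_step_cate - true_cate) ** 2))
print(f"RMSE, observational only: {rmse_naive:.3f}")
print(f"RMSE, two-step:           {rmse_two_step:.3f}")
```

In this toy setup the hidden confounder shifts the observational estimate by a roughly constant amount, which the heads fitted on randomized data can absorb with only three parameters per arm. That is the favorable case; as the abstract notes, combining the two data sources is not always beneficial, and this sketch makes no attempt to reproduce the paper's conditions or results.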