Paper Title
A Constructive GAN-based Approach to Exact Estimate Treatment Effect without Matching
Paper Authors
Paper Abstract
Matching has become the mainstream approach in counterfactual inference, as it can largely eliminate selection bias between sample groups. In practice, however, when the average treatment effect on the treated (ATT) is estimated via matching, a trade-off between estimation accuracy and information loss persists regardless of the matching method. To replace the matching process entirely, this paper proposes the GAN-ATT estimator, which integrates a generative adversarial network (GAN) into the counterfactual inference framework. Through GAN training, the probability density functions (PDFs) of samples in both the treatment group and the control group can be approximated. By comparing the conditional PDFs of the two groups under identical input conditions, the conditional average treatment effect (CATE) can be estimated, and the ensemble average of the corresponding CATEs over all treatment group samples is the estimate of the ATT. Because a trained GAN can generate an unlimited number of samples, problems caused by insufficient samples or a lack of common support can be easily solved. In theory, if the GAN learns the PDFs perfectly, our estimator provides an exact estimate of the ATT. To evaluate the GAN-ATT estimator, three data sets are used for ATT estimation. Two are toy data sets with one- or two-dimensional covariate inputs and constant or covariate-dependent treatment effects; on these, the GAN-ATT estimates prove close to the ground truth and better than those of traditional matching approaches. The third is a real firm-level data set with high-dimensional input, on which the applicability to real data is evaluated by comparison with matching approaches. Based on the evidence from the three tests, we believe that the GAN-ATT estimator has significant advantages over traditional matching methods in estimating the ATT.
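The averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two sampler functions below are hypothetical stand-ins for trained conditional generators (they draw from known toy distributions rather than from a GAN), and the names `estimate_att`, `sample_treated`, `sample_control`, and `TRUE_EFFECT` are assumptions introduced only for this example. The sketch also assumes the CATE at a covariate value is taken as the difference between the conditional outcome means of the two groups.

```python
import math
import random
from statistics import fmean

TRUE_EFFECT = 2.0  # hypothetical constant treatment effect of the toy setup


def sample_control(x, rng):
    """Stand-in for a control-group generator: one draw of y given x."""
    return math.sin(x) + rng.gauss(0.0, 0.5)


def sample_treated(x, rng):
    """Stand-in for a treatment-group generator: one draw of y given x."""
    return math.sin(x) + TRUE_EFFECT + rng.gauss(0.0, 0.5)


def estimate_att(x_treated, sample_treated, sample_control, n_draws=500, seed=0):
    """Average per-sample CATE estimates over the treated covariates.

    For each treated covariate x, both samplers are queried at the same x,
    and the CATE is estimated as the difference of the two sample means.
    """
    rng = random.Random(seed)
    cates = []
    for x in x_treated:
        y1 = fmean(sample_treated(x, rng) for _ in range(n_draws))
        y0 = fmean(sample_control(x, rng) for _ in range(n_draws))
        cates.append(y1 - y0)
    return fmean(cates)


# Covariates of a hypothetical treatment group (one-dimensional input).
cov_rng = random.Random(1)
x_treated = [cov_rng.uniform(-1.0, 1.0) for _ in range(200)]

att_hat = estimate_att(x_treated, sample_treated, sample_control)
print(f"estimated ATT: {att_hat:.3f} (toy ground truth: {TRUE_EFFECT})")
```

Because each generator can be queried any number of times at any covariate value, no treated sample has to be discarded for lack of a matched control, which is the property the abstract attributes to GAN-based sample augmentation. How close the estimate gets in practice depends on how well the trained generators approximate the true conditional distributions.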