Paper title
Spending Privacy Budget Fairly and Wisely
Paper authors
Abstract
Differentially private (DP) synthetic data generation is a practical method for improving access to data as a means to encourage productive partnerships. One issue inherent to DP is that the "privacy budget" is generally "spent" evenly across features in the data set. This leads to good statistical parity with the real data, but can undervalue the conditional probabilities and marginals that are critical for predictive quality of synthetic data. Further, loss of predictive quality may be non-uniform across the data set, with subsets that correspond to minority groups potentially suffering a higher loss. In this paper, we develop ensemble methods that distribute the privacy budget "wisely" to maximize predictive accuracy of models trained on DP data, and "fairly" to bound potential disparities in accuracy across groups and reduce inequality. Our methods are based on the insights that feature importance can inform how privacy budget is allocated, and, further, that per-group feature importance and fairness-related performance objectives can be incorporated in the allocation. These insights make our methods tunable to social contexts, allowing data owners to produce balanced synthetic data for predictive analysis.
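The idea that feature importance can inform budget allocation can be illustrated with a minimal sketch. This is not the paper's ensemble method, which the abstract does not specify. It only assumes that a total budget is split into per-feature shares that sum to the total (as under basic sequential composition), and that importance scores are supplied by the caller. The function names `allocate_budget` and `blend_group_importance` and the `uniform_mix` parameter are hypothetical, introduced here for illustration.

```python
from typing import Dict


def allocate_budget(
    total_epsilon: float,
    importance: Dict[str, float],
    uniform_mix: float = 0.2,
) -> Dict[str, float]:
    """Split total_epsilon across features (illustrative sketch only).

    uniform_mix=1.0 reproduces the even split; uniform_mix=0.0 is purely
    proportional to the importance scores. The shares always sum to
    total_epsilon.
    """
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if not 0.0 <= uniform_mix <= 1.0:
        raise ValueError("uniform_mix must lie in [0, 1]")
    if not importance:
        raise ValueError("importance must not be empty")
    if any(score < 0 for score in importance.values()):
        raise ValueError("importance scores must be non-negative")

    n = len(importance)
    total = sum(importance.values())
    budget = {}
    for name, score in importance.items():
        # Fall back to an even split if every score is zero.
        proportional = score / total if total > 0 else 1.0 / n
        share = uniform_mix / n + (1.0 - uniform_mix) * proportional
        budget[name] = total_epsilon * share
    return budget


def blend_group_importance(
    per_group: Dict[str, Dict[str, float]],
    group_weight: Dict[str, float],
) -> Dict[str, float]:
    """Weighted average of per-group importance scores (assumed scheme).

    Raising a minority group's weight shifts budget toward the features
    that matter most for that group.
    """
    weight_sum = sum(group_weight[g] for g in per_group)
    if weight_sum <= 0:
        raise ValueError("group weights must sum to a positive value")
    blended: Dict[str, float] = {}
    for group, scores in per_group.items():
        w = group_weight[group] / weight_sum
        for name, score in scores.items():
            blended[name] = blended.get(name, 0.0) + w * score
    return blended


if __name__ == "__main__":
    # Hypothetical importance scores for two groups.
    scores = blend_group_importance(
        {"majority": {"age": 0.8, "zip": 0.2},
         "minority": {"age": 0.2, "zip": 0.8}},
        {"majority": 1.0, "minority": 1.0},
    )
    print(allocate_budget(1.0, scores, uniform_mix=0.2))
```

In this sketch, `uniform_mix` acts as a floor so that no feature receives a zero budget, and the group weights are the knob a data owner would tune to trade overall accuracy against accuracy for a particular group.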