Paper Title
Anonymizing Data for Privacy-Preserving Federated Learning
Paper Authors
Paper Abstract
Federated learning enables training a global machine learning model from data distributed across multiple sites, without having to move the data. This is particularly relevant in healthcare applications, where data is rife with personal, highly sensitive information, and data analysis methods must provably comply with regulatory guidelines. Although federated learning prevents sharing raw data, it is still possible to launch privacy attacks on the model parameters that are exposed during the training process, or on the generated machine learning model. In this paper, we propose the first syntactic approach for offering privacy in the context of federated learning. Unlike state-of-the-art differential privacy-based frameworks, our approach aims to maximize utility or model performance, while supporting a defensible level of privacy, as demanded by GDPR and HIPAA. We perform a comprehensive empirical evaluation on two important problems in the healthcare domain, using real-world electronic health data of 1 million patients. The results demonstrate the effectiveness of our approach in achieving high model performance, while offering the desired level of privacy. Through comparative studies, we also show that, for varying datasets, experimental setups, and privacy budgets, our approach offers higher model performance than differential privacy-based techniques in federated learning.
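For readers unfamiliar with the two building blocks the abstract combines, the sketch below illustrates the general shape of the idea: each site applies a syntactic, k-anonymity-style generalization to a quasi-identifier before training locally, and a server aggregates the local models with FedAvg-style weighted averaging. This is only a minimal toy sketch, not the paper's actual algorithm; the helper names (generalize_ages, local_train, fed_avg), the age-binning step, and the logistic-regression model are all hypothetical choices made for illustration.

```python
# Toy sketch only: the abstract does not specify the paper's anonymization or
# federation pipeline. Everything below is an illustrative assumption.
import numpy as np

def generalize_ages(ages, bin_width=10):
    """Toy syntactic step: coarsen a quasi-identifier (age) into bins so
    records within a bin become indistinguishable on that attribute."""
    return (ages // bin_width) * bin_width

def local_train(X, y, w, lr=0.1, epochs=20):
    """Plain logistic-regression gradient descent on one site's data."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)       # gradient step
    return w

def fed_avg(site_weights, site_sizes):
    """FedAvg: average site models, weighted by number of local records."""
    return np.average(site_weights, axis=0, weights=np.asarray(site_sizes, float))

rng = np.random.default_rng(0)
global_w = np.zeros(2)
# Three hypothetical sites, each holding (age, label) records.
sites = [(rng.integers(20, 80, 100), rng.integers(0, 2, 100)) for _ in range(3)]

for _ in range(5):                             # federated rounds
    updates, sizes = [], []
    for ages, labels in sites:
        ages_anon = generalize_ages(ages)      # anonymize before training
        X = np.column_stack([ages_anon / 100.0, np.ones(len(ages_anon))])
        updates.append(local_train(X, labels.astype(float), global_w.copy()))
        sizes.append(len(labels))
    global_w = fed_avg(updates, sizes)

print("global model weights:", global_w)
```

Note that binning one attribute, as above, does not by itself guarantee k-anonymity over all quasi-identifiers; a real syntactic approach would enforce the anonymity requirement over the full record before any local training sees the data.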