Paper Title

PPaaS: Privacy Preservation as a Service

Authors

Arachchige, Pathum Chamikara Mahawaga; Bertok, Peter; Khalil, Ibrahim; Liu, Dongxi; Camtepe, Seyit

Abstract

Personally identifiable information (PII) can find its way into cyberspace through various channels, and many potential sources can leak such information. Data sharing (e.g., cross-agency data sharing) for machine learning and analytics is an important component of data science. However, because of privacy concerns, strong privacy guarantees should be enforced on data before it is shared. Various privacy-preserving approaches have been developed for data sharing; however, identifying the best privacy-preservation approach for a given dataset remains a challenge. Several parameters can influence the efficacy of the process, such as the characteristics of the input dataset, the strength of the privacy-preservation approach, and the expected utility of the resulting dataset on the corresponding data mining application (e.g., classification). This paper presents a framework named Privacy Preservation as a Service (PPaaS) to reduce this complexity. The proposed method employs selective privacy preservation via data perturbation and examines the different dynamics that can influence the quality of privacy preservation for a dataset. PPaaS includes pools of data perturbation methods; for each application and input dataset, it selects the most suitable data perturbation approach after rigorous evaluation. It enhances the usability of the privacy-preserving methods in its pool, and it is a generic platform that can sanitize big data in a granular, application-specific manner by employing a suitable combination of diverse privacy-preserving algorithms to provide a proper balance between privacy and utility.
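The abstract describes the PPaaS selection step only at a high level: a pool of perturbation methods is evaluated for each dataset and application, and the method with the best privacy/utility balance is chosen. The following is a minimal sketch of such a selection loop, under loudly stated assumptions: the pool members (`identity`, `additive_noise`, `random_rotation`), the privacy proxy (mean absolute distortion mapped into [0, 1)), the utility proxy (nearest-centroid classification accuracy), and the weighted score controlled by the hypothetical parameter `alpha` are all illustrative choices, not the methods or metrics used in the paper.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Row = List[float]
Perturber = Callable[[List[Row], random.Random], List[Row]]

# --- Hypothetical perturbation pool (illustrative, NOT the paper's actual pool) ---
def identity(data: List[Row], rng: random.Random) -> List[Row]:
    # Baseline: no perturbation, so no privacy gain.
    return [row[:] for row in data]

def additive_noise(data: List[Row], rng: random.Random, sigma: float = 0.8) -> List[Row]:
    # Adds independent Gaussian noise to every attribute value.
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in data]

def random_rotation(data: List[Row], rng: random.Random) -> List[Row]:
    # Rotates 2-D records by one secret random angle; pairwise distances are kept.
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    return [[c * x - s * y, s * x + c * y] for x, y in data]

POOL: Dict[str, Perturber] = {
    "identity": identity,
    "additive_noise": additive_noise,
    "random_rotation": random_rotation,
}

def nearest_centroid_accuracy(train, y_train, test, y_test) -> float:
    # Hypothetical utility proxy: accuracy of a nearest-centroid classifier.
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for row, label in zip(train, y_train):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {k: [v / counts[k] for v in s] for k, s in sums.items()}
    correct = 0
    for row, label in zip(test, y_test):
        pred = min(centroids, key=lambda k: math.dist(row, centroids[k]))
        correct += pred == label
    return correct / len(test)

def privacy_score(original: List[Row], perturbed: List[Row]) -> float:
    # Hypothetical privacy proxy in [0, 1): more average distortion -> higher score.
    diffs = [abs(a - b) for r1, r2 in zip(original, perturbed) for a, b in zip(r1, r2)]
    return 1.0 - math.exp(-sum(diffs) / len(diffs))

def select_method(data: List[Row], labels: List[int], alpha: float = 0.5, seed: int = 0):
    """Evaluate every method in POOL; return (best_name, {name: (privacy, utility, score)})."""
    split = int(0.7 * len(data))
    scores: Dict[str, Tuple[float, float, float]] = {}
    for name, perturb in POOL.items():
        rng = random.Random(seed)
        pert = perturb(data, rng)
        util = nearest_centroid_accuracy(pert[:split], labels[:split], pert[split:], labels[split:])
        priv = privacy_score(data, pert)
        scores[name] = (priv, util, alpha * priv + (1.0 - alpha) * util)
    best = max(scores, key=lambda n: scores[n][2])
    return best, scores

if __name__ == "__main__":
    gen = random.Random(42)
    data, labels = [], []
    for i in range(200):
        label = i % 2  # two synthetic classes centred at x = -3 and x = +3
        data.append([gen.gauss(3.0 if label else -3.0, 1.0), gen.gauss(0.0, 1.0)])
        labels.append(label)
    best, scores = select_method(data, labels)
    for name, (priv, util, total) in scores.items():
        print(f"{name:16s} privacy={priv:.3f} utility={util:.3f} score={total:.3f}")
    print("selected:", best)
```

Raising `alpha` favours stronger perturbation, while lowering it favours methods that keep classification accuracy; in a real deployment, both proxies would be replaced by the privacy and utility measures appropriate to the target application.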