Paper Title
Utility-aware Privacy-preserving Data Releasing
Paper Authors
Abstract
In the big data era, a growing number of cloud-based, data-driven applications leverage individuals' data to provide valuable services (the utilities). However, because the same data can also be used to infer an individual's sensitive information, these applications create new channels for snooping on individuals' privacy. It is therefore important to develop techniques that enable data owners to release privatized data that can still be used for certain intended purposes. Existing data releasing approaches, however, are either privacy-emphasized (with no consideration of utility) or utility-driven (with no guarantees on privacy). In this work, we propose a two-step, perturbation-based, utility-aware privacy-preserving data releasing framework. First, certain predefined privacy and utility problems are learned from public domain data (background knowledge). Then, our approach leverages the learned knowledge to precisely perturb the data owners' data into privatized data that can be successfully used for the intended purposes (learning to succeed) without jeopardizing the predefined privacy (training to fail). Extensive experiments on the Human Activity Recognition, Census Income, and Bank Marketing datasets demonstrate the effectiveness and practicality of our framework.
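The two-step idea in the abstract can be illustrated with a toy linear sketch. This is not the authors' method: the synthetic data, the logistic-regression models, and every name below (`make_data`, `fit_logreg`, `privatize`) are assumptions made for illustration. Step 1 learns a utility classifier and a privacy classifier from "public" records. Step 2 moves each owner record along a direction that leaves the utility score unchanged while driving the privacy score onto its decision boundary.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def fit_logreg(X, y, lr=0.5, steps=300):
    """Plain gradient descent for logistic regression; returns (w, b)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi
            gb += err
            for j in range(d):
                gw[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def accuracy(model, X, y):
    w, b = model
    return sum((dot(w, xi) + b > 0) == (yi == 1) for xi, yi in zip(X, y)) / len(X)

def make_data(n, rng):
    # Hypothetical toy records: feature 0 drives the utility label,
    # feature 1 drives the private label, feature 2 is noise.
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y_util = [1 if x[0] > 0 else 0 for x in X]
    y_priv = [1 if x[1] > 0 else 0 for x in X]
    return X, y_util, y_priv

def privatize(x, util_model, priv_model):
    (wu, _), (wp, bp) = util_model, priv_model
    # Remove from wp its component along wu, so moving along d
    # changes the privacy score but not the utility score.
    k = dot(wp, wu) / dot(wu, wu)
    d = [p - k * u for p, u in zip(wp, wu)]
    # Step size that puts the privacy score on the decision boundary.
    t = (dot(wp, x) + bp) / dot(wp, d)
    return [xi - t * di for xi, di in zip(x, d)]

rng = random.Random(0)

# Step 1: learn the utility and privacy problems from public data.
X_pub, yu_pub, yp_pub = make_data(400, rng)
util_model = fit_logreg(X_pub, yu_pub)
priv_model = fit_logreg(X_pub, yp_pub)

# Step 2: perturb the data owners' records before release.
X_own, yu_own, yp_own = make_data(200, rng)
X_rel = [privatize(x, util_model, priv_model) for x in X_own]

print("utility accuracy, raw vs released:",
      accuracy(util_model, X_own, yu_own), accuracy(util_model, X_rel, yu_own))
print("largest privacy score magnitude after release:",
      max(abs(dot(priv_model[0], x) + priv_model[1]) for x in X_rel))
```

In this linear toy the perturbation has a closed form, so the learned privacy classifier is left with no margin on the released records while the learned utility classifier's scores are preserved. It says nothing about stronger or retrained attackers, and a realistic setting with nonlinear models would presumably need an iterative, learned perturbation instead.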