Paper Title

Privacy Protection for Youth Risk Behavior Using Bayesian Data Synthesis: A Case Study to the YRBS

Authors

Cao, Yixiao; Hu, Jingchen

Abstract

The large and varied body of publicly available survey datasets, albeit useful, raises respondent-level privacy concerns. The synthetic data approach to data privacy and confidentiality has been shown to be useful in terms of both privacy protection and utility preservation. This paper illustrates how synthetic data can facilitate the dissemination of highly sensitive information about youth risk behavior by presenting a case study of synthetic data for a sample of the Youth Risk Behavior Survey (YRBS). Given the categorical nature of almost all variables in the YRBS, the Dirichlet process mixture of products of multinomials (DPMPM) synthesizer is adopted to partially synthesize the YRBS sample. Detailed evaluations of utility and disclosure risk demonstrate that the generated synthetic data significantly reduce disclosure risk compared to the confidential YRBS sample while maintaining a high level of utility.
