Visualizing Privacy-Utility Trade-Offs in Differentially Private Data Releases

Paper Title

Visualizing Privacy-Utility Trade-Offs in Differentially Private Data Releases

Authors

Priyanka Nanayakkara, Johes Bater, Xi He, Jessica Hullman, Jennie Rogers

Abstract

Organizations often collect private data and release aggregate statistics for the public's benefit. If no steps toward preserving privacy are taken, adversaries may use released statistics to deduce unauthorized information about the individuals described in the private dataset. Differentially private algorithms address this challenge by slightly perturbing underlying statistics with noise, thereby mathematically limiting the amount of information that may be deduced from each data release. Properly calibrating these algorithms -- and in turn the disclosure risk for people described in the dataset -- requires a data curator to choose a value for a privacy budget parameter, $ε$. However, there is little formal guidance for choosing $ε$, a task that requires reasoning about the probabilistic privacy-utility trade-off. Furthermore, choosing $ε$ in the context of statistical inference requires reasoning about accuracy trade-offs in the presence of both measurement error and differential privacy (DP) noise. We present Visualizing Privacy (ViP), an interactive interface that visualizes relationships between $ε$, accuracy, and disclosure risk to support setting and splitting $ε$ among queries. As a user adjusts $ε$, ViP dynamically updates visualizations depicting expected accuracy and risk. ViP also has an inference setting, allowing a user to reason about the impact of DP noise on statistical inferences. Finally, we present results of a study where 16 research practitioners with little to no DP background completed a set of tasks related to setting $ε$ using both ViP and a control. We find that ViP helps participants more correctly answer questions related to judging the probability of where a DP-noised release is likely to fall and comparing between DP-noised and non-private confidence intervals.
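
To make the quantities in the abstract concrete, here is a minimal sketch that assumes the Laplace mechanism on numeric queries with known sensitivity. It is not ViP's implementation. All function names (`laplace_release`, `prob_within`, `split_budget`, `posterior_bound`, `ci_half_width`) are hypothetical. `posterior_bound` uses the standard ε-DP bound on an adversary's posterior odds, which may differ from the exact risk measure ViP visualizes. `ci_half_width` uses a normal approximation for the combined sampling and DP noise.

```python
import math
import random


def laplace_release(true_value, epsilon, sensitivity=1.0, rng=None):
    """Return true_value plus Laplace(0, sensitivity / epsilon) noise."""
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


def prob_within(t, epsilon, sensitivity=1.0):
    """P(|release - true_value| <= t) under the Laplace mechanism above."""
    # For Laplace(0, b), P(|X| <= t) = 1 - exp(-t / b), with b = sensitivity / epsilon.
    return 1.0 - math.exp(-t * epsilon / sensitivity)


def split_budget(total_epsilon, weights):
    """Split a total budget across queries in proportion to `weights`."""
    # Under sequential composition, per-query budgets that sum to total_epsilon
    # give total_epsilon-DP for the whole set of releases.
    s = sum(weights)
    return [total_epsilon * w / s for w in weights]


def posterior_bound(prior, epsilon):
    """Upper bound on an adversary's posterior belief about one individual."""
    # Standard epsilon-DP bound: posterior odds <= e^epsilon * prior odds.
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior) * math.exp(epsilon)
    return odds / (1.0 + odds)


def ci_half_width(sample_sd, n, epsilon=None, value_range=1.0, z=1.96):
    """Approximate CI half-width for a mean, optionally with Laplace DP noise added."""
    var = sample_sd ** 2 / n  # sampling variance of the mean
    if epsilon is not None:
        # Assuming n is public and neighbors differ by replacing one record, the mean
        # of n values in [0, value_range] has sensitivity value_range / n.
        # Laplace(0, b) noise adds variance 2 * b**2.
        b = (value_range / n) / epsilon
        var += 2.0 * b ** 2
    # Normal approximation; it ignores the heavier tails of Laplace noise.
    return z * math.sqrt(var)
```

For a count query (sensitivity 1), halving ε doubles the noise scale. The chance that the release lands within ±2 of the true count therefore drops from about 0.63 at ε = 0.5 to about 0.39 at ε = 0.25. Likewise, for a mean over n = 100 values in [0, 1] with sample SD 0.5, setting ε = 0.1 widens the approximate 95% half-width from about 0.098 to about 0.294.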