Title
Partially Synthetic Data for Recommender Systems: Prediction Performance and Preference Hiding
Authors
Abstract
This paper demonstrates the potential of statistical disclosure control for protecting the data used to train recommender systems. Specifically, we use a synthetic data generation approach to hide specific information in the user-item matrix. We apply a transformation to the original data that changes some values but leaves others the same. The result is a partially synthetic data set that can be used for recommendation but contains less specific information about individual user preferences. Synthetic data is potentially useful for companies that want to release data so that outside parties can develop new recommender algorithms, e.g., in a recommender system challenge, while also reducing the risks associated with data misappropriation. Our experiments run a set of recommender system algorithms on our partially synthetic data sets as well as on the original data. The results show that the relative performance of the algorithms on the partially synthetic data reflects their relative performance on the original data. Further analysis demonstrates that properties of the original data are preserved under synthesis, but that certain example attributes that are accessible in the original data are hidden in the synthesized data.
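The abstract does not specify the synthesis model. As a rough illustration only, the following Python sketch shows what "changing some values but leaving others the same" can look like on a small user-item rating matrix. The function name `partially_synthesize`, the `replace_rate` parameter, and the choice to redraw each selected rating from the empirical distribution of its item column are hypothetical stand-ins, not the authors' method.

```python
import random
from typing import List

# Rows are users, columns are items; 0 marks an unobserved (unrated) cell.
Matrix = List[List[int]]


def partially_synthesize(ratings: Matrix, replace_rate: float, seed: int = 0) -> Matrix:
    """Return a copy of `ratings` in which roughly `replace_rate` of the observed
    cells are replaced by synthetic values. Unobserved cells and all other
    observed cells are left unchanged.

    Hypothetical synthesis model: each selected cell is redrawn from the
    empirical distribution of the observed ratings in the same item column.
    """
    if not 0.0 <= replace_rate <= 1.0:
        raise ValueError("replace_rate must be in [0, 1]")
    rng = random.Random(seed)
    n_items = len(ratings[0]) if ratings else 0

    # Per-item pool of observed ratings taken from the original data.
    item_pools = [[row[j] for row in ratings if row[j] != 0] for j in range(n_items)]

    # All observed (user, item) positions; pick a random subset to synthesize.
    observed = [(u, j) for u, row in enumerate(ratings) for j, r in enumerate(row) if r != 0]
    n_replace = round(replace_rate * len(observed))
    to_replace = rng.sample(observed, n_replace)

    synthetic = [row[:] for row in ratings]  # copy rows so the input is not mutated
    for u, j in to_replace:
        # The redrawn value may coincide with the original rating by chance.
        synthetic[u][j] = rng.choice(item_pools[j])
    return synthetic


if __name__ == "__main__":
    original = [
        [5, 0, 3, 0],
        [4, 2, 0, 1],
        [0, 5, 4, 0],
    ]
    released = partially_synthesize(original, replace_rate=0.5, seed=42)
    for row in released:
        print(row)
```

This sketch keeps the pattern of which items each user rated intact and only perturbs rating values; a real preference-hiding scheme might also need to alter that pattern, and its replacement model would typically be fitted to more information than a single item column.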