Paper Title
Knowledge Removal in Sampling-based Bayesian Inference
Paper Authors
Paper Abstract
The right to be forgotten has been legislated in many countries, but enforcing it in the AI industry would incur unbearable costs. When a single data deletion request arrives, a company may need to discard an entire model learned with massive resources. Existing works propose methods to remove knowledge learned from data for explicitly parameterized models; these methods, however, are not applicable to sampling-based Bayesian inference, i.e., Markov chain Monte Carlo (MCMC), because MCMC can only infer implicit distributions. In this paper, we propose the first machine unlearning algorithm for MCMC. We first convert the MCMC unlearning problem into an explicit optimization problem. Based on this conversion, an {\it MCMC influence function} is designed to provably characterize the knowledge learned from data, which then yields the MCMC unlearning algorithm. Theoretical analysis shows that MCMC unlearning does not compromise the generalizability of MCMC models. Experiments on Gaussian mixture models and Bayesian neural networks confirm the effectiveness of the proposed algorithm. The code is available at \url{https://github.com/fshp971/mcmc-unlearning}.
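For context on the influence-function idea the abstract builds on: in the explicitly parameterized setting, removing a single training point $z$ from an empirical risk minimizer $\hat{\theta}$ over $n$ points is classically approximated via the influence function. The sketch below uses assumed notation ($\hat{\theta}$, $\ell$, $H_{\hat{\theta}}$) for this background case only; the paper's {\it MCMC influence function} extends this idea to the implicit distributions inferred by MCMC.
\[
\hat{\theta}_{-z} \;\approx\; \hat{\theta} \;+\; \frac{1}{n}\, H_{\hat{\theta}}^{-1}\, \nabla_{\theta}\, \ell(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} \;=\; \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta}^{2}\, \ell(z_i, \hat{\theta}),
\]
where $\ell$ is the per-sample loss and $H_{\hat{\theta}}$ is the empirical Hessian at the learned parameters.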