Paper Title

Decentralized EM to Learn Gaussian Mixtures from Datasets Distributed by Features

Paper Authors

Pedro Valdeira, Cláudia Soares, João Xavier

Paper Abstract

Expectation Maximization (EM) is the standard method to learn Gaussian mixtures. Yet its classic, centralized form is often infeasible, due to privacy concerns and computational and communication bottlenecks. Prior work dealt with data distributed by examples, horizontal partitioning, but we lack a counterpart for data scattered by features, an increasingly common scheme (e.g. user profiling with data from multiple entities). To fill this gap, we provide an EM-based algorithm to fit Gaussian mixtures to Vertically Partitioned data (VP-EM). In federated learning setups, our algorithm matches the centralized EM fitting of Gaussian mixtures constrained to a subspace. In arbitrary communication graphs, consensus averaging allows VP-EM to run on large peer-to-peer networks as an EM approximation. This mismatch comes from consensus error only, which vanishes exponentially fast with the number of consensus rounds. We demonstrate VP-EM on various topologies for both synthetic and real data, evaluating its approximation of centralized EM and seeing that it outperforms the available benchmark.
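
The claim that the consensus error "vanishes exponentially fast with the number of consensus rounds" can be seen in a few lines. The sketch below (Python; not the authors' implementation, with the network size, ring topology, and Metropolis weights all being illustrative assumptions) runs the plain consensus-averaging primitive the abstract invokes: each peer repeatedly replaces its value with a weighted average of its neighbors' values, and the worst-case deviation from the exact network-wide mean shrinks geometrically with the number of rounds.

import numpy as np

rng = np.random.default_rng(0)
n = 8                        # number of peers (illustrative)
x = rng.normal(size=n)       # one local statistic per peer

# Ring graph with Metropolis weights: each node averages with its two
# neighbors. W is symmetric and doubly stochastic, so W^t x -> mean(x)
# at a geometric rate set by W's second-largest eigenvalue modulus.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 1 / 3
    W[i, (i + 1) % n] = 1 / 3
    W[i, i] = 1 / 3

target = x.mean()            # what centralized averaging would compute
z = x.copy()
for t in range(1, 31):
    z = W @ z                # one consensus round (neighbor averaging)
    if t % 10 == 0:
        err = np.max(np.abs(z - target))
        print(f"round {t:2d}: max consensus error = {err:.2e}")

In the peer-to-peer regime described above, rounds of this kind stand in for the exact global averages a centralized EM step would need, which is why the only gap between VP-EM and centralized EM is this consensus error.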
