Paper Title

Entry Dependent Expert Selection in Distributed Gaussian Processes Using Multilabel Classification

Paper Authors

Hamed Jalali, Gjergji Kasneci

Paper Abstract

By distributing the training process, local approximation reduces the cost of the standard Gaussian process. An ensemble technique combines local predictions from Gaussian experts trained on different partitions of the data. Ensemble methods aggregate the models' predictions by assuming perfect diversity of the local predictors. Although this assumption keeps the aggregation tractable, it is often violated in practice. Ensemble methods that instead model dependencies between experts provide consistent results, but they have a high computational cost, which is cubic in the number of experts involved. By implementing an expert selection strategy, the final aggregation step uses fewer experts and is more efficient. However, a selection approach that assigns a fixed set of experts to every new data point cannot encode the specific properties of each unique data point. This paper proposes a flexible expert selection approach based on the characteristics of entry data points. To this end, we cast the selection task as a multi-label classification problem in which the experts define the labels and each entry point is assigned to some of the experts. The prediction quality, efficiency, and asymptotic properties of the proposed solution are discussed in detail. We demonstrate the efficacy of our method through extensive numerical experiments using synthetic and real-world data sets.
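To make the described pipeline concrete, below is a minimal, illustrative sketch (not the authors' implementation) of entry-dependent expert selection: local GP experts are trained on disjoint data partitions, a multi-label classifier treats the experts as labels and predicts which of them are relevant for a given test input, and only the selected experts are combined with a simple precision-weighted, product-of-experts-style rule. The scikit-learn models, the relevance labels derived from each expert's training error, and the aggregation rule are all assumptions made for illustration and may differ from the paper's exact formulation.

```python
# Illustrative sketch of entry-dependent expert selection in a distributed GP.
# All modeling choices below (label construction, classifier, aggregation) are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)

# Synthetic 1-D regression data, split into K random partitions (one per expert).
X = rng.uniform(-5, 5, size=(600, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(600)
K = 6
parts = np.array_split(rng.permutation(len(X)), K)

# 1) Train one local GP expert per partition.
experts = []
for idx in parts:
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X[idx], y[idx])
    experts.append(gp)

# 2) Multi-label targets: label k is "on" for a training point if expert k
#    predicts it with small absolute error (an assumed relevance proxy).
errors = np.column_stack([np.abs(gp.predict(X) - y) for gp in experts])
labels = (errors <= np.quantile(errors, 0.4, axis=0)).astype(int)

# 3) Fit a multi-label classifier: the experts are the labels.
selector = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, labels)

# 4) Predict at a test point using only its selected experts,
#    combined by a simple precision-weighted (product-of-experts style) rule.
def predict(X_test, fallback_experts=1):
    sel = selector.predict(X_test).astype(bool)
    means, variances = np.zeros(len(X_test)), np.zeros(len(X_test))
    for i, x in enumerate(X_test):
        active = np.flatnonzero(sel[i]) if sel[i].any() else np.arange(fallback_experts)
        mu = np.array([experts[k].predict(x[None])[0] for k in active])
        sd = np.array([experts[k].predict(x[None], return_std=True)[1][0] for k in active])
        prec = 1.0 / np.maximum(sd**2, 1e-9)   # precisions of the active experts
        w = prec / prec.sum()                  # precision weights
        means[i], variances[i] = w @ mu, 1.0 / prec.sum()
    return means, variances

X_test = np.linspace(-5, 5, 5)[:, None]
mu, var = predict(X_test)
print(np.round(mu, 3), np.round(var, 4))
```

Because the classifier is queried per entry point, different test inputs can activate different subsets of experts, which reflects the entry-dependent selection the abstract describes; any dependency-aware aggregation then only pays its cost on the smaller selected subset rather than on all experts.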
