Incremental Unsupervised Feature Selection for Dynamic Incomplete Multi-view Data

Paper Title

Incremental Unsupervised Feature Selection for Dynamic Incomplete Multi-view Data

Authors

Yanyong Huang, Kejun Guo, Xiuwen Yi, Zhong Li, Tianrui Li

Abstract

Multi-view unsupervised feature selection has proven effective at reducing the dimensionality of high-dimensional, unlabeled multi-view data. Previous methods assume that all views are complete. In real applications, however, multi-view data are often incomplete, i.e., some views are missing for some instances, which causes these methods to fail. Moreover, when the data arrive as streams, existing methods suffer from high storage cost and expensive computation. To address these issues, we propose an Incremental Incomplete Multi-view Unsupervised Feature Selection method (I$^2$MUFS) for incomplete multi-view streaming data. By jointly considering the consistent and complementary information across views, I$^2$MUFS embeds unsupervised feature selection into an extended weighted non-negative matrix factorization model, which learns a consensus clustering indicator matrix and fuses the latent feature matrices of different views with adaptive view weights. Furthermore, we introduce incremental learning mechanisms to develop an alternating iterative algorithm in which the feature selection matrix is updated incrementally rather than recomputed from scratch on the entire updated data. A series of experiments comparing the proposed method with several state-of-the-art methods verifies its effectiveness. The experimental results demonstrate its effectiveness and efficiency in terms of clustering metrics and computational cost.
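
To make the idea of a weighted non-negative matrix factorization with a shared consensus matrix more concrete, the following is a minimal sketch and not the authors' I$^2$MUFS algorithm. It assumes each view is a non-negative matrix of shape (features, instances), that missing instances are marked by a 0/1 mask per view, and that all views are weighted equally. The adaptive view weights, the feature selection regularizer, and the incremental updates described in the abstract are deliberately left out. The function names `masked_multiview_nmf` and `rank_features` are hypothetical.

```python
import numpy as np


def masked_multiview_nmf(views, masks, k, n_iter=200, eps=1e-10, seed=0):
    """Factorize each view X_v (d_v x n) as U_v @ V with a shared V (k x n).

    views: list of non-negative arrays, one per view, all with n columns.
    masks: list of length-n 0/1 arrays; 0 marks an instance missing in that view.
    Returns the per-view factors, the shared matrix, and the loss per iteration.
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[1]
    V = rng.random((k, n)) + eps
    Us = [rng.random((X.shape[0], k)) + eps for X in views]
    # Expand each instance mask to the shape of its view (read-only broadcast).
    Ms = [np.broadcast_to(np.asarray(m, dtype=float), X.shape)
          for X, m in zip(views, masks)]
    history = []
    for _ in range(n_iter):
        # Multiplicative update of each view-specific factor, observed entries only.
        for i, (X, M) in enumerate(zip(views, Ms)):
            U = Us[i]
            Us[i] = U * ((M * X) @ V.T) / ((M * (U @ V)) @ V.T + eps)
        # Multiplicative update of the shared matrix, pooling all views equally.
        num = sum(U.T @ (M * X) for U, X, M in zip(Us, views, Ms))
        den = sum(U.T @ (M * (U @ V)) for U, M in zip(Us, Ms))
        V = V * num / (den + eps)
        # Masked squared reconstruction error summed over views.
        loss = sum(np.sum((M * (X - U @ V)) ** 2)
                   for U, X, M in zip(Us, views, Ms))
        history.append(float(loss))
    return Us, V, history


def rank_features(U, m):
    """Return indices of the m features with the largest row norm in U.

    Row-norm scoring is an illustrative assumption, not the paper's criterion.
    """
    scores = np.linalg.norm(U, axis=1)
    return np.argsort(scores)[::-1][:m]
```

Calling `masked_multiview_nmf([X1, X2], [m1, m2], k=3)` on two views with partially missing instances returns one factor per view plus the shared matrix, and `rank_features(Us[0], 4)` then picks four features from the first view. An instance that is missing in every view receives an all-zero column in the shared matrix, so the sketch assumes each instance is observed in at least one view.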
