Paper title
Federated singular value decomposition for high-dimensional data
Paper authors
Paper abstract
Federated learning (FL) is emerging as a privacy-aware alternative to classical cloud-based machine learning. In FL, the sensitive data remains in data silos and only aggregated parameters are exchanged. Hospitals and research institutions that are not willing to share their data can join a federated study without breaching confidentiality. In addition to the extreme sensitivity of biomedical data, the high dimensionality of the data poses a challenge in the context of federated genome-wide association studies (GWAS). In this article, we present a federated singular value decomposition (SVD) algorithm suitable for the privacy-related and computational requirements of GWAS. Notably, the algorithm has a transmission cost that is independent of the number of samples and only weakly dependent on the number of features, because the singular vectors associated with the samples are never exchanged, and the vectors associated with the features are exchanged only for a fixed number of iterations. Although motivated by GWAS, the algorithm is generically applicable to both horizontally and vertically partitioned data.
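The claim that transmission cost can be independent of the number of samples can be illustrated with a minimal sketch. It assumes horizontally partitioned data: each site holds a NumPy array with its own samples as rows over the same m features. The sketch uses a plain federated subspace (orthogonal) iteration, a generic textbook technique and not the authors' algorithm. The function name `federated_subspace_iteration` and its parameters are hypothetical. It also omits the paper's restriction of feature-vector exchange to a fixed number of iterations, as well as any secure aggregation.

```python
import numpy as np


def federated_subspace_iteration(local_blocks, k, n_iter=100, seed=0):
    """Hypothetical illustration, not the paper's algorithm.

    Approximates the top-k right singular vectors and singular values of the
    row-wise stack of local_blocks (horizontal partitioning: each site holds
    a subset of samples over the same m features). Only m x k and k x k
    matrices leave a site; the sample-side products X_i @ V stay local, so
    communication does not grow with the number of samples.
    """
    m = local_blocks[0].shape[1]
    rng = np.random.default_rng(seed)
    # Aggregator: random orthonormal starting basis in feature space.
    V, _ = np.linalg.qr(rng.standard_normal((m, k)))
    for _ in range(n_iter):
        # Each site computes X_i^T (X_i V); only this m x k result is shared.
        updates = [X.T @ (X @ V) for X in local_blocks]
        # Aggregator sums the contributions (= X^T X V) and re-orthonormalises.
        V, _ = np.linalg.qr(sum(updates))
    # Rayleigh-Ritz step: each site shares only the k x k Gram matrix of X_i V.
    G = sum((X @ V).T @ (X @ V) for X in local_blocks)
    evals, W = np.linalg.eigh(G)
    order = np.argsort(evals)[::-1]  # eigh returns ascending eigenvalues
    sigma = np.sqrt(np.clip(evals[order], 0.0, None))
    return V @ W[:, order], sigma
```

In this sketch, the left (sample-side) singular vectors of site i could be recovered locally as `X_i @ V / sigma` without ever being transmitted. For vertically partitioned data, the roles of samples and features would be swapped.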