Paper Title

Mycorrhiza: Genotype Assignment using Phylogenetic Networks

Authors

Jeremy Georges-Filteau, Richard C. Hamelin, Mathieu Blanchette

Abstract

Motivation: The genotype assignment problem consists of predicting, from the genotype of an individual, which of a known set of populations it originated from. The problem arises in a variety of contexts, including wildlife forensics, invasive species detection and biodiversity monitoring. Existing approaches perform well under ideal conditions but are sensitive to a variety of common violations of the assumptions they rely on.

Results: In this article, we introduce Mycorrhiza, a machine learning approach for the genotype assignment problem. Our algorithm makes use of phylogenetic networks to engineer features that encode the evolutionary relationships among samples. Those features are then used as input to a Random Forests classifier. The classification accuracy was assessed on simulated datasets and on multiple published empirical SNP, microsatellite or consensus sequence datasets with wide ranges of size, geographical distribution and population structure. Mycorrhiza compared favorably against widely used assessment tests or mixture analysis methods such as STRUCTURE and Admixture, and against another machine-learning-based approach using principal component analysis for dimensionality reduction. Mycorrhiza yields particularly significant gains on datasets with a large average fixation index (FST) or deviation from the Hardy-Weinberg equilibrium. Moreover, the phylogenetic network approach estimates mixture proportions with good accuracy.
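To make the fixation index (FST) mentioned in the abstract concrete, the following is a minimal sketch of Wright's definition, FST = (H_T - H_S) / H_T, for a single biallelic locus. It is an illustration only and is not taken from the Mycorrhiza software: the function names are hypothetical, populations are assumed to be weighted equally, and no sample-size correction is applied.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_biallelic(allele_freqs):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic locus.

    allele_freqs: frequency of the same allele in each population.
    Assumption: every population carries equal weight.
    """
    if not allele_freqs:
        raise ValueError("need at least one population")
    n = len(allele_freqs)
    p_bar = sum(allele_freqs) / n  # pooled allele frequency
    h_t = expected_heterozygosity(p_bar)  # total heterozygosity
    # Mean within-population heterozygosity
    h_s = sum(expected_heterozygosity(p) for p in allele_freqs) / n
    if h_t == 0.0:
        # Locus is monomorphic overall; F_ST is undefined, so report 0 by convention here
        return 0.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Two populations with diverging allele frequencies at one locus
    print(fst_biallelic([0.2, 0.8]))
```

Under this definition, identical allele frequencies across populations give a value of 0, and populations fixed for different alleles give a value of 1. A large average value across loci indicates the kind of strongly structured dataset on which the abstract reports the largest gains.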
