Paper title
Knowledge synthesis from 100 million biomedical documents augments the deep expression profiling of coronavirus receptors
Paper authors
Abstract
The COVID-19 pandemic demands assimilation of all available biomedical knowledge to decode its mechanisms of pathogenicity and transmission. Despite the recent renaissance in unsupervised neural networks for decoding unstructured natural languages, a platform for the real-time synthesis of the exponentially growing biomedical literature and its comprehensive triangulation with deep omic insights is not available. Here, we present the nferX platform for dynamic inference from over 45 quadrillion possible conceptual associations extracted from unstructured biomedical text, and their triangulation with Single Cell RNA-sequencing based insights from over 25 tissues. Using this platform, we identify intersections between the pathologic manifestations of COVID-19 and the comprehensive expression profile of the SARS-CoV-2 receptor ACE2. We find that tongue keratinocytes and olfactory epithelial cells are likely under-appreciated targets of SARS-CoV-2 infection, correlating with reported loss of sense of taste and smell as early indicators of COVID-19 infection, including in otherwise asymptomatic patients. Airway club cells, ciliated cells and type II pneumocytes in the lung, and enterocytes of the gut also express ACE2. This study demonstrates how a holistic data science platform can leverage unprecedented quantities of structured and unstructured publicly available data to accelerate the generation of impactful biological insights and hypotheses.