Paper title
A Study of Deep Learning Colon Cancer Detection in Limited Data Access Scenarios
Paper authors
Paper abstract
Digitization of histopathology slides has led to several advances, from easy data sharing and collaboration to the development of digital diagnostic tools. Deep learning (DL) methods for classification and detection have shown great potential, but often require large amounts of training data that are hard to collect and annotate. For many cancer types, the scarcity of data creates barriers to training DL models. One such scenario is the detection of tumor metastasis in lymph node tissue, where the low ratio of tumor to non-tumor cells makes the diagnostic task hard and time-consuming. DL-based tools can allow faster diagnosis, with potentially increased quality. Unfortunately, due to the sparsity of tumor cells, annotating this type of data demands a high level of effort from pathologists. Using weak annotations from slide-level images has shown great potential, but also demands access to a substantial amount of data. In this study, we investigate mitigation strategies for limited data access scenarios. In particular, we address whether it is possible to exploit mutual structure between tissues to develop general techniques, wherein data from one type of cancer in a particular tissue could have diagnostic value for other cancers in other tissues. Our case is exemplified by a DL model for metastatic colon cancer detection in lymph nodes. Could such a model be trained with little or even no lymph node data? As alternative data sources, we investigate 1) tumor cells taken from the primary colon tumor tissue, and 2) cancer data from a different organ (breast), either as is or transformed to the target domain (colon) using Cycle-GANs. We show that the suggested approaches make it possible to detect cancer metastasis with no or very little lymph node data, opening up the possibility that existing, annotated histopathology data could generalize to other domains.