Paper Title
Self supervised contrastive learning for digital histopathology
Paper Authors
Paper Abstract
Unsupervised learning has been a long-standing goal of machine learning and is especially important for medical image analysis, where the learning can compensate for the scarcity of labeled datasets. A promising subclass of unsupervised learning is self-supervised learning, which aims to learn salient features using the raw input as the learning signal. In this paper, we use a contrastive self-supervised learning method called SimCLR, which achieved state-of-the-art results on natural-scene images, and apply this method to digital histopathology by collecting and pretraining on 57 histopathology datasets without any labels. We find that combining multiple multi-organ datasets with different types of staining and resolution properties improves the quality of the learned features. Furthermore, we find that using more images for pretraining leads to better performance on multiple downstream tasks. Linear classifiers trained on top of the learned features show that networks pretrained on digital histopathology datasets outperform ImageNet-pretrained networks, boosting task performance by more than 28% in F1 score on average. These findings may also be useful when applying newer contrastive techniques to histopathology data. Pretrained PyTorch models are made publicly available at https://github.com/ozanciga/self-supervised-histopathology.
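For concreteness, below is a minimal PyTorch sketch of the NT-Xent (normalized temperature-scaled cross-entropy) objective that SimCLR optimizes during contrastive pretraining. The function name, batch shapes, and temperature value are illustrative assumptions, not code taken from the authors' repository.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over a batch of positive pairs.

    z1, z2: [N, D] projections of two augmented views of the same N images.
    (z1[i], z2[i]) are positives; the remaining 2N - 2 samples are negatives.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # [2N, D] unit vectors
    sim = z @ z.t() / temperature                       # [2N, 2N] scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                   # a sample is never its own negative
    # Row i (for i < N) has its positive at row i + N, and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Example usage with random stand-ins for projection-head outputs:
z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
loss = nt_xent_loss(z1, z2)
```

The linear evaluation protocol mentioned in the abstract can likewise be sketched: freeze the pretrained backbone and train only a linear layer on its features. The checkpoint path and class count below are placeholders, not the authors' actual filenames.

```python
import torch
import torch.nn as nn
from torchvision import models

# Freeze a (pretrained) backbone and train only a linear classifier on top.
backbone = models.resnet18(weights=None)
# backbone.load_state_dict(torch.load("histo_pretrained.pt"))  # hypothetical checkpoint
backbone.fc = nn.Identity()        # expose the 512-d penultimate features
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()

classifier = nn.Linear(512, 2)     # e.g., a binary downstream task
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
```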