Paper title
Independent evaluation of state-of-the-art deep networks for mammography
Paper authors
Paper abstract
Deep neural models have shown remarkable performance in image recognition tasks whenever large datasets of labeled images are available. The largest datasets in radiology are available for screening mammography. Recent reports, including some in high-impact journals, document deep models performing at or above the level of trained radiologists. What is not yet known is whether the performance of these trained models is robust and replicates across datasets. Here we evaluate the performance of five published state-of-the-art models on four publicly available mammography datasets. The limited size of public datasets precludes retraining the models, so we are limited to evaluating those models that have been made available with pre-trained parameters. Where test data were available, we replicated the published results. However, the trained models performed poorly on out-of-sample data, except when based on all four standard views of a mammographic exam. We conclude that future progress will depend on a concerted effort to make more diverse and larger mammography datasets publicly available. Meanwhile, results that are not accompanied by a release of trained models for independent validation should be judged cautiously.