Paper Title
Did You Get What You Paid For? Rethinking Annotation Cost of Deep Learning Based Computer Aided Detection in Chest Radiographs
Paper Authors
Abstract
As deep networks require large amounts of accurately labeled training data, a strategy to collect sufficiently large and accurate annotations is as important as innovations in recognition methods. This is especially true for building Computer Aided Detection (CAD) systems for chest X-rays, where the domain expertise of radiologists is required to annotate the presence and location of abnormalities on X-ray images. However, there is a lack of concrete evidence to guide how much resource to allocate to data annotation so that the resulting CAD system reaches the desired performance. Without this knowledge, practitioners often fall back on the strategy of collecting as much detail as possible on as much data as possible, which is cost-inefficient. In this work, we investigate how the cost of data annotation ultimately impacts CAD model performance on the classification and segmentation of chest abnormalities in frontal-view X-ray images. We define the cost of annotation with respect to three dimensions: the quantity, quality, and granularity of labels. Throughout this study, we isolate the impact of each dimension on the resulting CAD model's performance in detecting 10 chest abnormalities in X-rays. On large-scale training data of over 120K X-ray images with gold-standard annotations, we find that cost-efficient annotations provide great value when collected in large amounts and lead to competitive performance compared to models trained with only gold-standard annotations. We also find that combining large amounts of cost-efficient annotations with only small amounts of expensive labels leads to competitive CAD models at a much lower cost.
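To make the three cost dimensions (quantity, quality, granularity) concrete, here is a minimal Python sketch of an annotation budget. It assumes that the total cost is the number of labeled images multiplied by a per-image price that depends on label quality and granularity. The `UNIT_COST` prices, the `AnnotationBatch` class, and the `total_cost` helper are hypothetical and serve only as illustration. The paper does not publish such a formula or these numbers.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical relative cost per image for each (quality, granularity) pair.
# Illustrative assumptions only; these are NOT figures reported in the paper.
UNIT_COST = {
    ("cheap", "presence"): 1.0,   # cost-efficient label: abnormality present or absent
    ("cheap", "location"): 5.0,   # cost-efficient label that also marks location
    ("gold", "presence"): 5.0,    # gold-standard label: presence only
    ("gold", "location"): 25.0,   # gold-standard label with location (e.g. a mask)
}


@dataclass(frozen=True)
class AnnotationBatch:
    """One batch of labeled images, described by the three cost dimensions."""
    quantity: int      # number of labeled images
    quality: str       # "cheap" or "gold"
    granularity: str   # "presence" or "location"

    def cost(self) -> float:
        # Raises KeyError for a (quality, granularity) pair with no assumed price.
        return self.quantity * UNIT_COST[(self.quality, self.granularity)]


def total_cost(plan: list[AnnotationBatch]) -> float:
    """Sum the cost of every batch in an annotation plan."""
    return sum(batch.cost() for batch in plan)


# Plan A: gold-standard location labels on every image.
gold_only = [AnnotationBatch(120_000, "gold", "location")]

# Plan B: cheap presence labels on every image plus a small gold-standard subset.
mixed = [
    AnnotationBatch(120_000, "cheap", "presence"),
    AnnotationBatch(6_000, "gold", "location"),
]

print(total_cost(gold_only))  # 3000000.0
print(total_cost(mixed))      # 270000.0
```

This sketch deliberately models only cost, which grows linearly with quantity. It says nothing about how accurate the resulting CAD model would be. That trade-off is what the study measures empirically.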