Knowledge-Guided Data-Centric AI in Healthcare: Progress, Shortcomings, and Future Directions

Paper title

Knowledge-Guided Data-Centric AI in Healthcare: Progress, Shortcomings, and Future Directions

Author

Chang, Edward Y.

Abstract

The success of deep learning is largely due to the availability of large amounts of training data that cover a wide range of examples of a particular concept or meaning. In the field of medicine, having a diverse set of training data on a particular disease can lead to the development of a model that is able to accurately predict the disease. However, despite the potential benefits, there have not been significant advances in image-based diagnosis due to a lack of high-quality annotated data. This article highlights the importance of using a data-centric approach to improve the quality of data representations, particularly in cases where the available data is limited. To address this "small-data" issue, we discuss four methods for generating and aggregating training data: data augmentation, transfer learning, federated learning, and GANs (generative adversarial networks). We also propose the use of knowledge-guided GANs to incorporate domain knowledge in the training data generation process. With the recent progress in large pre-trained language models, we believe it is possible to acquire high-quality knowledge that can be used to improve the effectiveness of knowledge-guided generative methods.
