Title

Multi-Label Clinical Time-Series Generation via Conditional GAN

Authors

Lu, Chang; Reddy, Chandan K.; Wang, Ping; Nie, Dong; Ning, Yue

Abstract

In recent years, deep learning has been successfully adopted in a wide range of applications related to electronic health records (EHRs), such as representation learning and clinical event prediction. However, due to privacy constraints, limited access to EHRs has become a bottleneck for deep learning research. To mitigate this problem, generative adversarial networks (GANs) have been successfully used to generate EHR data. However, high-quality EHR generation still faces challenges, including generating time-series EHR data and handling imbalanced, uncommon diseases. In this work, we propose a Multi-label Time-series GAN (MTGAN) that generates EHRs while improving the generation quality of uncommon diseases. The MTGAN generator uses a gated recurrent unit (GRU) with a smooth conditional matrix to generate sequences and uncommon diseases. The critic uses the Wasserstein distance to score samples and distinguish real samples from synthetic ones, considering both data and temporal features. We also propose a training strategy that computes temporal features for real data and stabilizes GAN training. Furthermore, we design multiple statistical metrics and prediction tasks to evaluate the generated data. Experimental results demonstrate the quality of the synthetic data and the effectiveness of MTGAN in generating realistic sequential EHR data, especially for uncommon diseases.
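The abstract relies on two generic ideas: multi-label generation (a single visit can contain several diagnosis codes at once) and a critic trained with a Wasserstein objective. The following is a minimal Python sketch of only those two generic ideas, assuming each visit is a multi-hot vector over disease codes and each patient is a list of visits. It is not MTGAN itself: `sample_visit`, `wasserstein_losses`, and `toy_critic` are hypothetical names, the critic is a hand-written placeholder rather than a learned network, and the paper's GRU generator, smooth conditional matrix, temporal features, and training strategy are not modeled.

```python
import random
from typing import Callable, List, Sequence, Tuple

Visit = List[int]       # multi-hot vector: 1 if the disease code is present in the visit
Patient = List[Visit]   # a clinical time series: an ordered sequence of visits


def sample_visit(probs: Sequence[float], rng: random.Random) -> Visit:
    """Draw one multi-label visit: each code is included independently
    with its own probability (one Bernoulli draw per code), so a single
    visit can carry several diagnoses at once."""
    return [1 if rng.random() < p else 0 for p in probs]


def wasserstein_losses(
    critic: Callable[[Patient], float],
    real: Sequence[Patient],
    fake: Sequence[Patient],
) -> Tuple[float, float]:
    """Return (critic_loss, generator_loss) under the standard WGAN objective.

    The critic maximizes E[D(real)] - E[D(fake)], i.e. it minimizes
    E[D(fake)] - E[D(real)]; the generator minimizes -E[D(fake)].
    (A real WGAN also needs a Lipschitz constraint on the critic, e.g. a
    gradient penalty; that is omitted in this sketch.)"""
    real_score = sum(critic(p) for p in real) / len(real)
    fake_score = sum(critic(p) for p in fake) / len(fake)
    critic_loss = fake_score - real_score
    generator_loss = -fake_score
    return critic_loss, generator_loss


def toy_critic(patient: Patient) -> float:
    """Hypothetical placeholder critic: mean number of codes per visit.
    In practice the critic is a learned neural network."""
    return sum(sum(visit) for visit in patient) / len(patient)


if __name__ == "__main__":
    rng = random.Random(0)
    real = [[[1, 0, 1], [0, 1, 1]], [[1, 1, 0]]]
    # Two synthetic patients with two visits each, drawn from per-code probabilities.
    fake = [[sample_visit([0.2, 0.1, 0.3], rng) for _ in range(2)] for _ in range(2)]
    print(wasserstein_losses(toy_critic, real, fake))
```

Representing a visit as a multi-hot vector with independent per-code draws is what makes the generation "multi-label" in this sketch; how MTGAN actually conditions on uncommon diseases is described in the paper, not here.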