Paper title
Interpreting Deep Learning Models for Epileptic Seizure Detection on EEG signals
Paper authors
Paper abstract
While Deep Learning (DL) is often considered the state of the art for Artificial Intelligence-based medical decision support, it remains sparsely implemented in clinical practice and poorly trusted by clinicians due to the insufficient interpretability of neural network models. We have tackled this issue by developing interpretable DL models in the context of online detection of epileptic seizures based on EEG signals. This goal has conditioned the preparation of the input signals, the network architecture, and the post-processing of the output, in line with domain knowledge. Specifically, we focused the discussion on three main aspects: 1) how to aggregate the classification results on signal segments provided by the DL model into a larger time scale, at the seizure level; 2) what the relevant frequency patterns learned in the first convolutional layer of different models are, and how they relate to the delta, theta, alpha, beta and gamma frequency bands on which the visual interpretation of EEG is based; and 3) the identification of the signal waveforms with a larger contribution towards the ictal class, according to the activation differences highlighted using the DeepLIFT method. Results show that the kernel size in the first layer determines the interpretability of the extracted features and the sensitivity of the trained models, even though the final performance is very similar after post-processing. We also found that amplitude is the main feature leading to an ictal prediction, suggesting that a larger patient population would be required to learn more complex frequency patterns. Still, our methodology successfully generalized across inter-patient variability for the majority of the studied population, with a classification F1-score of 0.873 and 90% of the seizures detected.
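The first aspect, aggregating segment-level classifications into seizure-level detections, can be illustrated with a minimal sketch. This is not the authors' post-processing. It assumes non-overlapping segments of fixed length with binary labels (1 = ictal), and the function names, the majority-vote window, and the minimum event duration are all hypothetical choices made only for illustration.

```python
from typing import List, Tuple


def smooth_predictions(preds: List[int], window: int = 5) -> List[int]:
    """Majority vote over a centered sliding window (assumed odd length).

    Hypothetical smoothing step: isolated ictal or non-ictal segments are
    overruled by their neighbours. Windows are truncated at the edges.
    """
    half = window // 2
    smoothed = []
    for i in range(len(preds)):
        lo, hi = max(0, i - half), min(len(preds), i + half + 1)
        votes = preds[lo:hi]
        # Strict majority of ictal votes is required to keep a 1.
        smoothed.append(1 if 2 * sum(votes) > len(votes) else 0)
    return smoothed


def segments_to_events(
    preds: List[int],
    segment_s: float = 1.0,       # assumed segment length in seconds
    min_duration_s: float = 3.0,  # assumed minimum seizure duration
) -> List[Tuple[float, float]]:
    """Merge consecutive ictal segments into (onset, offset) events in seconds."""
    events = []
    start = None
    # A trailing 0 acts as a sentinel that closes a run ending at the last segment.
    for i, p in enumerate(list(preds) + [0]):
        if p == 1 and start is None:
            start = i
        elif p == 0 and start is not None:
            onset, offset = start * segment_s, i * segment_s
            if offset - onset >= min_duration_s:
                events.append((onset, offset))
            start = None
    return events


# Example: noisy per-segment outputs of a hypothetical classifier.
raw = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0]
smoothed = smooth_predictions(raw, window=3)
events = segments_to_events(smoothed, segment_s=1.0, min_duration_s=3.0)
```

With these assumed settings, the isolated positive at index 2 is discarded and the gap at index 7 is filled, so the remaining segments merge into a single event. A seizure would then count as detected if any predicted event overlaps its annotated interval, which is one common way to report a seizure-level detection rate alongside a segment-level F1-score.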