Paper Title
EchoCoTr: Estimation of the Left Ventricular Ejection Fraction from Spatiotemporal Echocardiography
Paper Authors
Paper Abstract
Learning spatiotemporal features is an important task for efficient video understanding, especially in medical imaging such as echocardiography. Convolutional neural networks (CNNs) and more recent vision transformers (ViTs) are the most commonly used methods, each with its own limitations. CNNs are good at capturing local context but fail to learn global information across video frames. Vision transformers, on the other hand, can incorporate global details and long sequences but are computationally expensive and typically require more data to train. In this paper, we propose a method that addresses the limitations typically faced when training on medical video data such as echocardiographic scans. The proposed algorithm (EchoCoTr) utilizes the strengths of vision transformers and CNNs to tackle the problem of estimating the left ventricular ejection fraction (LVEF) from ultrasound videos. We demonstrate how the proposed method outperforms state-of-the-art work to date on the EchoNet-Dynamic dataset, with an MAE of 3.95 and an $R^2$ of 0.82. These results show a noticeable improvement over all published research. In addition, we present extensive ablations and comparisons with several algorithms, including ViT and BERT. The code is available at https://github.com/BioMedIA-MBZUAI/EchoCoTr.
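
To make the hybrid design described above concrete, the sketch below shows the general CNN-plus-transformer pattern for LVEF regression: a CNN extracts per-frame spatial features, a transformer encoder models the temporal sequence of frame tokens, and a linear head regresses a single ejection-fraction value. This is a minimal illustration only; the class name, layer sizes, clip length, and all hyperparameters are assumptions and do not reproduce the actual EchoCoTr architecture (see the linked repository for the authors' implementation).

```python
# Minimal CNN + transformer sketch for LVEF regression (illustrative only;
# NOT the authors' EchoCoTr implementation). All sizes are assumptions.
import torch
import torch.nn as nn


class CnnTransformerLVEFSketch(nn.Module):
    def __init__(self, embed_dim: int = 256, num_heads: int = 4,
                 num_layers: int = 2, num_frames: int = 32):
        super().__init__()
        # Small per-frame CNN backbone (stand-in for a stronger,
        # pretrained spatial feature extractor).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (B*T, 64, 1, 1)
        )
        self.proj = nn.Linear(64, embed_dim)
        # Learnable temporal position embeddings, one per frame.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_frames, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.temporal_encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        # Single-value regression head for the ejection fraction.
        self.head = nn.Linear(embed_dim, 1)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (B, T, C, H, W) clip of echocardiogram frames.
        b, t, c, h, w = video.shape
        feats = self.backbone(video.reshape(b * t, c, h, w)).flatten(1)
        tokens = self.proj(feats).reshape(b, t, -1) + self.pos_embed[:, :t]
        tokens = self.temporal_encoder(tokens)  # global temporal context
        return self.head(tokens.mean(dim=1)).squeeze(-1)  # (B,) predicted LVEF


if __name__ == "__main__":
    model = CnnTransformerLVEFSketch()
    clip = torch.randn(2, 32, 3, 112, 112)  # 2 clips of 32 frames, 112x112
    print(model(clip).shape)  # torch.Size([2])
```

The split of labor reflects the trade-off the abstract names: the CNN handles local spatial context cheaply per frame, while the transformer attends across the whole frame sequence, so the quadratic attention cost applies only to the (short) temporal token sequence rather than to every pixel patch.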