Paper Title

Multi-Modal Unsupervised Pre-Training for Surgical Operating Room Workflow Analysis

Authors

Muhammad Abdullah Jamal, Omid Mohareri

Abstract

Data-driven approaches to assist operating room (OR) workflow analysis depend on large curated datasets that are time-consuming and expensive to collect. On the other hand, we see a recent paradigm shift from supervised learning to self-supervised and/or unsupervised learning approaches that can learn representations from unlabeled datasets. In this paper, we leverage the unlabeled data captured in robotic surgery ORs and propose a novel way to fuse the multi-modal data for a single video frame or image. Instead of producing different augmentations (or 'views') of the same image or video frame, which is common practice in self-supervised learning, we treat the multi-modal data as different views and train the model in an unsupervised manner via clustering. We compare our method with other state-of-the-art methods, and the results show the superior performance of our approach on surgical video activity recognition and semantic segmentation.
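The abstract outlines the core idea: instead of augmenting a single image, each modality captured in the OR is treated as a "view" of the same scene, and the views are aligned through unsupervised clustering. The sketch below illustrates one plausible instantiation of that idea, assuming a SwAV-style swapped-prediction objective over two hypothetical modality embeddings (`z_rgb`, `z_depth`) and learnable cluster prototypes; the paper's actual modalities, architecture, and loss may differ.

```python
import torch
import torch.nn.functional as F

def sinkhorn(scores, eps=0.05, iters=3):
    # Turn raw prototype scores (B, K) into soft cluster assignments
    # with balanced cluster usage (Sinkhorn-Knopp normalization).
    q = torch.exp(scores / eps).t()  # (K, B)
    q /= q.sum()
    K, B = q.shape
    for _ in range(iters):
        q /= q.sum(dim=1, keepdim=True)  # normalize each cluster row
        q /= K
        q /= q.sum(dim=0, keepdim=True)  # normalize each sample column
        q /= B
    return (q * B).t()  # (B, K), each row sums to 1

def swapped_prediction_loss(z_rgb, z_depth, prototypes, temp=0.1):
    # z_rgb, z_depth: (B, D) embeddings of the same frames from two
    # modalities (hypothetical names). prototypes: (K, D) learnable
    # cluster centers shared across modalities.
    z_rgb = F.normalize(z_rgb, dim=1)
    z_depth = F.normalize(z_depth, dim=1)
    protos = F.normalize(prototypes, dim=1)
    s_rgb = z_rgb @ protos.t()      # (B, K) similarity to each prototype
    s_depth = z_depth @ protos.t()
    with torch.no_grad():           # codes act as targets, no gradient
        q_rgb, q_depth = sinkhorn(s_rgb), sinkhorn(s_depth)
    # Each modality must predict the code computed from the other one.
    loss_rgb = -(q_depth * F.log_softmax(s_rgb / temp, dim=1)).sum(1).mean()
    loss_depth = -(q_rgb * F.log_softmax(s_depth / temp, dim=1)).sum(1).mean()
    return loss_rgb + loss_depth

# Toy usage: 8 frames, 128-d embeddings, 16 prototypes.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
protos = torch.nn.Parameter(torch.randn(16, 128))
print(swapped_prediction_loss(z1, z2, protos))
```

The soft cluster assignment ("code") computed from one modality supervises the prediction made from the other, so the model learns a shared representation without any labels.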
