One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers

Paper Title

One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers

Authors

Rui Gong, Qin Wang, Dengxin Dai, Luc Van Gool

Abstract

Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data. It can save the cost of manually labeling data in real-world applications such as robot vision and autonomous driving. Traditional UDA often assumes that abundant unlabeled real-world data samples are available during training for the adaptation. However, this assumption does not always hold in practice, owing to the difficulty of collecting such data and its scarcity. Thus, we aim to relieve the need for a large amount of real data, and explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization (OSDG) problems, where only one real-world data sample is available. To remedy the limited real-data knowledge, we first construct a pseudo-target domain by stylizing the simulated data with the one-shot real data. To mitigate the sim-to-real domain gap at both the style and the spatial-structure level and to facilitate the sim-to-real adaptation, we further propose class-aware cross-domain transformers with an intermediate domain randomization strategy to extract domain-invariant knowledge from both the simulated and the pseudo-target data. We demonstrate the effectiveness of our approach for OSUDA and OSDG on different benchmarks, where it outperforms the state-of-the-art methods by a large margin: 10.87, 9.59, 13.05 and 15.91 mIoU on GTA, SYNTHIA → Cityscapes, Foggy Cityscapes, respectively.
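The abstract says only that the pseudo-target domain is built by stylizing the simulated data with the single real image; it does not say which stylization is used. As a hedged illustration of that step, the sketch below swaps in the simplest stylization we know of: per-channel mean and standard-deviation matching in pixel space (the AdaIN formula applied to raw RGB values instead of network features). This is not the authors' method. The function names `stylize_with_one_shot` and `build_pseudo_target`, the image sizes, and the random stand-in images are all hypothetical.

```python
import numpy as np


def stylize_with_one_shot(sim_img: np.ndarray, real_img: np.ndarray,
                          eps: float = 1e-6) -> np.ndarray:
    """Hypothetical stand-in stylization (not the paper's method).

    Re-colours a simulated image so that each channel's mean and standard
    deviation match those of the single real image. Inputs are float arrays
    of shape (H, W, C) with values in [0, 1]; the two images may differ in
    size. Pixel positions are unchanged, so the simulated labels still apply.
    """
    sim_mu = sim_img.mean(axis=(0, 1), keepdims=True)
    sim_sigma = sim_img.std(axis=(0, 1), keepdims=True)
    real_mu = real_img.mean(axis=(0, 1), keepdims=True)
    real_sigma = real_img.std(axis=(0, 1), keepdims=True)
    # Whiten the simulated statistics, then re-colour with the real ones.
    normalized = (sim_img - sim_mu) / (sim_sigma + eps)
    stylized = normalized * real_sigma + real_mu
    # Keep values in the valid image range.
    return np.clip(stylized, 0.0, 1.0)


def build_pseudo_target(sim_images, real_shot):
    """Stylize every simulated image with the same one-shot real image."""
    return [stylize_with_one_shot(img, real_shot) for img in sim_images]


# Usage with random stand-ins for simulated frames and the one real frame.
rng = np.random.default_rng(0)
sim_batch = [rng.random((64, 128, 3)) for _ in range(4)]
real_shot = rng.random((96, 192, 3)) * 0.5 + 0.25  # hypothetical real image
pseudo = build_pseudo_target(sim_batch, real_shot)
```

Because this stand-in changes only colour statistics, each pseudo-target image can reuse the segmentation labels of its simulated source. It does nothing about the spatial-structure gap, which the abstract assigns to the class-aware cross-domain transformers and the intermediate domain randomization strategy.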
