Automatic dense annotation of large-vocabulary sign language videos

Title

Automatic dense annotation of large-vocabulary sign language videos

Authors

Liliane Momeni, Hannah Bull, K R Prajwal, Samuel Albanie, Gül Varol, Andrew Zisserman

Abstract

Recently, sign language researchers have turned to sign language interpreted TV broadcasts, comprising (i) a video of continuous signing and (ii) subtitles corresponding to the audio content, as a readily available and large-scale source of training data. One key challenge in the usability of such data is the lack of sign annotations. Previous work exploiting such weakly-aligned data only found sparse correspondences between keywords in the subtitle and individual signs. In this work, we propose a simple, scalable framework to vastly increase the density of automatic annotations. Our contributions are the following: (1) we significantly improve previous annotation methods by making use of synonyms and subtitle-signing alignment; (2) we show the value of pseudo-labelling from a sign recognition model as a way of sign spotting; (3) we propose a novel approach for increasing our annotations of known and unknown classes based on in-domain exemplars; (4) on the BOBSL BSL sign language corpus, we increase the number of confident automatic annotations from 670K to 5M. We make these annotations publicly available to support the sign language research community.

