Paper Title

$BT^2$: Backward-compatible Training with Basis Transformation

Paper Authors

Yifei Zhou, Zilu Li, Abhinav Shrivastava, Hengshuang Zhao, Antonio Torralba, Taipeng Tian, Ser-Nam Lim

Paper Abstract

Modern retrieval systems often require recomputing the representation of every piece of data in the gallery when updating to a better representation model. This process is known as backfilling and can be especially costly in the real world, where the gallery often contains billions of samples. Recently, researchers have proposed the idea of Backward Compatible Training (BCT), where the new representation model can be trained with an auxiliary loss to make it backward compatible with the old representation. In this way, the new representation can be directly compared with the old representation, in principle avoiding the need for any backfilling. However, follow-up work shows that there is an inherent tradeoff where a backward-compatible representation model cannot simultaneously maintain the performance of the new model itself. This paper reports our ``not-so-surprising'' finding that adding extra dimensions to the representation can help here. However, we also found that naively increasing the dimension of the representation did not work. To deal with this, we propose Backward-compatible Training with a novel Basis Transformation ($BT^2$). A basis transformation (BT) is basically a learnable set of parameters that applies an orthonormal transformation. Such a transformation possesses an important property whereby the original information contained in its input is retained in its output. We show in this paper how a BT can be utilized to add only the necessary amount of additional dimensions. We empirically verify the advantage of $BT^2$ over other state-of-the-art methods in a wide range of settings. We then further extend $BT^2$ to other challenging yet more practical settings, including a significant change in model architecture (CNN to Transformers), modality change, and even a series of updates in the model architecture mimicking the evolution of deep learning models.
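For a concrete picture of the core mechanism, below is a minimal, illustrative sketch (not the authors' released code) of a learnable orthonormal basis transformation in PyTorch. The names `BasisTransform`, `old_dim`, and `extra_dim` are assumptions made for illustration only; the sketch shows how extra dimensions plus an orthonormality-constrained linear map can keep new embeddings directly comparable with zero-padded old ones.

```python
# Minimal sketch of the idea behind BT^2 (not the authors' code): extend an
# old d-dimensional embedding space with extra dimensions and apply a learnable
# orthonormal (basis) transformation, which preserves the information and the
# inner-product geometry of its input. `BasisTransform`, `old_dim`, and
# `extra_dim` are illustrative names, not names from the paper.
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal


class BasisTransform(nn.Module):
    def __init__(self, old_dim: int, extra_dim: int):
        super().__init__()
        full_dim = old_dim + extra_dim
        # Square linear map whose weight is constrained to stay orthonormal
        # throughout training by PyTorch's orthogonal parametrization.
        self.transform = orthogonal(nn.Linear(full_dim, full_dim, bias=False))

    def forward(self, new_feat: torch.Tensor) -> torch.Tensor:
        # Rotate the new model's (old_dim + extra_dim)-dim feature into a basis
        # whose first old_dim coordinates can be aligned with the old embedding
        # via a backward-compatibility loss during training.
        return self.transform(new_feat)


# Usage sketch: old gallery embeddings are zero-padded to the new size, so new
# queries can be scored against them directly, i.e. without backfilling.
bt = BasisTransform(old_dim=128, extra_dim=64)
old_gallery = torch.randn(4, 128)                           # from the old model
padded_old = torch.cat([old_gallery, old_gallery.new_zeros(4, 64)], dim=1)
new_query = bt(torch.randn(2, 192))                         # from the new model + BT
scores = new_query @ padded_old.t()                         # (2, 4) similarity matrix
```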
