A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention

Paper title

A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention

Authors

Grégoire Mialon, Dexiong Chen, Alexandre d'Aspremont, Julien Mairal

Abstract

We address the problem of learning on sets of features, motivated by the need of performing pooling operations in long biological sequences of varying sizes, with long-range dependencies, and possibly few labeled data. To address this challenging task, we introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost. Our aggregation technique admits two useful interpretations: it may be seen as a mechanism related to attention layers in neural networks, or it may be seen as a scalable surrogate of a classical optimal transport-based kernel. We experimentally demonstrate the effectiveness of our approach on biological sequences, achieving state-of-the-art results for protein fold recognition and detection of chromatin profiles tasks, and, as a proof of concept, we show promising results for processing natural language sequences. We provide an open-source implementation of our embedding that can be used alone or as a module in larger learning models at https://github.com/claying/OTK.
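
To make the aggregation step in the abstract concrete, the following is a minimal NumPy sketch of pooling a variable-size set against a reference through an entropic optimal transport plan. It is an illustration under stated assumptions, not the authors' implementation from the OTK repository: the function names `sinkhorn_plan` and `ot_pool`, the Gaussian similarity, the uniform marginals, the parameter `eps`, and the scaling by `p` are all choices made here for the example, and the reference `z` is held fixed rather than trained.

```python
import numpy as np


def sinkhorn_plan(K, n_iters=100):
    """Approximate entropic OT plan for a positive similarity matrix K (n x p).

    Assumes uniform mass on both the input set and the reference.
    """
    n, p = K.shape
    a = np.full(n, 1.0 / n)  # mass on each input element
    b = np.full(p, 1.0 / p)  # mass on each reference element
    u = np.ones(n)
    v = np.ones(p)
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


def ot_pool(x, z, eps=1.0, n_iters=100):
    """Pool a set x (n x d) into a fixed-size output (p x d) using reference z (p x d)."""
    # Squared Euclidean distance between every input and every reference element.
    sq_dist = ((x[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq_dist / eps)  # illustrative Gaussian similarity
    P = sinkhorn_plan(K, n_iters)
    p = z.shape[0]
    # Each reference slot receives a weighted average of the input elements;
    # multiplying by p makes the weights of each slot sum to one.
    return p * (P.T @ x)


rng = np.random.default_rng(0)
z = rng.normal(size=(3, 4))        # hypothetical reference with p = 3 elements
short_set = rng.normal(size=(5, 4))
long_set = rng.normal(size=(11, 4))
print(ot_pool(short_set, z).shape, ot_pool(long_set, z).shape)
```

Both calls return an array with one row per reference element, regardless of how many elements the input set has, which is the fixed-size property the abstract describes. In this sketch the rescaled transport plan plays a role similar to attention weights, with the difference that the marginal constraints spread the input mass evenly across the reference elements.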
