Paper Title
ConEntail: An Entailment-based Framework for Universal Zero and Few Shot Classification with Supervised Contrastive Pretraining
Paper Authors
Paper Abstract
A universal classification model aims to generalize to diverse classification tasks in both zero- and few-shot settings. A promising way toward universal classification is to cast heterogeneous data formats into a dataset-agnostic "meta-task" (e.g., textual entailment, question answering) and then pretrain a model on the combined meta dataset. Existing work is either pretrained on specific subsets of classification tasks, or pretrained on both classification and generation data, but such models fall short of their potential in universality and reliability. These approaches also leave a massive amount of annotated data under-exploited. To fill these gaps, we propose ConEntail, a new framework for universal zero- and few-shot classification with supervised contrastive pretraining. Our unified meta-task for classification is based on nested entailment. It can be interpreted as "Does sentence a entail [sentence b entails label c]?" This formulation enables us to make better use of 57 annotated classification datasets for supervised contrastive pretraining and universal evaluation. In this way, ConEntail helps the model (1) absorb knowledge from different datasets, and (2) gain consistent performance improvements with more pretraining data. In experiments, we compare our model with discriminative and generative models pretrained on the same datasets. The results confirm that our framework effectively exploits existing annotated data and consistently outperforms baselines in both zero-shot (9.4% average improvement) and few-shot settings (3.5% average improvement).
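The nested entailment formulation described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function names, the verbalization template, and the sample sentiment data are all assumptions made for clarity. It shows how a classification example could be cast into a premise/hypothesis pair of the form "sentence a entails [sentence b entails label c]", with positives and negatives drawn for supervised contrastive pretraining.

```python
# Hypothetical sketch of ConEntail-style nested entailment formatting.
# Names and the verbalization template are illustrative assumptions.

def make_nested_entailment_example(sentence_a, sentence_b, label_c):
    """Build one (premise, hypothesis) pair for the nested entailment
    query: 'Does sentence a entail [sentence b entails label c]?'"""
    premise = sentence_a
    hypothesis = f"{sentence_b} entails {label_c}"
    return premise, hypothesis

def build_contrastive_pairs(example, support_same_label, support_other_labels):
    """Positive: a support example sharing the query's label.
    Negatives: support examples with other labels. Such pairs would
    feed a supervised contrastive pretraining objective."""
    positive = make_nested_entailment_example(
        example["text"], support_same_label["text"], support_same_label["label"]
    )
    negatives = [
        make_nested_entailment_example(example["text"], s["text"], s["label"])
        for s in support_other_labels
    ]
    return positive, negatives

# Hypothetical sentiment-classification examples for illustration
query = {"text": "The movie was a delight.", "label": "positive"}
same = {"text": "I loved every minute.", "label": "positive"}
others = [{"text": "A dull, tedious film.", "label": "negative"}]

pos, negs = build_contrastive_pairs(query, same, others)
```

Because labels are verbalized inside the hypothesis rather than fixed as output classes, the same model can score any unseen dataset's label set at evaluation time, which is what enables the universal zero- and few-shot setting.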