Paper Title

Understanding tables with intermediate pre-training

Paper Authors

Julian Martin Eisenschlos, Syrine Krichene, Thomas Müller

Paper Abstract

Table entailment, the binary classification task of finding if a sentence is supported or refuted by the content of a table, requires parsing language and table structure as well as numerical and discrete reasoning. While there is extensive work on textual entailment, table entailment is less well studied. We adapt TAPAS (Herzig et al., 2020), a table-based BERT model, to recognize entailment. Motivated by the benefits of data augmentation, we create a balanced dataset of millions of automatically created training examples which are learned in an intermediate step prior to fine-tuning. This new data is not only useful for table entailment, but also for SQA (Iyyer et al., 2017), a sequential table QA task. To be able to use long examples as input of BERT models, we evaluate table pruning techniques as a pre-processing step to drastically improve the training and prediction efficiency at a moderate drop in accuracy. The different methods set the new state-of-the-art on the TabFact (Chen et al., 2020) and SQA datasets.
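To make the setup concrete, below is a minimal illustrative sketch in Python of the two ideas in the abstract: pruning a table before encoding it and classifying a statement as entailed or refuted. It assumes the HuggingFace transformers port of TAPAS; the checkpoint name, the label convention, and the toy token-overlap pruning heuristic are assumptions made for illustration and are not taken from the paper.

import pandas as pd
import torch
from transformers import TapasTokenizer, TapasForSequenceClassification

# Assumed checkpoint identifier for a TabFact-fine-tuned TAPAS model.
MODEL_NAME = "google/tapas-base-finetuned-tabfact"

tokenizer = TapasTokenizer.from_pretrained(MODEL_NAME)
model = TapasForSequenceClassification.from_pretrained(MODEL_NAME)

def prune_columns(table: pd.DataFrame, statement: str) -> pd.DataFrame:
    # Toy pruning heuristic: keep columns whose header or cells share a token
    # with the statement. This only illustrates the idea of table pruning as a
    # pre-processing step; it is not the strategy evaluated in the paper.
    statement_tokens = set(statement.lower().split())
    keep = []
    for col in table.columns:
        col_tokens = set(str(col).lower().split())
        for cell in table[col]:
            col_tokens.update(str(cell).lower().split())
        if col_tokens & statement_tokens:
            keep.append(col)
    return table[keep] if keep else table

# Toy table; the TAPAS tokenizer expects every cell to be a string.
table = pd.DataFrame({"player": ["alice", "bob"],
                      "points": ["12", "7"],
                      "team": ["red", "blue"]})
statement = "alice scored 12 points"

pruned = prune_columns(table, statement)
inputs = tokenizer(table=pruned, queries=[statement],
                   padding="max_length", truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
# Assumed label convention: index 1 = entailed, index 0 = refuted.
print("entailed" if logits.argmax(-1).item() == 1 else "refuted")

Pruning matters because BERT-style encoders have a fixed input budget (typically 512 tokens), so dropping columns or rows that are unlikely to be relevant lets longer tables fit while trading off a moderate amount of accuracy, as the abstract notes.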
