Paper Title
The Cascade Transformer: an Application for Efficient Answer Sentence Selection
Paper Authors
Paper Abstract
Large transformer-based language models have been shown to be very effective in many classification tasks. However, their computational complexity prevents their use in applications requiring the classification of a large set of candidates. While previous works have investigated approaches to reduce model size, relatively little attention has been paid to techniques to improve batch throughput during inference. In this paper, we introduce the Cascade Transformer, a simple yet effective technique to adapt transformer-based models into a cascade of rankers. Each ranker is used to prune a subset of candidates in a batch, thus dramatically increasing throughput at inference time. Partial encodings from the transformer model are shared among rerankers, providing further speed-up. When compared to a state-of-the-art transformer model, our approach reduces computation by 37% with almost no impact on accuracy, as measured on two English Question Answering datasets.
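The mechanism the abstract describes (a cascade of rankers, each pruning a fraction of the candidate batch while reusing partial encodings from earlier layers) can be sketched in a few lines. The toy `encoder_layer`, the weight matrices, and the `keep_frac` pruning ratio below are all illustrative assumptions, not the authors' implementation:

```python
# Minimal sketch of cascade-style ranking with shared partial encodings.
# Each stage runs one more "encoder layer" over only the surviving
# candidates, scores them, and prunes the lowest-scoring fraction, so
# later (costlier) stages see fewer candidates.
import numpy as np

rng = np.random.default_rng(0)

def encoder_layer(states, weight):
    """Stand-in for one transformer layer: linear map plus tanh."""
    return np.tanh(states @ weight)

def cascade_rank(candidates, layer_weights, score_vecs, keep_frac=0.6):
    """Return the ids of candidates surviving every cascade stage.

    Partial encodings (`states`) are carried forward between stages,
    so each ranker builds on the computation of the previous one
    instead of re-encoding from scratch.
    """
    states = candidates                      # (n_candidates, d)
    indices = np.arange(len(candidates))
    for weight, score_vec in zip(layer_weights, score_vecs):
        states = encoder_layer(states, weight)   # shared partial encoding
        scores = states @ score_vec              # this stage's ranker
        n_keep = max(1, int(len(indices) * keep_frac))
        keep = np.argsort(scores)[::-1][:n_keep] # keep top fraction
        states, indices = states[keep], indices[keep]
    return indices                           # survivors, best first

d = 8
cands = rng.normal(size=(20, d))
weights = [rng.normal(size=(d, d)) for _ in range(3)]
scorers = [rng.normal(size=d) for _ in range(3)]
survivors = cascade_rank(cands, weights, scorers, keep_frac=0.6)
print(len(survivors))  # 20 -> 12 -> 7 -> 4 candidates after three prunes
```

The throughput gain claimed in the abstract comes from exactly this shrinking-batch effect: the deepest, most expensive layers only process the candidates that earlier, cheaper rankers did not discard.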