Paper Title

TabLLM: Few-shot Classification of Tabular Data with Large Language Models

Paper Authors

Hegselmann, Stefan, Buendia, Alejandro, Lang, Hunter, Agrawal, Monica, Jiang, Xiaoyi, Sontag, David

Paper Abstract

We study the application of large language models to zero-shot and few-shot classification of tabular data. We prompt the large language model with a serialization of the tabular data to a natural-language string, together with a short description of the classification problem. In the few-shot setting, we fine-tune the large language model using some labeled examples. We evaluate several serialization methods including templates, table-to-text models, and large language models. Despite its simplicity, we find that this technique outperforms prior deep-learning-based tabular classification methods on several benchmark datasets. In most cases, even zero-shot classification obtains non-trivial performance, illustrating the method's ability to exploit prior knowledge encoded in large language models. Unlike many deep learning methods for tabular datasets, this approach is also competitive with strong traditional baselines like gradient-boosted trees, especially in the very-few-shot setting.
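
As a concrete illustration of the template-based serialization described in the abstract, here is a minimal Python sketch. The column names, prompt wording, and helper names (`serialize_row`, `build_prompt`) are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch: serialize one tabular record into a natural-language
# string and attach a short task description, as the abstract describes.
# All names, columns, and wording here are illustrative assumptions.

def serialize_row(row: dict) -> str:
    """Render a record with a simple 'The <column> is <value>.' template."""
    return " ".join(f"The {col} is {val}." for col, val in row.items())

def build_prompt(row: dict, task_description: str) -> str:
    """Combine the serialized row with a short classification question."""
    return f"{serialize_row(row)}\n{task_description}"

# Hypothetical income-classification record:
row = {"age": 42, "education": "Bachelors", "occupation": "Engineer"}
task = "Does this person earn more than $50,000 per year? Answer yes or no."
print(build_prompt(row, task))
# Output:
# The age is 42. The education is Bachelors. The occupation is Engineer.
# Does this person earn more than $50,000 per year? Answer yes or no.
```

The resulting string is what would be fed to the language model for zero-shot classification, or used as a labeled example when fine-tuning in the few-shot setting.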
