Paper Title
MTEB: Massive Text Embedding Benchmark
Paper Authors
Paper Abstract
Text embeddings are commonly evaluated on a small set of datasets from a single task not covering their possible applications to other tasks. It is unclear whether state-of-the-art embeddings on semantic textual similarity (STS) can be equally well applied to other tasks like clustering or reranking. This makes progress in the field difficult to track, as various models are constantly being proposed without proper evaluation. To solve this problem, we introduce the Massive Text Embedding Benchmark (MTEB). MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages. Through the benchmarking of 33 models on MTEB, we establish the most comprehensive benchmark of text embeddings to date. We find that no particular text embedding method dominates across all tasks. This suggests that the field has yet to converge on a universal text embedding method and scale it up sufficiently to provide state-of-the-art results on all embedding tasks. MTEB comes with open-source code and a public leaderboard at https://github.com/embeddings-benchmark/mteb.
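The open-source code linked above can be driven with a few lines of Python. Below is a minimal sketch following the usage pattern documented in the MTEB repository; the model name and task selection are illustrative, and the exact API may differ across library versions.

```python
# Minimal sketch of running an MTEB evaluation, following the usage
# pattern documented in the MTEB repository. The chosen model and task
# are illustrative; the API may vary across mteb versions.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Any model exposing an `encode(sentences) -> embeddings` method works;
# here a small Sentence Transformers model is loaded as an example.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Select one of the benchmark's tasks; omitting `tasks` runs a larger suite.
evaluation = MTEB(tasks=["Banking77Classification"])

# Scores are written as JSON files under the given output folder.
evaluation.run(model, output_folder="results/all-MiniLM-L6-v2")
```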