Paper Title

Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements

Paper Authors

von Werra, Leandro, Tunstall, Lewis, Thakur, Abhishek, Luccioni, Alexandra Sasha, Thrush, Tristan, Piktus, Aleksandra, Marty, Felix, Rajani, Nazneen, Mustar, Victor, Ngo, Helen, Sanseviero, Omar, Šaško, Mario, Villanova, Albert, Lhoest, Quentin, Chaumond, Julien, Mitchell, Margaret, Rush, Alexander M., Wolf, Thomas, Kiela, Douwe

Paper Abstract

Evaluation is a key part of machine learning (ML), yet there is a lack of support and tooling to enable its informed and systematic practice. We introduce Evaluate and Evaluation on the Hub, a set of tools to facilitate the evaluation of models and datasets in ML. Evaluate is a library to support best practices for measurements, metrics, and comparisons of data and models. Its goal is to support reproducibility of evaluation, centralize and document the evaluation process, and broaden evaluation to cover more facets of model performance. It includes over 50 efficient canonical implementations for a variety of domains and scenarios, interactive documentation, and the ability to easily share implementations and outcomes. The library is available at https://github.com/huggingface/evaluate. In addition, we introduce Evaluation on the Hub, a platform that enables the large-scale evaluation of over 75,000 models and 11,000 datasets on the Hugging Face Hub, for free, at the click of a button. Evaluation on the Hub is available at https://huggingface.co/autoevaluate.
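To make the abstract's description of the library concrete, below is a minimal sketch of loading one of its canonical metric implementations and computing a score. The choice of metric ("accuracy") and the toy predictions/references are illustrative assumptions, not examples taken from the paper.

```python
# Minimal sketch of the Evaluate library's interface.
# Assumes `pip install evaluate`; metric choice and toy data are illustrative.
import evaluate

# Load a canonical metric implementation by name.
accuracy = evaluate.load("accuracy")

# Compute the metric over model predictions vs. ground-truth references.
results = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(results)  # e.g. {'accuracy': 0.75}
```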
