Paper Title

Automated Scoring for Reading Comprehension via In-context BERT Tuning

Paper Authors

Nigel Fernandez, Aritra Ghosh, Naiming Liu, Zichao Wang, Benoît Choffin, Richard Baraniuk, Andrew Lan

Paper Abstract

Automated scoring of open-ended student responses has the potential to significantly reduce human grading effort. Recent advances in automated scoring often leverage textual representations based on pre-trained language models such as BERT and GPT as input to scoring models. Most existing approaches train a separate model for each item/question, which is suitable for scenarios such as essay scoring where items can be quite different from one another. However, these approaches have two limitations: 1) they fail to leverage item linkage for scenarios such as reading comprehension, where multiple items may share a reading passage; 2) they are not scalable, since storing one model per item becomes difficult when models have a large number of parameters. In this paper, we report our (grand prize-winning) solution to the National Assessment of Educational Progress (NAEP) automated scoring challenge for reading comprehension. Our approach, in-context BERT fine-tuning, produces a single shared scoring model for all items with a carefully designed input structure that provides contextual information on each item. We demonstrate the effectiveness of our approach via local evaluations using the training dataset provided by the challenge. We also discuss the biases, common error types, and limitations of our approach.
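
To make the core idea concrete, below is a minimal sketch (not the authors' released code) of in-context BERT fine-tuning as described in the abstract: a single shared classifier scores responses to all items, with item-specific context packed into the input so the model can distinguish items. The exact input layout, the `bert-base-uncased` checkpoint, the number of score levels (`num_labels=4`), and the example item/response are all assumptions for illustration; the paper's "carefully designed input structure" may differ.

```python
# Hedged sketch: one shared BERT scoring model for all reading-comprehension items.
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=4,  # assumed number of score levels; varies by item in practice
)

def build_input(item_context: str, student_response: str):
    # Item context (e.g., question text) is paired with the response so that a
    # single model receives per-item information instead of one model per item.
    return tokenizer(
        item_context,
        student_response,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )

# Hypothetical item and student response, purely for illustration.
encoded = build_input(
    "Why does the narrator return to the old house? Explain using the passage.",
    "She goes back because she wants to find her grandmother's letters.",
)
with torch.no_grad():
    logits = model(**encoded).logits
predicted_score = logits.argmax(dim=-1).item()
print(predicted_score)
```

In an actual fine-tuning run, pairs built this way for responses across all items would be batched and trained with the usual cross-entropy objective, which is what allows the single model to be shared rather than storing one model per item.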
