Paper title
EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain
Authors
Abstract
We introduce a high-quality dataset of 3,397 samples from the educational domain, each comprising (i) a multiple-choice question, (ii) its answers (including distractors), and (iii) its source document. Each question is phrased in two forms, normal and cloze. Correct answers are linked to source documents with sentence-level annotations. Thus, our versatile dataset can be used for both question and distractor generation, as well as to explore new challenges such as question format conversion. Furthermore, 903 questions are accompanied by their cognitive complexity level as per Bloom's taxonomy. All questions were written by educational experts rather than crowd workers to ensure that they maintain educational and learning standards. Our analysis and experiments suggest distinguishable differences between our dataset and those commonly used for question generation in educational settings. We believe this new dataset can serve as a valuable resource for research and evaluation in the educational domain. The dataset and baselines will be released to support further research in question generation.