Paper Title

S2AMP: A High-Coverage Dataset of Scholarly Mentorship Inferred from Publications

Authors

Shaurya Rohatgi, Doug Downey, Daniel King, Sergey Feldman

Abstract

Mentorship is a critical component of academia, but is not as visible as publications, citations, grants, and awards. Despite the importance of studying the quality and impact of mentorship, there are few large representative mentorship datasets available. We contribute two datasets to the study of mentorship. The first has over 300,000 ground truth academic mentor-mentee pairs obtained from multiple diverse, manually-curated sources, and linked to the Semantic Scholar (S2) knowledge graph. We use this dataset to train an accurate classifier for predicting mentorship relations from bibliographic features, achieving a held-out area under the ROC curve of 0.96. Our second dataset is formed by applying the classifier to the complete co-authorship graph of S2. The result is an inferred graph with 137 million weighted mentorship edges among 24 million nodes. We release this first-of-its-kind dataset to the community to help accelerate the study of scholarly mentorship: https://github.com/allenai/S2AMP-data
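
For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how an area under the ROC curve can be computed on a held-out set of labeled author pairs. It is not the authors' evaluation code: the function name `roc_auc`, the toy labels, and the toy scores are all assumptions made up for illustration, and the resulting value has no relation to the reported 0.96.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example,
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out data (invented for this sketch):
# label 1 = known mentor-mentee pair, label 0 = ordinary co-author pair.
held_out_labels = [1, 1, 1, 0, 0, 0]
held_out_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(held_out_labels, held_out_scores)
```

The quadratic pairwise loop is chosen for readability; on a dataset with hundreds of thousands of pairs, a rank-based formulation or a library routine would be the more practical choice.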
