Paper Title
RuMedBench: A Russian Medical Language Understanding Benchmark
Paper Authors
Paper Abstract
The paper describes an open Russian medical language understanding benchmark covering several task types (classification, question answering, natural language inference, named entity recognition) over a number of novel text sets. Given the sensitive nature of healthcare data, such a benchmark partially addresses the lack of Russian medical datasets. We prepare unified labeling formats, data splits, and evaluation metrics for the new tasks; the remaining tasks come from existing datasets with a few modifications. A single-number metric expresses a model's ability to cope with the benchmark. Moreover, we implement several baseline models, from simple ones to neural networks with transformer architecture, and release the code. As expected, the more advanced models yield better performance, but even a simple model is enough for a decent result in some tasks. Furthermore, we provide a human evaluation for all tasks. Interestingly, the models outperform humans in the large-scale classification tasks. However, natural intelligence retains its advantage in tasks that require more knowledge and reasoning.
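The abstract mentions a single-number metric summarizing performance across tasks. Below is a minimal sketch, assuming a simple macro-average of per-task scores; the exact aggregation, task names, and metric values shown are hypothetical placeholders, not figures from the paper.

```python
# Hypothetical sketch of a single-number benchmark score:
# macro-average of per-task metrics, each assumed to lie in [0, 100].

def overall_score(task_metrics: dict[str, float]) -> float:
    """Average the per-task scores into one benchmark number."""
    return sum(task_metrics.values()) / len(task_metrics)

if __name__ == "__main__":
    # Placeholder values for illustration only.
    scores = {
        "classification": 72.3,       # e.g., accuracy
        "question_answering": 55.1,   # e.g., accuracy
        "nli": 61.8,                  # e.g., accuracy
        "ner": 64.0,                  # e.g., F1
    }
    print(f"Overall benchmark score: {overall_score(scores):.1f}")
```

A macro-average treats every task equally regardless of dataset size, which is a common choice for multi-task benchmark leaderboards; the paper's released code defines the actual metric.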