Paper Title

Automatic Pharma News Categorization

Authors

Adaszewski, Stanislaw; Kuner, Pascal; Jaeger, Ralf J.

Abstract

We use a text dataset of 23 news categories relevant to pharma information science to compare the fine-tuning performance of multiple transformer models on a classification task. The dataset is well balanced, and the comparison covers several autoregressive and autoencoding transformer models. To validate the winning approach, we diagnose model behavior on mispredicted instances, including inspection of category-wise metrics, evaluation of prediction certainty, and assessment of latent space representations. Lastly, we propose an ensemble model consisting of the top-performing individual predictors and demonstrate that this approach offers a modest improvement in the F1 metric.
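
The abstract does not say how the ensemble combines its members or how F1 is averaged across categories. As a minimal sketch only, the following assumes soft voting (averaging each model's class probabilities) and a macro-averaged F1. The helper names `soft_vote` and `macro_f1`, the three-class setup, and all probability values are hypothetical toy inputs for illustration, not data or results from the paper.

```python
def soft_vote(prob_lists):
    """Average class probabilities across models; return the argmax class per sample.

    prob_lists[m][i][c] is model m's probability for class c on sample i.
    Soft voting is an assumption; the abstract does not name the combination rule.
    """
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    preds = []
    for i in range(n_samples):
        n_classes = len(prob_lists[0][i])
        avg = [sum(m[i][c] for m in prob_lists) / n_models for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


# Toy inputs: 4 samples, 3 classes, 3 hypothetical fine-tuned models.
y_true = [0, 1, 2, 1]
model_a = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.5, 0.1, 0.4], [0.1, 0.8, 0.1]]
model_b = [[0.6, 0.3, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.2, 0.6, 0.2]]
model_c = [[0.2, 0.5, 0.3], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.3, 0.4, 0.3]]

ensemble_pred = soft_vote([model_a, model_b, model_c])
single_pred = soft_vote([model_a])  # one model alone, for comparison

print("ensemble macro F1:", macro_f1(y_true, ensemble_pred, 3))
print("model_a macro F1: ", macro_f1(y_true, single_pred, 3))
```

In this toy setup each model errs on a different sample, so averaging the probabilities recovers the correct label where a single model does not. Whether this carries over to the 23-category dataset is exactly what the paper's F1 comparison evaluates.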
