Paper Title

Improving Imbalanced Text Classification with Dynamic Curriculum Learning

Paper Authors

Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Paper Abstract

Recent advances in pre-trained language models have improved performance on text classification tasks. However, little attention has been paid to the priority scheduling of samples during training. Humans acquire knowledge gradually, from easy to complex concepts, and the difficulty of the same material can also vary significantly across learning stages. Inspired by these insights, we propose a novel self-paced dynamic curriculum learning (SPDCL) method for imbalanced text classification, which evaluates sample difficulty by both linguistic characteristics and model capacity. Moreover, rather than using static curriculum learning as in existing research, SPDCL reorders and resamples the training data by this difficulty criterion with an adaptive easy-to-hard pace. Extensive experiments on several classification tasks show the effectiveness of the SPDCL strategy, especially on imbalanced datasets.
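Since the abstract only describes SPDCL at a high level, the following is a minimal Python sketch of the general idea: score each sample's difficulty from a linguistic proxy plus the model's current loss, then grow the training subset from easy to hard as training proceeds. The function names, the rank-normalization, the length-based proxy, and the linear pacing schedule are all illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a self-paced, easy-to-hard curriculum scheduler.
# All names and formulas below are illustrative assumptions,
# not the exact SPDCL method from the paper.
import numpy as np

def difficulty_scores(token_counts, model_losses, alpha=0.5):
    """Blend a linguistic proxy (sample length) with model capacity (current loss)."""
    def rank_norm(x):
        # Rank-normalize to [0, 1] so the two signals are comparable.
        order = np.argsort(np.argsort(x))
        return order / max(len(x) - 1, 1)
    return alpha * rank_norm(np.asarray(token_counts, dtype=float)) + \
           (1 - alpha) * rank_norm(np.asarray(model_losses, dtype=float))

def curriculum_subset(scores, step, total_steps, start_frac=0.3):
    """Indices of the easiest fraction of the data, growing linearly
    from `start_frac` to 1.0 over training (adaptive pacing)."""
    frac = min(1.0, start_frac + (1.0 - start_frac) * step / total_steps)
    k = max(1, int(frac * len(scores)))
    return np.argsort(scores)[:k]  # easiest-first reordering

# Usage: recompute per-sample losses each epoch so difficulty
# adapts to the model's current capacity, then resample the subset.
lengths = [12, 45, 8, 30, 60, 22]
losses = [0.4, 1.2, 0.2, 0.9, 1.5, 0.6]  # e.g., per-sample cross-entropy
scores = difficulty_scores(lengths, losses)
for step in range(3):
    idx = curriculum_subset(scores, step, total_steps=3)
    print(f"step {step}: train on samples {sorted(idx.tolist())}")
```

In this sketch, resampling for class imbalance could be layered on top by drawing from the selected subset with class-balanced weights; the paper's actual criterion and pacing function may differ.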
