Paper Title

Improved Multi-label Classification under Temporal Concept Drift: Rethinking Group-Robust Algorithms in a Label-Wise Setting

Authors

Ilias Chalkidis, Anders Søgaard

Abstract

在例如法律和生物医学文本的文档分类中,我们经常处理数百个班级,包括非常频繁的课程,以及由现实世界事件的影响(例如政策变化,冲突或流行主义)引起的时间概念漂移。有时可以通过重新采样训练数据来模拟(或补偿)已知的目标分布来减轻类不平衡和漂移,但是如果目标分布由未知的未来事件确定,该怎么办?我们专注于用于训练此类文档分类器的基础优化算法并评估几种群体射击优化算法,而不是简单地重新采样以对冲我们的赌注,而是最初提出的用于减轻群体级别差异的基础优化算法。将群体合格算法重新标记为概念漂移中的适应算法,我们发现不变的风险最小化和光谱将频谱脱钩的方法优于基于采样的阶级失衡和概念漂移的方法,并在少数群体上带来更好的表现。标签集越大的效果更为明显。

In document classification for, e.g., legal and biomedical text, we often deal with hundreds of classes, including very infrequent ones, as well as temporal concept drift caused by the influence of real world events, e.g., policy changes, conflicts, or pandemics. Class imbalance and drift can sometimes be mitigated by resampling the training data to simulate (or compensate for) a known target distribution, but what if the target distribution is determined by unknown future events? Instead of simply resampling uniformly to hedge our bets, we focus on the underlying optimization algorithms used to train such document classifiers and evaluate several group-robust optimization algorithms, initially proposed to mitigate group-level disparities. Reframing group-robust algorithms as adaptation algorithms under concept drift, we find that Invariant Risk Minimization and Spectral Decoupling outperform sampling-based approaches to class imbalance and concept drift, and lead to much better performance on minority classes. The effect is more pronounced the larger the label set.
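The abstract names Invariant Risk Minimization and Spectral Decoupling but does not define them. The following is a minimal plain-Python sketch of both objectives for multi-label classification. It assumes that each label's binary cross-entropy is treated as its own group or environment, which is a simplification that may not match the paper's exact label-wise grouping. The IRM term follows the IRMv1 penalty of Arjovsky et al. (squared gradient of each group's risk with respect to a dummy scale on the logits, evaluated at 1). The Spectral Decoupling term follows Pezeshki et al. (an L2 penalty on the logits). All function names and the `lam` weight are hypothetical, and the model itself is abstracted away as a matrix of logits.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = documents, columns = labels


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_with_logits(z: float, y: float) -> float:
    # Stable binary cross-entropy on a raw logit z and a 0/1 target y.
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def label_wise_risks(logits: Matrix, targets: Matrix) -> List[float]:
    # Mean BCE per label column; each label is treated as one group (assumption).
    n, k = len(logits), len(logits[0])
    return [
        sum(bce_with_logits(logits[i][j], targets[i][j]) for i in range(n)) / n
        for j in range(k)
    ]


def irm_penalty(logits: Matrix, targets: Matrix) -> float:
    # IRMv1: squared gradient of each group's risk w.r.t. a dummy scale w on the
    # logits, evaluated at w = 1. For BCE, d/dw BCE(w*z, y) = (sigmoid(w*z) - y) * z,
    # so at w = 1 the gradient is the mean of (sigmoid(z) - y) * z over documents.
    n, k = len(logits), len(logits[0])
    total = 0.0
    for j in range(k):
        grad = sum((sigmoid(logits[i][j]) - targets[i][j]) * logits[i][j] for i in range(n)) / n
        total += grad ** 2
    return total / k  # averaged over label groups


def irm_objective(logits: Matrix, targets: Matrix, lam: float) -> float:
    # Average label-wise risk plus the weighted IRMv1 penalty.
    risks = label_wise_risks(logits, targets)
    return sum(risks) / len(risks) + lam * irm_penalty(logits, targets)


def spectral_decoupling_objective(logits: Matrix, targets: Matrix, lam: float) -> float:
    # Spectral Decoupling: add (lam / 2) * ||logits||^2, averaged over documents.
    risks = label_wise_risks(logits, targets)
    sq_norm = sum(z * z for row in logits for z in row) / len(logits)
    return sum(risks) / len(risks) + 0.5 * lam * sq_norm


if __name__ == "__main__":
    logits = [[2.0, -1.0], [0.5, -3.0]]
    targets = [[1.0, 0.0], [1.0, 0.0]]
    print("IRM:", irm_objective(logits, targets, lam=1.0))
    print("SD: ", spectral_decoupling_objective(logits, targets, lam=0.1))
```

Both objectives only add one term to the ordinary multi-label BCE loss, so in a real trainer they would wrap the existing loss rather than change how training data is sampled, which is the contrast the abstract draws with resampling-based approaches.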