Paper title
Data-efficient Online Classification with Siamese Networks and Active Learning
Authors
Abstract
An ever-increasing volume of data is becoming available as streams in many application areas, such as critical infrastructure systems, finance and banking, security and crime, and web analytics. To meet this demand, predictive models need to be built online, with learning occurring on the fly. Online learning poses important challenges that affect the deployment of online classification systems in real-life problems. In this paper, we investigate learning from nonstationary and imbalanced data with limited labels in online classification. We propose a learning method that synergistically combines siamese neural networks and active learning. The proposed method uses a multi-sliding window approach to store data, and maintains separate and balanced queues for each class. Our study shows that the proposed method is robust to data nonstationarity and imbalance, and significantly outperforms baselines and state-of-the-art algorithms in terms of both learning speed and performance. Importantly, it is effective even when only 1% of the labels of the arriving instances are available.
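The storage scheme and labelling budget described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ClassQueues` and `should_query`, the `capacity` and `threshold` parameters, and the uncertainty-based query rule are all assumptions made for illustration, and the siamese network itself is omitted. The sketch only shows how one fixed-length queue per class keeps the stored data balanced under class imbalance, and how a 1% budget could gate label requests.

```python
from collections import deque


class ClassQueues:
    """One fixed-length FIFO queue per class (a multi-sliding-window memory).

    Hypothetical helper: `classes` is the known label set and `capacity`
    is the per-class window length; neither name comes from the paper.
    """

    def __init__(self, classes, capacity):
        self.queues = {c: deque(maxlen=capacity) for c in classes}

    def add(self, x, label):
        # Appending to a full deque evicts that class's oldest instance,
        # so a frequent class can never crowd out a rare one.
        self.queues[label].append(x)

    def training_set(self):
        # Flatten all queues into (instance, label) pairs.
        return [(x, c) for c, q in self.queues.items() for x in q]


def should_query(confidence, labels_spent, instances_seen,
                 budget=0.01, threshold=0.9):
    """Assumed query rule: ask for a label only if the model is uncertain
    and fewer than `budget` of the instances seen so far were labelled."""
    within_budget = labels_spent < budget * instances_seen
    return within_budget and confidence < threshold
```

For example, with `capacity=3`, feeding ten instances of one class and a single instance of another leaves three and one stored instances respectively, so the majority class is capped while the minority instance is retained.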