Paper Title
CLIP: Train Faster with Less Data
Paper Authors
Paper Abstract
Deep learning models require an enormous amount of data for training. Recently, however, machine learning has been shifting from model-centric to data-centric approaches. In data-centric approaches, the focus is on refining and improving the quality of the data to boost the learning performance of models, rather than on redesigning model architectures. In this paper, we propose CLIP, i.e., Curriculum Learning with Iterative data Pruning. CLIP combines two data-centric approaches, curriculum learning and dataset pruning, to improve model accuracy and convergence speed. The proposed scheme applies loss-aware dataset pruning to iteratively remove the least significant samples, progressively reducing the size of the effective dataset during curriculum-learning training. Extensive experiments on crowd density estimation models validate the idea of combining the two approaches, reducing convergence time and improving generalization. To our knowledge, the idea of data pruning as a process embedded in curriculum learning is novel.
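To make the scheme concrete, below is a minimal sketch of how loss-aware iterative pruning could be embedded in a training loop. This is an illustration under assumed details, not the paper's actual procedure: the names (per_sample_loss, prune_fraction, num_rounds) and the per-round schedule are hypothetical, and the per-sample loss is stubbed with random values where a real model's forward pass would be.

```python
# Sketch: iterative loss-aware data pruning inside curriculum-style training.
# Hypothetical parameters, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

num_samples = 1000
active = np.arange(num_samples)   # indices of the effective (unpruned) dataset
prune_fraction = 0.1              # assumed fraction of samples removed per round
num_rounds = 5                    # assumed number of pruning rounds

def per_sample_loss(indices):
    """Placeholder: one loss value per sample.
    In practice this would come from a forward pass of the trained model."""
    return rng.random(len(indices))

for round_idx in range(num_rounds):
    # ... train the model on the samples in `active` for some epochs here ...
    losses = per_sample_loss(active)
    # Loss-aware pruning: drop the lowest-loss samples, i.e., the ones the
    # model already fits well and that contribute least to further learning.
    keep = max(1, int(len(active) * (1 - prune_fraction)))
    order = np.argsort(losses)    # ascending: lowest-loss samples first
    active = active[order[-keep:]]
    print(f"round {round_idx}: effective dataset size = {len(active)}")
```

Each round shrinks the effective dataset, so later (harder) stages of the curriculum train on progressively fewer but more informative samples, which is the mechanism the abstract credits for faster convergence.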