Paper title
LiDAR dataset distillation within Bayesian active learning framework: Understanding the effect of data augmentation
Paper authors
Paper abstract
Autonomous driving (AD) datasets have grown steadily in size over the past few years to enable better deep representation learning. Active learning (AL) has recently regained attention as a way to reduce annotation costs and dataset size. AL remains relatively unexplored for AD datasets, especially for point cloud data from LiDARs. This paper performs a principled evaluation of AL-based dataset distillation on one quarter (1/4) of the large Semantic-KITTI dataset. Furthermore, the gains in model performance due to data augmentation (DA) are demonstrated across different subsets of the AL loop. We also demonstrate how DA improves the selection of informative samples to annotate. We observe that, with data augmentation, full-dataset accuracy is achieved using only 60% of the samples from the selected dataset configuration. This yields faster training and subsequent savings in annotation costs.