Paper Title
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
Paper Authors
Paper Abstract
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few large convolutional kernels instead of a stack of small kernels could be a more powerful paradigm. We suggest five guidelines, e.g., applying re-parameterized large depth-wise convolutions, to design efficient high-performance large-kernel CNNs. Following the guidelines, we propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31x31, in contrast to the commonly used 3x3. RepLKNet greatly closes the performance gap between CNNs and ViTs, e.g., achieving results comparable or superior to Swin Transformer on ImageNet and a few typical downstream tasks, with lower latency. RepLKNet also shows good scalability to big data and large models, obtaining 87.8% top-1 accuracy on ImageNet and 56.0% mIoU on ADE20K, which is very competitive with the state of the art at similar model sizes. Our study further reveals that, in contrast to small-kernel CNNs, large-kernel CNNs have much larger effective receptive fields and higher shape bias rather than texture bias. Code & models at https://github.com/megvii-research/RepLKNet.
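The "re-parameterized large depth-wise convolutions" mentioned above rely on structural re-parameterization: during training, a small-kernel branch runs in parallel with the large-kernel one, and at inference time the two are merged into a single convolution by zero-padding the small kernel to the large kernel's size and summing the weights. The sketch below illustrates the identity behind this merge on a single channel with illustrative 7x7 and 3x3 kernels (stand-ins for the paper's 31x31 and small branches); it is a minimal numpy demonstration, not the paper's actual implementation.

```python
import numpy as np

def conv2d_same(x, k):
    """Single-channel 2D cross-correlation with zero 'same' padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
k_large = rng.standard_normal((7, 7))  # stand-in for the 31x31 kernel
k_small = rng.standard_normal((3, 3))  # parallel small-kernel branch

# Training-time output: the two parallel branches are summed.
y_train = conv2d_same(x, k_large) + conv2d_same(x, k_small)

# Inference-time merge: zero-pad the small kernel to the large size,
# add the weights, and run a single convolution.
pad = (k_large.shape[0] - k_small.shape[0]) // 2
k_merged = k_large + np.pad(k_small, pad)
y_infer = conv2d_same(x, k_merged)

assert np.allclose(y_train, y_infer)  # the merge is exact
```

Because convolution is linear in its kernel, the merged model produces bit-for-bit the same outputs as the two-branch training-time model while paying the cost of only one convolution.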