Paper Title
Target Aware Network Architecture Search and Compression for Efficient Knowledge Transfer
Paper Authors
Paper Abstract
Transfer learning enables Convolutional Neural Networks (CNNs) to acquire knowledge from a source domain and transfer it to a target domain, where collecting large-scale annotated examples is time-consuming and expensive. Conventionally, while transferring the knowledge learned from one task to another, the deeper layers of a pre-trained CNN are fine-tuned over the target dataset. However, these layers were originally designed for the source task and may be over-parameterized for the target task. Thus, fine-tuning these layers over the target dataset may hurt the generalization ability of the CNN due to high network complexity. To tackle this problem, we propose a two-stage framework called TASCNet which enables efficient knowledge transfer. In the first stage, the configuration of the deeper layers is learned automatically and fine-tuned over the target dataset. In the second stage, redundant filters are pruned from the fine-tuned CNN to decrease the network's complexity for the target task while preserving its performance. This two-stage mechanism finds a compact version of the pre-trained CNN with an optimal structure (number of filters in a convolutional layer, number of neurons in a dense layer, and so on) from the hypothesis space. The efficacy of the proposed method is evaluated using VGG-16, ResNet-50, and DenseNet-121 on the CalTech-101, CalTech-256, and Stanford Dogs datasets. In addition to computer vision tasks, we also conducted experiments on the Movie Review sentiment analysis task. The proposed TASCNet reduces the computational complexity of pre-trained CNNs over the target task by reducing both trainable parameters and FLOPs, enabling resource-efficient knowledge transfer. The source code is available at: https://github.com/Debapriya-Tula/TASCNet.
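To make the second stage more concrete, below is a minimal, dependency-free sketch of magnitude-based filter pruning, a common way to remove redundant filters. The abstract does not state TASCNet's actual pruning criterion, so the L1-norm ranking and the `keep_ratio` parameter here are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch: prune convolutional filters by L1 norm.
# NOTE: the ranking criterion and keep_ratio are assumptions for
# demonstration; the paper's exact pruning rule may differ.

def filter_l1_norms(conv_filters):
    """conv_filters: list of filters, each a flat list of weights."""
    return [sum(abs(w) for w in f) for f in conv_filters]

def prune_filters(conv_filters, keep_ratio=0.5):
    """Keep the keep_ratio fraction of filters with the largest L1 norm.

    Returns the surviving filters and their original indices, so the
    corresponding channels in the next layer can be pruned consistently.
    """
    norms = filter_l1_norms(conv_filters)
    n_keep = max(1, int(len(conv_filters) * keep_ratio))
    ranked = sorted(range(len(conv_filters)),
                    key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:n_keep])  # preserve original filter order
    return [conv_filters[i] for i in keep], keep

# Example: 4 filters, keep half -> the 2 with the largest L1 norm survive.
filters = [[0.1, -0.2], [1.5, 0.3], [0.0, 0.05], [-0.9, 0.8]]
pruned, kept_idx = prune_filters(filters, keep_ratio=0.5)
# kept_idx == [1, 3]
```

Returning the kept indices matters in practice: pruning a filter in layer *l* also removes an input channel from every filter in layer *l+1*, so the indices are needed to propagate the structural change through the network.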