Paper Title
Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again
Paper Authors
Paper Abstract
Despite the enormous success of Graph Convolutional Networks (GCNs) in modeling graph-structured data, most current GCNs are shallow due to the notoriously challenging problems of over-smoothing and information squashing, along with the conventional difficulties caused by vanishing gradients and over-fitting. Previous works have primarily focused on the study of over-smoothing and over-squashing phenomena in training deep GCNs. Surprisingly, in comparison with CNNs/RNNs, very limited attention has been given to understanding how healthy gradient flow can benefit the trainability of deep GCNs. In this paper, we first provide a new perspective of gradient flow to understand the substandard performance of deep GCNs and hypothesize that by facilitating healthy gradient flow, we can significantly improve their trainability, as well as achieve state-of-the-art (SOTA) level performance from vanilla-GCNs. Next, we argue that blindly adopting the Glorot initialization for GCNs is not optimal, and derive a topology-aware isometric initialization scheme for vanilla-GCNs based on the principles of isometry. Additionally, contrary to the ad-hoc addition of skip connections, we propose gradient-guided dynamic rewiring of vanilla-GCNs with skip connections. Our dynamic rewiring method uses the gradient flow within each layer during training to introduce on-demand skip connections adaptively. We provide extensive empirical evidence across multiple datasets that our methods improve gradient flow in deep vanilla-GCNs and significantly boost their performance, comfortably competing with and outperforming many fancy state-of-the-art methods. Code is available at: https://github.com/VITA-Group/GradientGCN.
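The abstract names a topology-aware isometric initialization but does not give its formula. As a rough illustration of the idea only, here is a minimal PyTorch sketch that replaces Glorot with an orthogonal (isometry-preserving) initialization whose gain depends on graph topology; the function name, the sqrt-of-mean-degree gain, and the PyG-style `edge_index` layout are all assumptions for illustration, not the paper's actual derivation (see the repository for that).

```python
# Illustrative sketch, NOT the paper's scheme: a topology-aware variant of
# isometric initialization for a GCN layer's weight matrix.
import torch
import torch.nn as nn

def isometric_init(weight: torch.Tensor, edge_index: torch.Tensor, num_nodes: int) -> None:
    """Orthogonal init with a topology-dependent gain (hypothetical choice).

    `edge_index` is assumed to be a 2 x E integer tensor of COO edges
    (PyTorch Geometric convention). The gain sqrt(mean degree) is an
    illustrative assumption, not the formula derived in the paper.
    """
    deg = torch.bincount(edge_index[0], minlength=num_nodes).float()  # node degrees
    gain = deg.mean().clamp(min=1.0).sqrt()         # topology-dependent scale (assumption)
    nn.init.orthogonal_(weight, gain=gain.item())   # isometry-preserving base init

# Usage: isometric_init(layer.weight, edge_index, num_nodes)
```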
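Similarly, the gradient-guided dynamic rewiring is only described at a high level: gradient flow within each layer is monitored during training and skip connections are introduced on demand. A minimal sketch of one plausible realization follows, assuming per-layer gradient norms as the health signal; the trigger rule, the threshold value, and the class/function names are hypothetical, not the paper's exact criterion.

```python
# Hypothetical sketch of gradient-guided dynamic rewiring: after each backward
# pass, enable an identity skip connection in layers whose weight gradients
# have (nearly) vanished. The threshold-based rule is an assumption.
import torch
import torch.nn as nn

class RewirableGCNLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.lin = nn.Linear(dim, dim, bias=False)
        self.use_skip = False  # toggled on demand by the rewiring rule

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        h = torch.relu(adj_norm @ self.lin(x))  # plain GCN propagation (dense adj for brevity)
        return h + x if self.use_skip else h    # optional identity skip

def rewire_on_gradient(layers, threshold: float = 1e-4) -> None:
    """Call after loss.backward(): switch on skips where gradient flow is unhealthy."""
    for layer in layers:
        g = layer.lin.weight.grad
        if g is not None and g.norm() < threshold:
            layer.use_skip = True
```

In a training loop, `rewire_on_gradient(layers)` would sit between `loss.backward()` and `optimizer.step()`, so each layer's skip connection is activated adaptively based on the observed gradient flow rather than added ad hoc at construction time.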