Paper title
Super-model ecosystem: A domain-adaptation perspective
Paper authors
Paper abstract
This paper attempts to establish a theoretical foundation for the emerging super-model paradigm via domain adaptation, in which one first trains a very large-scale model, {\it i.e.}, a super model (or foundation model in some other papers), on a large amount of data and then adapts it to various specific domains. The super-model paradigm helps reduce computational and data costs and carbon emissions, which is critical to the AI industry, especially to its many small and medium-sized enterprises. We model the super-model paradigm as a two-stage diffusion process: (1) in the pre-training stage, the model parameter diffuses from a random initialization and converges to a steady distribution; and (2) in the fine-tuning stage, the model parameter is transported to another steady distribution. Both training stages can be mathematically modeled by the Ornstein-Uhlenbeck process, which converges to a Maxwell-Boltzmann distribution in each stage; each distribution characterizes the corresponding converged model. An $\mathcal O(1/\sqrt{N})$ generalization bound is then established via the PAC-Bayesian framework. The theory finds that the generalization error of the fine-tuning stage is dominant in domain adaptation. In addition, our theory suggests that generalization is determined by a new measure that characterizes the discrepancy between the source domain and the target domain, based on the covariance matrices and the shift of the converged local minimum.
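The two-stage picture in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's construction. It assumes a single scalar parameter and a quadratic loss around each minimum, so that each stage is a one-dimensional Ornstein-Uhlenbeck process $d\theta = -a(\theta - \mu)\,dt + \sigma\,dW$, whose stationary distribution is Gaussian with mean $\mu$ and variance $\sigma^2/(2a)$. The function names (`simulate_ou`, `two_stage`) and all numeric values (minima 2.0 and 3.0, stiffness, noise level, step size) are hypothetical choices made only for illustration.

```python
import math
import random
import statistics


def simulate_ou(thetas, minimum, stiffness, noise, dt, steps, rng):
    """Euler-Maruyama steps of d(theta) = -stiffness*(theta - minimum) dt + noise dW."""
    scale = noise * math.sqrt(dt)
    out = list(thetas)
    for _ in range(steps):
        out = [t - stiffness * (t - minimum) * dt + scale * rng.gauss(0.0, 1.0)
               for t in out]
    return out


def two_stage(seed=0, particles=2000, dt=0.02, steps=400):
    """Run many independent toy 'training runs' through both stages."""
    rng = random.Random(seed)
    # Random initialization of one scalar parameter per run (assumed N(0, 3^2)).
    init = [rng.gauss(0.0, 3.0) for _ in range(particles)]
    # Stage 1 (pre-training): drift toward a hypothetical source-domain minimum.
    pre = simulate_ou(init, minimum=2.0, stiffness=1.0, noise=1.0,
                      dt=dt, steps=steps, rng=rng)
    # Stage 2 (fine-tuning): start from the stage-1 parameters and drift
    # toward a hypothetical, shifted target-domain minimum.
    fine = simulate_ou(pre, minimum=3.0, stiffness=2.0, noise=1.0,
                       dt=dt, steps=steps, rng=rng)
    return pre, fine


if __name__ == "__main__":
    pre, fine = two_stage()
    # Stationary variance of the continuous-time process is noise^2 / (2 * stiffness).
    for name, xs, mu, var in [("pre-training", pre, 2.0, 0.5),
                              ("fine-tuning", fine, 3.0, 0.25)]:
        print(f"{name}: mean={statistics.mean(xs):.3f} (theory {mu}), "
              f"var={statistics.pvariance(xs):.3f} (theory {var})")
```

In this toy setting the empirical mean and variance after each stage should land close to the stationary values, up to sampling noise and a small discretization bias from the Euler-Maruyama step. The shift of the minimum (2.0 to 3.0) and the change in stationary variance (0.5 to 0.25) are scalar stand-ins for the two ingredients the abstract names in its discrepancy measure: the shift of the converged local minimum and the covariance matrices.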