Title
Supplementary Material: Implementation and Experiments for GAU-based Model
Authors
Abstract
In February this year, Google proposed a new Transformer variant called FLASH, which offers faster speed, a lower VRAM footprint, and better performance. This is achieved through a performant layer named GAU (Gated Attention Unit), which combines the Attention layer and the FFN. In this paper, some implementation details are re-analyzed both theoretically and practically. We then propose a novel GAU-based model and pre-train it on a Chinese corpus. Results on the CLUE benchmark show that our model achieves a dev average score of 75.02, 1% higher than RoFormerV1 while being 45% faster, and is also competitive with RoFormerV2.
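To make the GAU description concrete, the following is a minimal sketch of a single GAU forward pass as described in the FLASH paper: gate and value branches U and V, a shared low-dimensional projection Z from which Q and K are obtained by cheap per-dimension affine transforms, and relu²-based attention in place of softmax. All weight names and toy shapes here are illustrative, not the paper's actual implementation; positional bias and causal masking are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def silu(x):
    # SiLU activation used for the projections in GAU
    return x * (1.0 / (1.0 + np.exp(-x)))

def gau_forward(x, Wu, Wv, Wz, Wo, gamma_q, beta_q, gamma_k, beta_k):
    """Single-head GAU forward pass (hedged sketch).

    x: (n, d) token representations
    U, V: (n, e) gate / value branches (e > d, replacing the FFN expansion)
    Z: (n, s) shared base; Q and K are per-dim affine transforms of Z
    Attention weights use relu^2 instead of softmax, as in FLASH.
    """
    n = x.shape[0]
    s = Wz.shape[1]
    U = silu(x @ Wu)            # (n, e) gate branch
    V = silu(x @ Wv)            # (n, e) value branch
    Z = silu(x @ Wz)            # (n, s) shared projection
    Q = Z * gamma_q + beta_q    # (n, s) cheap query transform
    K = Z * gamma_k + beta_k    # (n, s) cheap key transform
    A = np.maximum(Q @ K.T / np.sqrt(s), 0.0) ** 2 / n  # relu^2 attention, (n, n)
    return (U * (A @ V)) @ Wo   # gated output, (n, d)

# toy shapes: d=8 model dim, e=16 expanded dim, s=4 shared Q/K dim, n=5 tokens
d, e, s, n = 8, 16, 4, 5
x = rng.normal(size=(n, d))
out = gau_forward(
    x,
    rng.normal(size=(d, e)), rng.normal(size=(d, e)),
    rng.normal(size=(d, s)), rng.normal(size=(e, d)),
    rng.normal(size=s), rng.normal(size=s),
    rng.normal(size=s), rng.normal(size=s),
)
print(out.shape)  # (5, 8)
```

Because the gating U ⊙ (AV) already provides the nonlinearity and expansion of an FFN, a single GAU block can replace an attention-plus-FFN pair, which is the source of the speed and memory savings claimed above.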