Title
Supplementary Material: Implementation and Experiments for GAU-based Model
Authors
Abstract
In February this year, Google proposed a new Transformer variant called FLASH, which offers faster speed, a lower VRAM footprint, and better performance. This is achieved through a performant layer named GAU (Gated Attention Unit), which combines the Attention layer and the FFN. In this paper, some implementation details are re-analyzed both theoretically and practically. We then propose a novel GAU-based model and pre-train it on a Chinese corpus. Results on the CLUE benchmark show that our model achieves a dev average score of 75.02, 1% higher than RoFormerV1 while being 45% faster, and is also competitive with RoFormerV2.
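To make the GAU description concrete, the following is a minimal sketch of a single GAU forward pass as described in the FLASH paper: gate and value branches U and V, a shared low-dimensional projection Z from which Q and K are obtained by cheap per-dimension affine transforms, and relu²-based attention in place of softmax. All weight names and toy shapes here are illustrative, not the paper's actual implementation; positional bias and causal masking are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def silu(x):
    # SiLU activation used for the projections in GAU
    return x * (1.0 / (1.0 + np.exp(-x)))

def gau_forward(x, Wu, Wv, Wz, Wo, gamma_q, beta_q, gamma_k, beta_k):
    """Single-head GAU forward pass (hedged sketch).

    x: (n, d) token representations
    U, V: (n, e) gate / value branches (e > d, replacing the FFN expansion)
    Z: (n, s) shared base; Q and K are per-dim affine transforms of Z
    Attention weights use relu^2 instead of softmax, as in FLASH.
    """
    n = x.shape[0]
    s = Wz.shape[1]
    U = silu(x @ Wu)            # (n, e) gate branch
    V = silu(x @ Wv)            # (n, e) value branch
    Z = silu(x @ Wz)            # (n, s) shared projection
    Q = Z * gamma_q + beta_q    # (n, s) cheap query transform
    K = Z * gamma_k + beta_k    # (n, s) cheap key transform
    A = np.maximum(Q @ K.T / np.sqrt(s), 0.0) ** 2 / n  # relu^2 attention, (n, n)
    return (U * (A @ V)) @ Wo   # gated output, (n, d)

# toy shapes: d=8 model dim, e=16 expanded dim, s=4 shared Q/K dim, n=5 tokens
d, e, s, n = 8, 16, 4, 5
x = rng.normal(size=(n, d))
out = gau_forward(
    x,
    rng.normal(size=(d, e)), rng.normal(size=(d, e)),
    rng.normal(size=(d, s)), rng.normal(size=(e, d)),
    rng.normal(size=s), rng.normal(size=s),
    rng.normal(size=s), rng.normal(size=s),
)
print(out.shape)  # (5, 8)
```

Because the gating U ⊙ (AV) already provides the nonlinearity and expansion of an FFN, a single GAU block can replace an attention-plus-FFN pair, which is the source of the speed and memory savings claimed above.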