Paper Title

Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer?

Paper Authors

Xu, Ningyu; Gui, Tao; Ma, Ruotian; Zhang, Qi; Ye, Jingting; Zhang, Menghan; Huang, Xuanjing

Paper Abstract

Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability, whereby it enables effective zero-shot cross-lingual transfer of syntactic knowledge. The transfer is more successful between some languages, but it is not well understood what leads to this variation and whether it fairly reflects difference between languages. In this work, we investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages. We demonstrate that the distance between the distributions of different languages is highly consistent with the syntactic difference in terms of linguistic formalisms. Such difference learnt via self-supervision plays a crucial role in the zero-shot transfer performance and can be predicted by variation in morphosyntactic properties between languages. These results suggest that mBERT properly encodes languages in a way consistent with linguistic diversity and provide insights into the mechanism of cross-lingual transfer.
