Paper title
Letters From the Past: Modeling Historical Sound Change Through Diachronic Character Embeddings
Paper authors
Paper abstract
While a great deal of work has been done on NLP approaches to lexical semantic change detection, other aspects of language change have received less attention from the NLP community. In this paper, we address the detection of sound change through historical spelling. We propose that a sound change can be captured by comparing, through time, the relative distance between the distributions of the characters involved, represented as PPMI character embeddings. We verify this hypothesis on synthetic data and then test the method's ability to trace a well-known historical change, the lenition of plosives, in Danish historical sources. We show that the models are able to identify several of the changes under consideration and to uncover meaningful contexts in which they appeared. The methodology has the potential to contribute to the study of open questions such as the relative chronology of sound shifts and their geographical distribution.
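The idea of comparing character distributions across periods can be illustrated with a minimal sketch. It is not the authors' implementation: the symmetric one-character context window, the `#` word-boundary marker, the cosine distance, and the function names `ppmi_char_embeddings` and `cosine_distance` are all assumptions made for illustration, and the word lists are invented toy strings, not Danish data. Because the PPMI dimensions are the context characters themselves, vectors from different periods can be compared directly in this sketch.

```python
import math
from collections import Counter


def ppmi_char_embeddings(words, window=1):
    """Map each character to a sparse PPMI vector over neighbouring characters.

    Assumption: contexts are the characters within `window` positions on
    either side of the target, with '#' padding to mark word boundaries.
    """
    pair_counts = Counter()
    for word in words:
        padded = "#" + word + "#"
        for i, target in enumerate(padded):
            start = max(0, i - window)
            stop = min(len(padded), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    pair_counts[(target, padded[j])] += 1

    total = sum(pair_counts.values())
    target_counts, context_counts = Counter(), Counter()
    for (target, context), n in pair_counts.items():
        target_counts[target] += n
        context_counts[context] += n

    vectors = {char: {} for char in target_counts}
    for (target, context), n in pair_counts.items():
        pmi = math.log2(n * total / (target_counts[target] * context_counts[context]))
        if pmi > 0:  # positive PMI: negative associations are dropped
            vectors[target][context] = pmi
    return vectors


def cosine_distance(u, v):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(value * value for value in u.values()))
    norm_v = math.sqrt(sum(value * value for value in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy data: intervocalic 'p' in the early period is spelled 'b'
# in the late period, mimicking a lenition-like change p > b between vowels.
early_words = ["pa", "apa", "ipi", "ba", "bi"]
late_words = ["pa", "aba", "ibi", "ba", "bi"]

early = ppmi_char_embeddings(early_words)
late = ppmi_char_embeddings(late_words)

# How far is 'b' from the early distribution of 'p', before and after?
dist_before = cosine_distance(early["b"], early["p"])
dist_after = cosine_distance(late["b"], early["p"])
print(f"before: {dist_before:.3f}  after: {dist_after:.3f}")
```

In this toy setting the later `b` lies closer to the earlier distribution of `p` than the earlier `b` did, which is the kind of relative shift the abstract describes. Real historical sources would need far more data and careful handling of spelling variation.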