Paper title

Feature-rich multiplex lexical networks reveal mental strategies of early language learning

Authors

Salvatore Citraro, Michael S. Vitevitch, Massimo Stella, Giulio Rossetti

Abstract

Knowledge in the human mind exhibits a dualistic vector/network nature. Modelling words as vectors is key to natural language processing, whereas networks of word associations can map the nature of semantic memory. We reconcile these paradigms, fragmented across linguistics, psychology and computer science, by introducing FEature-Rich MUltiplex LEXical (FERMULEX) networks. This novel framework merges structural similarities in networks and vector features of words, which can be combined or explored independently. Similarities model heterogeneous word associations across semantic/syntactic/phonological aspects of knowledge. Words are enriched with multi-dimensional feature embeddings including frequency, age of acquisition, length and polysemy. These aspects enable unprecedented explorations of cognitive knowledge. Through CHILDES data, we use FERMULEX networks to model normative language acquisition by 1000 toddlers between 18 and 30 months. Similarities and embeddings capture word homophily via conformity, which measures assortative mixing via distance and features. Conformity unearths a language kernel of frequent/polysemous/short nouns and verbs key for basic sentence production, supporting recent evidence of children's syntactic constructs emerging at 30 months. This kernel is invisible to network core-detection and feature-only clustering: it emerges from the dual vector/network nature of words. Our quantitative analysis reveals two key strategies in early word learning. Modelling word acquisition as random walks on FERMULEX topology, we highlight non-uniform filling of communicative developmental inventories (CDIs). Conformity-based walkers lead to accurate (75%), precise (55%) and partially well-recalled (34%) predictions of early word learning in CDIs, providing quantitative support to previous empirical findings and developmental theories.
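
To make the random-walk idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation: the toy words, the two layers, the function names (`neighbours`, `random_walk_acquisition`, `score`) and the uniform choice of the next word are all assumptions made for illustration. In particular, the walker below picks neighbours uniformly at random, whereas the paper's walkers are conformity-based. The sketch only shows how a walk over a multiplex word network yields a predicted learning order, and how such a prediction can be scored with accuracy, precision and recall.

```python
import random

# Hypothetical toy multiplex: each layer maps a word to its neighbours.
# Real FERMULEX layers are built from semantic/syntactic/phonological data.
LAYERS = {
    "semantic": {
        "dog": ["cat", "ball"],
        "cat": ["dog", "milk"],
        "ball": ["dog"],
        "milk": ["cat"],
    },
    "phonological": {
        "cat": ["hat"],
        "hat": ["cat"],
        "ball": ["call"],
        "call": ["ball"],
    },
}


def vocabulary():
    """All words that appear in any layer."""
    words = set()
    for layer in LAYERS.values():
        for word, linked in layer.items():
            words.add(word)
            words.update(linked)
    return words


def neighbours(word):
    """Union of a word's neighbours across all layers (aggregated multiplex)."""
    found = set()
    for layer in LAYERS.values():
        found.update(layer.get(word, []))
    return sorted(found)


def random_walk_acquisition(start, n_words, rng, max_steps=10_000):
    """Uniform random walk; a word counts as 'learned' on its first visit.

    Assumption: uniform transitions stand in for the conformity-based
    transition rule described in the abstract.
    """
    learned = [start]
    current = start
    for _ in range(max_steps):
        if len(learned) >= n_words:
            break
        options = neighbours(current)
        if not options:
            break
        current = rng.choice(options)
        if current not in learned:
            learned.append(current)
    return learned


def score(predicted, observed, vocab):
    """Accuracy, precision and recall of a predicted set of learned words."""
    tp = len(predicted & observed)
    fp = len(predicted - observed)
    fn = len(observed - predicted)
    tn = len(vocab - predicted - observed)
    accuracy = (tp + tn) / len(vocab)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the walk is reproducible
    predicted = set(random_walk_acquisition("dog", 3, rng))
    observed = {"dog", "cat", "ball"}  # made-up 'ground truth' for the demo
    print(sorted(predicted), score(predicted, observed, vocabulary()))
```

Replacing `rng.choice(options)` with a weighted choice is where a conformity-style bias would enter; the weights themselves would have to come from the paper's definition of conformity, which this sketch does not attempt to reproduce.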
