Paper Title
How can NLP Help Revitalize Endangered Languages? A Case Study and Roadmap for the Cherokee Language
Authors
Abstract
More than 43% of the languages spoken in the world are endangered, and language loss currently occurs at an accelerated rate because of globalization and neocolonialism. Saving and revitalizing endangered languages has become very important for maintaining the cultural diversity on our planet. In this work, we focus on discussing how NLP can help revitalize endangered languages. We first suggest three principles that may help NLP practitioners to foster mutual understanding and collaboration with language communities, and we discuss three ways in which NLP can potentially assist in language education. We then take Cherokee, a severely-endangered Native American language, as a case study. After reviewing the language's history, linguistic features, and existing resources, we (in collaboration with Cherokee community members) arrive at a few meaningful ways NLP practitioners can collaborate with community partners. We suggest two approaches to enrich the Cherokee language's resources with machine-in-the-loop processing, and discuss several NLP tools that people from the Cherokee community have shown interest in. We hope that our work serves not only to inform the NLP community about Cherokee, but also to provide inspiration for future work on endangered languages in general. Our code and data will be open-sourced at https://github.com/ZhangShiyue/RevitalizeCherokee