Paper Title

Gender Representation in Open Source Speech Resources

Authors

Mahault Garnerin, Solange Rossato, Laurent Besacier

Abstract

With the rise of artificial intelligence (AI) and the growing use of deep-learning architectures, the question of ethics, transparency and fairness of AI systems has become a central concern within the research community. We address transparency and fairness in spoken language systems by proposing a study about gender representation in speech resources available through the Open Speech and Language Resource platform. We show that finding gender information in open source corpora is not straightforward and that gender balance depends on other corpus characteristics (elicited/non-elicited speech, low/high-resource language, speech task targeted). The paper ends with recommendations about metadata and gender information for researchers, to ensure better transparency of the speech systems built using such corpora.
