Classifying Autism from Crowdsourced Semi-Structured Speech Recordings: A Machine Learning Approach

Paper Title

Classifying Autism from Crowdsourced Semi-Structured Speech Recordings: A Machine Learning Approach

Authors

Nathan A. Chi, Peter Washington, Aaron Kline, Arman Husic, Cathy Hou, Chloe He, Kaitlyn Dunlap, Dennis Wall

Abstract

Autism spectrum disorder (ASD) is a neurodevelopmental disorder which results in altered behavior, social development, and communication patterns. In past years, autism prevalence has tripled, with 1 in 54 children now affected. Given that traditional diagnosis is a lengthy, labor-intensive process, significant attention has been given to developing systems that automatically screen for autism. Prosody abnormalities are among the clearest signs of autism, with affected children displaying speech idiosyncrasies including echolalia, monotonous intonation, atypical pitch, and irregular linguistic stress patterns. In this work, we present a suite of machine learning approaches to detect autism in self-recorded speech audio captured from autistic and neurotypical (NT) children in home environments. We consider three methods to detect autism in child speech: first, Random Forests trained on extracted audio features (including Mel-frequency cepstral coefficients); second, convolutional neural networks (CNNs) trained on spectrograms; and third, fine-tuned wav2vec 2.0, a state-of-the-art Transformer-based ASR model. We train our classifiers on our novel dataset of cellphone-recorded child speech audio curated from Stanford's Guess What? mobile game, an app designed to crowdsource videos of autistic and neurotypical children in a natural home environment. The Random Forest classifier achieves 70% accuracy, the fine-tuned wav2vec 2.0 model achieves 77% accuracy, and the CNN achieves 79% accuracy when classifying children's audio as either ASD or NT. Our models were able to predict autism status when training on a varied selection of home audio clips with inconsistent recording quality, which may be more generalizable to real-world conditions. These results demonstrate that machine learning methods offer promise in detecting autism automatically from speech without specialized equipment.
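
The first method in the abstract relies on Mel-frequency cepstral coefficients (MFCCs). As a rough illustration of what such a feature is, the sketch below computes MFCCs for a single audio frame using only the Python standard library. It is not the authors' pipeline: the HTK-style mel formula, the frame length, the filter count, the number of coefficients, and all function names are assumptions chosen for the example, and a synthetic tone stands in for real speech.

```python
import cmath
import math

def hz_to_mel(f_hz):
    """HTK-style mel scale (an assumed variant; the abstract does not name one)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hann-window one frame and return the power at each non-negative DFT bin."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive DFT; fine for a single short frame, too slow for real workloads.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):   # rising slope
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # peak and falling slope
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs of one frame: log mel-band energies followed by a DCT-II."""
    power = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_energy = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                  for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energy))
            for c in range(n_coeffs)]

# Synthetic 440 Hz tone standing in for one short frame of speech audio.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(256)]
features = mfcc_frame(frame, SAMPLE_RATE)
print(len(features), "coefficients for this frame")
```

In a setup like the one the abstract describes, per-frame vectors of this kind would typically be summarized over a whole clip (for example by mean and variance) and passed to a classifier such as a Random Forest; how the authors aggregated their features is not stated here.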
