Paper title
COVID-19 Cough Classification using Machine Learning and Global Smartphone Recordings
Paper authors
Paper abstract
We present a machine-learning-based COVID-19 cough classifier which can discriminate COVID-19 positive coughs from both COVID-19 negative and healthy coughs recorded on a smartphone. This type of screening is non-contact and easy to apply, and it can reduce the workload in testing centres as well as limit transmission by recommending early self-isolation to those who have a cough suggestive of COVID-19. The datasets used in this study include subjects from all six continents and contain both forced and natural coughs, indicating that the approach is widely applicable. The publicly available Coswara dataset contains 92 COVID-19 positive and 1079 healthy subjects, while the second, smaller dataset was collected mostly in South Africa and contains 18 COVID-19 positive and 26 COVID-19 negative subjects who have undergone a SARS-CoV laboratory test. Both datasets indicate that COVID-19 positive coughs are 15\%-20\% shorter than non-COVID coughs. Dataset skew was addressed by applying the synthetic minority oversampling technique (SMOTE). A leave-$p$-out cross-validation scheme was used to train and evaluate seven machine learning classifiers: LR, KNN, SVM, MLP, CNN, LSTM and Resnet50. Our results show that although all classifiers were able to identify COVID-19 coughs, the best performance was exhibited by the Resnet50 classifier, which discriminated best between COVID-19 positive and healthy coughs, with an area under the ROC curve (AUC) of 0.98. An LSTM classifier discriminated best between COVID-19 positive and COVID-19 negative coughs, with an AUC of 0.94 after selecting the best 13 features by sequential forward selection (SFS). Since this type of cough audio classification is cost-effective and easy to deploy, it is potentially a useful and viable means of non-contact COVID-19 screening.
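The abstract names SMOTE for correcting dataset skew and AUC as the performance measure. The following is a minimal sketch of both ideas in plain Python, not the authors' implementation. The function names `smote` and `auc`, the parameters `n_new`, `k` and `seed`, and the toy two-dimensional feature vectors are all assumptions made for illustration. The sketch assumes at least two minority samples.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Sketch of SMOTE: create n_new synthetic minority samples by
    interpolating between a randomly chosen minority sample and one of
    its k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [x for x in minority if x is not base]
        others.sort(key=lambda x: math.dist(base, x))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive scores higher than a random negative (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumed): three minority-class feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_new=4, k=2)

# Toy classifier scores (assumed): label 1 = positive, 0 = negative.
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(len(new_points), toy_auc)
```

Each synthetic point lies on a line segment between two real minority samples, so oversampling adds no points outside the region the minority class already spans. In practice, oversampling of this kind is normally applied only to the training folds of a cross-validation scheme, so that synthetic samples never leak into the evaluation data.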