Paper Title

Survival Prediction of Children Undergoing Hematopoietic Stem Cell Transplantation Using Different Machine Learning Classifiers by Performing Chi-squared Test and Hyper-parameter Optimization: A Retrospective Analysis

Authors

Ratul, Ishrak Jahan; Wani, Ummay Habiba; Nishat, Mirza Muntasir; Al-Monsur, Abdullah; Ar-Rafi, Abrar Mohammad; Faisal, Fahim; Kabir, Mohammad Ridwan

Abstract

Bone marrow transplantation (BMT) is an efficacious surgical treatment and a gradual rescue for a wide range of disorders originating in the bone marrow. Several risk factors, such as post-transplant illnesses, new malignancies, and even organ damage, can impair long-term survival. Machine learning (ML) is therefore used to predict the survival of BMT recipients and to study the factors that limit it. This study presents an efficient survival classification model that incorporates Chi-squared feature selection to address the dimensionality problem and hyperparameter optimization (HPO) to increase accuracy. A synthetic dataset was generated by imputing the missing values, transforming the data with dummy variable encoding, and reducing the dataset from 59 features to the 11 top-ranked features using Chi-squared feature selection. The dataset was split into training and test sets at a ratio of 80:20, and the hyperparameters were optimized using grid search cross-validation. Several supervised ML methods were trained: Decision Tree, Random Forest, Logistic Regression, K-Nearest Neighbors, Gradient Boosting Classifier, AdaBoost, and XGBoost. Simulations were performed with both default and optimized hyperparameters on both the original and the reduced synthetic dataset. After the features were ranked with the Chi-squared test, the top 11 features with HPO yielded the same prediction accuracy (94.73%) as the entire dataset with default parameters. Moreover, this approach requires less time and fewer resources for predicting the survivability of children undergoing BMT. Hence, the proposed approach may aid in the development of a computer-aided diagnostic system that achieves satisfactory accuracy with minimal computation time using medical data records.
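
To make the feature-ranking step concrete, the following is a minimal sketch of Chi-squared feature ranking in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the tooling used, the function names `chi2_statistic` and `top_k_features` are hypothetical, and the toy columns are invented for the example rather than taken from the study's dataset. The score shown is the Pearson Chi-squared statistic of the contingency table between one categorical feature and the class label.

```python
from collections import Counter


def chi2_statistic(feature, labels):
    """Pearson chi-squared statistic between a categorical feature and the labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # joint counts per (feature value, label)
    feature_totals = Counter(feature)         # row totals of the contingency table
    label_totals = Counter(labels)            # column totals of the contingency table
    stat = 0.0
    for f_value, f_count in feature_totals.items():
        for y_value, y_count in label_totals.items():
            expected = f_count * y_count / n  # count expected under independence
            stat += (observed[(f_value, y_value)] - expected) ** 2 / expected
    return stat


def top_k_features(columns, labels, k):
    """Rank features by chi-squared score and return the k highest-scoring names."""
    scores = {name: chi2_statistic(values, labels) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data, invented for illustration only (not from the study).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "strong": [1, 1, 1, 1, 0, 0, 0, 0],  # matches the label exactly
    "weak":   [1, 1, 1, 0, 0, 0, 0, 1],  # agrees with the label in 6 of 8 rows
    "noise":  [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
}

selected, scores = top_k_features(columns, labels, k=2)
print(selected)  # ['strong', 'weak']
print(scores)    # {'strong': 8.0, 'weak': 2.0, 'noise': 0.0}
```

In the study's setting, `k` would be 11 and the columns would be the 59 dummy-encoded features. Library implementations of Chi-squared feature selection may compute the score differently from this contingency-table form, so rankings are not guaranteed to match.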
