Combining resampling strategies and ensemble machine learning methods to enhance prediction of neonates with a low Apgar score after induction of labor in Northern Tanzania

Clifford Silver Tarimo, Soumitra S. Bhuyan, Quanman Li, Weicun Ren, Michael Johnson Mahande, Jian Wu

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: The goal of this study was to identify the most efficient boosting method for predicting low neonatal Apgar scores following labor induction, and to assess whether resampling strategies improve the predictive performance of the selected boosting algorithms.

Methods: A total of 7716 singleton births delivered from 2000 to 2015 were analyzed. Cesarean deliveries following labor induction, deliveries with abnormal presentation, and deliveries with missing Apgar score or delivery mode information were excluded. We examined the effect of resampling (a form of data preprocessing) on predicting low Apgar scores, specifically the synthetic minority oversampling technique (SMOTE), borderline-SMOTE, and random undersampling (RUS). The ensemble learning models tested were adaptive boosting (AdaBoost), gradient boosting (GB), and extreme gradient boosting (XGBoost). The discriminative ability of these three boosting-based ensemble methods was evaluated using sensitivity, specificity, precision, area under the receiver operating characteristic curve (AUROC), F1-score, positive predictive value (PPV), negative predictive value (NPV), and accuracy.

Results: The prevalence of low (<7) Apgar scores was 9.5% (n = 733). The prediction models performed similarly in their baseline mode. After resampling was applied, borderline-SMOTE significantly improved the predictive performance of all three boosting-based ensemble methods in terms of sensitivity, F1-score, AUROC, and PPV.

Conclusion: Policymakers, healthcare informaticians, and neonatologists should consider implementing data preprocessing strategies to improve efficiency when predicting a neonatal outcome from imbalanced data. The approach may be more effective when the borderline-SMOTE technique is used with the selected ensemble classifiers. However, future research may focus on testing additional resampling techniques, performing feature engineering and variable selection, and further optimizing the ensemble learning hyperparameters.
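SMOTE creates synthetic minority samples instead of duplicating existing ones: each new point lies on the line segment between a minority sample and one of its k nearest minority-class neighbours. The following is a minimal plain-Python sketch, not the authors' code, assuming numeric feature tuples and Euclidean distance. The `smote` function name is hypothetical, and this version draws the base sample at random on each iteration rather than generating a fixed number of samples per minority record as in the original SMOTE description.

```python
import math
import random


def smote(minority, n_new, k=5, seed=None):
    """Return n_new synthetic samples interpolated between minority samples
    and their k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))            # base sample, drawn at random
        base = minority[i]
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]  # one of the k nearest
        gap = rng.random()                            # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=10, k=2, seed=0)
```

Because every synthetic point is a convex combination of two minority samples, all ten new points stay inside the unit square spanned by the toy data.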
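Borderline-SMOTE restricts oversampling to minority samples near the class boundary. For each minority sample it counts how many of its m nearest neighbours (drawn from all classes) belong to the majority class: if all m do, the sample is treated as noise; if at least half but not all do, it is marked as "danger" (borderline); otherwise it is safe. Synthetic points are then generated only from the danger samples, toward their k nearest minority neighbours. The sketch below implements this Borderline-SMOTE1 variant in plain Python; the function names, defaults (m = 5, k = 5), and toy 1-D data are illustrative assumptions, not settings from the study.

```python
import math
import random


def danger_set(X, y, minority_label=1, m=5):
    """Indices of minority samples on the class border ('danger' in Borderline-SMOTE)."""
    danger = []
    for i, xi in enumerate(X):
        if y[i] != minority_label:
            continue
        # m nearest neighbours of xi among ALL other samples
        nearest = sorted((j for j in range(len(X)) if j != i),
                         key=lambda j: math.dist(xi, X[j]))[:m]
        n_majority = sum(1 for j in nearest if y[j] != minority_label)
        # n_majority == m -> noise (skipped); n_majority < m/2 -> safe (skipped)
        if m / 2 <= n_majority < m:
            danger.append(i)
    return danger


def borderline_smote(X, y, n_new, minority_label=1, m=5, k=5, seed=None):
    """Borderline-SMOTE1 sketch: interpolate only from 'danger' samples
    toward their k nearest minority-class neighbours."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    danger = danger_set(X, y, minority_label, m)
    if not danger or len(minority) < 2:
        return []
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        neighbours = sorted((j for j in minority if j != i),
                            key=lambda j: math.dist(X[i], X[j]))[:k]
        target = X[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], target)))
    return synthetic


# Toy 1-D data: majority (0) at 0..4, minority (1) near the border and far away.
X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (4.5,), (5.0,), (9.0,), (10.0,), (11.0,)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(danger_set(X, y, m=3))  # [5, 6]
```

Only the minority points at 4.5 and 5.0, which sit next to the majority class, are flagged as danger; the points at 9, 10, and 11 are safe, so `borderline_smote(X, y, 20, m=3, k=2)` generates new samples only between 4.5 and 9.0.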
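Random undersampling (RUS) balances the classes by discarding majority-class records at random. The sketch below is a plain-Python illustration, not the authors' code; the function name and the `ratio` parameter (majority records kept per minority record) are hypothetical. It is applied here to class counts matching the abstract (733 low-Apgar births and 6983 others, 7716 in total) with placeholder features.

```python
import random


def random_undersample(X, y, minority_label=1, ratio=1.0, seed=None):
    """Keep every minority record plus a random subset of majority records.

    `ratio` (hypothetical parameter) is the number of majority records kept per
    minority record; 1.0 gives a 1:1 class balance.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    majority_idx = [i for i, label in enumerate(y) if label != minority_label]
    n_keep = min(len(majority_idx), round(ratio * len(minority_idx)))
    kept = sorted(minority_idx + rng.sample(majority_idx, n_keep))  # keep original order
    return [X[i] for i in kept], [y[i] for i in kept]


# Class counts matching the abstract: 733 low-Apgar births out of 7716.
y = [1] * 733 + [0] * (7716 - 733)
X = list(range(len(y)))  # placeholder features (record ids)
X_res, y_res = random_undersample(X, y, seed=42)
print(len(y_res), y_res.count(1), y_res.count(0))  # 1466 733 733
```

The arithmetic shows the main drawback of undersampling: at a 1:1 ratio, 6250 of the 6983 majority records (almost 90%) are discarded along with the information they carry. In practice, resampling is usually applied to the training split only, so that evaluation reflects the real class balance.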
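Of the three boosting methods, AdaBoost has the simplest update rule: after each round, misclassified records receive larger weights so that the next weak learner concentrates on them. Below is a minimal discrete-AdaBoost sketch with one-feature decision stumps, assuming labels coded +1 (low Apgar) and -1 (otherwise); it is illustrative only and is not the configuration used in the study. Gradient boosting and XGBoost instead fit each new tree to the gradient of a loss function (XGBoost additionally regularizes its objective), so they are not shown here.

```python
import math


def fit_stump(X, y, w):
    """Weighted-error-minimizing decision stump: (error, feature, threshold, polarity)."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for polarity in (1, -1):
                # stump predicts `polarity` when x[f] >= thr, otherwise -polarity
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (polarity if xi[f] >= thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def stump_predict(stump, x):
    _, f, thr, polarity = stump
    return polarity if x[f] >= thr else -polarity


def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost; labels must be +1 / -1."""
    w = [1.0 / len(X)] * len(X)          # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)       # guard against log(1/0) for a perfect stump
        if err >= 0.5:                   # weak learner no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified records, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data no single stump can separate: +1 at both ends, -1 in the middle.
X = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,)]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, n_rounds=3)
print([predict(model, x) for x in X])  # [1, 1, -1, -1, 1, 1]
```

On this toy data the best single stump still misclassifies a third of the points, but the weighted vote of three stumps (with weights of roughly 0.35, 0.55, and 0.80) classifies all six correctly.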
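The threshold-based metrics named in the abstract all derive from the four cells of a binary confusion matrix, while AUROC is computed from predicted scores. Precision and PPV are the same quantity, TP / (TP + FP). The minimal Python sketch below assumes low Apgar (<7) is the positive class; the function names are hypothetical and the counts in the usage note are made up for illustration, not results from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # low Apgar (<7) births predicted as low
    fp: int  # normal births predicted as low
    tn: int  # normal births predicted as normal
    fn: int  # low Apgar births predicted as normal


def _ratio(num: float, den: float) -> float:
    # Undefined ratios (zero denominator) are returned as NaN.
    return num / den if den else float("nan")


def evaluation_metrics(c: ConfusionCounts) -> dict[str, float]:
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # recall, true positive rate
    specificity = _ratio(c.tn, c.tn + c.fp)  # true negative rate
    ppv = _ratio(c.tp, c.tp + c.fp)          # precision and PPV are the same ratio
    npv = _ratio(c.tn, c.tn + c.fn)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)
    f1 = _ratio(2 * ppv * sensitivity, ppv + sensitivity)  # harmonic mean
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": ppv, "ppv": ppv, "npv": npv,
            "accuracy": accuracy, "f1": f1}


def auroc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2);
    this equals the area under the empirical ROC curve."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, `evaluation_metrics(ConfusionCounts(tp=60, fp=40, tn=380, fn=20))` gives sensitivity 0.75, specificity about 0.905, PPV 0.6, NPV 0.95, accuracy 0.88, and F1 about 0.667. The same function shows why accuracy alone misleads on imbalanced data: with the abstract's 9.5% prevalence, a model that labels every birth as normal (`tp=0, fp=0, tn=6983, fn=733`) reaches about 90.5% accuracy while its sensitivity is 0.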

Original language: English (US)
Pages (from-to): 3711-3720
Number of pages: 10
Journal: Risk Management and Healthcare Policy
Volume: 14
State: Published - 2021
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Health Policy
  • Public Health, Environmental and Occupational Health

Keywords

  • Ensemble learning
  • Imbalanced data
  • Labor induction
  • Low Apgar score
  • Machine learning
  • Resampling methods
