Global and local interpretation of black-box Machine Learning models to determine prognostic factors from early COVID-19 data

Ananya Jana, Carlos D. Minacapelli, Vinod Rustgi, Dimitris Metaxas

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The COVID-19 coronavirus had claimed 4.1 million lives as of July 24, 2021. A variety of machine learning models have been applied to COVID-19 data to predict outcomes such as disease severity and infection rate, and to discover important prognostic factors. The usefulness of these findings is often limited by the models' lack of interpretability. Recent progress on the interpretability of machine learning models has the potential to yield more insight from conventional machine learning models [1-3]. In this work, we analyze COVID-19 blood work data with several popular machine learning models; we then apply state-of-the-art post-hoc local interpretability techniques (e.g., SHAP, LIME) and global interpretability techniques (e.g., symbolic metamodeling) to the trained black-box models to draw interpretable conclusions. Among machine learning algorithms, regressions remain one of the simplest and most explainable models, with a clear mathematical formulation. We explore a recent technique, symbolic metamodeling, to find mathematical expressions for the machine learning models trained on COVID-19 data. We identify Acute Kidney Injury (AKI), initial Albumin level (ALB I), Aspartate aminotransferase (AST I), initial Total Bilirubin (TBILI), and initial D-Dimer (DIMER) as major prognostic factors of disease severity. Our contributions are: (i) we uncover the underlying mathematical expressions for the black-box models on the COVID-19 severity prediction task; (ii) we are the first to apply symbolic metamodeling to this task; and (iii) we discover important features and feature interactions. Code repository: https://github.com/ananyajana/interpretable covid19.
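SHAP, one of the local interpretability techniques named in the abstract, builds on Shapley values from cooperative game theory. The following is a minimal sketch of exact Shapley attribution for a single prediction, written with the Python standard library only. It is not the authors' code and does not use the SHAP package. The function `toy_model`, its coefficients, the feature values, and the baseline are all hypothetical and carry no clinical meaning; only the feature abbreviations are borrowed from the abstract. Features left out of a coalition are set to their baseline value, which is one common convention and is assumed here.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    # Hypothetical stand-in for a trained black-box model.
    # Coefficients are arbitrary and NOT taken from the paper.
    return 2.0 * x["AKI"] - 1.0 * x["ALB_I"] + 0.5 * x["AKI"] * x["DIMER"]


def shapley_values(model, instance, baseline):
    """Exact Shapley values for one instance relative to a baseline input."""
    names = list(instance)
    n = len(names)

    def value(subset):
        # Features in the coalition take the instance value,
        # all other features take the baseline value.
        mixed = {k: (instance[k] if k in subset else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


# Hypothetical patient record and reference point (illustrative values only).
instance = {"AKI": 1.0, "ALB_I": 2.0, "DIMER": 4.0}
baseline = {"AKI": 0.0, "ALB_I": 3.0, "DIMER": 1.0}

phi = shapley_values(toy_model, instance, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f}")
```

The attributions sum to the difference between the model output for the instance and for the baseline, and the interaction term's effect is split between the two features involved. Exact enumeration scales exponentially with the number of features, which is why practical tools rely on approximations.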

Original language: English (US)
Title of host publication: 17th International Symposium on Medical Information Processing and Analysis
Editors: Eduardo Romero, Eduardo Tavares Costa, Jorge Brieva, Leticia Rittner, Marius George Linguraru, Natasha Lepore
Publisher: SPIE
ISBN (Electronic): 9781510650527
State: Published - 2021
Event: 17th International Symposium on Medical Information Processing and Analysis - Campinas, Brazil
Duration: Nov 17 2021 → Nov 19 2021

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 12088
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: 17th International Symposium on Medical Information Processing and Analysis
Country/Territory: Brazil
City: Campinas
Period: 11/17/21 → 11/19/21

All Science Journal Classification (ASJC) codes

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering

Keywords

  • COVID-19
  • Interpretability
  • Machine learning
  • Mathematical expression
  • Symbolic metamodel
