TY - GEN
T1 - Global and local interpretation of black-box Machine Learning models to determine prognostic factors from early COVID-19 data
AU - Jana, Ananya
AU - Minacapelli, Carlos D.
AU - Rustgi, Vinod
AU - Metaxas, Dimitris
N1 - Publisher Copyright:
© 2021 SPIE.
PY - 2021
Y1 - 2021
N2 - The COVID-19 coronavirus has claimed 4.1 million lives as of July 24, 2021. A variety of machine learning models have been applied to related data to predict important outcomes such as disease severity and infection rate, and to discover important prognostic factors. The usefulness of such findings is often reduced by the lack of method interpretability. Recent progress on the interpretability of machine learning models has the potential to unravel more insights while using conventional machine learning models [1-3]. In this work, we analyze COVID-19 blood work data with some of the popular machine learning models; we then apply state-of-the-art post-hoc local interpretability techniques (e.g., SHAP, LIME) and global interpretability techniques (e.g., symbolic metamodeling) to the trained black-box models to draw interpretable conclusions. In the gamut of machine learning algorithms, regressions remain among the simplest and most explainable models, with a clear mathematical formulation. We explore one of the most recent techniques, symbolic metamodeling, to find the mathematical expressions of the machine learning models for COVID-19. We identify Acute Kidney Injury (AKI), initial Albumin level (ALB I), initial Aspartate aminotransferase (AST I), initial Total Bilirubin (TBILI) and initial D-Dimer (DIMER) as major prognostic factors of disease severity. Our contributions are: (i) we uncover the underlying mathematical expressions of the black-box models on the COVID-19 severity prediction task; (ii) we are the first to apply symbolic metamodeling to this task; and (iii) we discover important features and feature interactions. Code repository: https://github.com/ananyajana/interpretable_covid19.
AB - The COVID-19 coronavirus has claimed 4.1 million lives as of July 24, 2021. A variety of machine learning models have been applied to related data to predict important outcomes such as disease severity and infection rate, and to discover important prognostic factors. The usefulness of such findings is often reduced by the lack of method interpretability. Recent progress on the interpretability of machine learning models has the potential to unravel more insights while using conventional machine learning models [1-3]. In this work, we analyze COVID-19 blood work data with some of the popular machine learning models; we then apply state-of-the-art post-hoc local interpretability techniques (e.g., SHAP, LIME) and global interpretability techniques (e.g., symbolic metamodeling) to the trained black-box models to draw interpretable conclusions. In the gamut of machine learning algorithms, regressions remain among the simplest and most explainable models, with a clear mathematical formulation. We explore one of the most recent techniques, symbolic metamodeling, to find the mathematical expressions of the machine learning models for COVID-19. We identify Acute Kidney Injury (AKI), initial Albumin level (ALB I), initial Aspartate aminotransferase (AST I), initial Total Bilirubin (TBILI) and initial D-Dimer (DIMER) as major prognostic factors of disease severity. Our contributions are: (i) we uncover the underlying mathematical expressions of the black-box models on the COVID-19 severity prediction task; (ii) we are the first to apply symbolic metamodeling to this task; and (iii) we discover important features and feature interactions. Code repository: https://github.com/ananyajana/interpretable_covid19.
KW - COVID-19
KW - Interpretability
KW - Machine learning
KW - Mathematical expression
KW - Symbolic metamodel
UR - https://www.scopus.com/pages/publications/85123055425
UR - https://www.scopus.com/pages/publications/85123055425#tab=citedBy
U2 - 10.1117/12.2604743
DO - 10.1117/12.2604743
M3 - Conference contribution
AN - SCOPUS:85123055425
T3 - Proceedings of SPIE - The International Society for Optical Engineering
BT - 17th International Symposium on Medical Information Processing and Analysis
A2 - Romero, Eduardo
A2 - Costa, Eduardo Tavares
A2 - Brieva, Jorge
A2 - Rittner, Leticia
A2 - Linguraru, Marius George
A2 - Lepore, Natasha
PB - SPIE
T2 - 17th International Symposium on Medical Information Processing and Analysis
Y2 - 17 November 2021 through 19 November 2021
ER -