RI: Small: Enabling Interpretable AI via Bayesian Deep Learning

Project Details

Description

Interpretability is one of the fundamental obstacles to the adoption and deployment of deep-learning-based AI systems across fields such as healthcare, e-commerce, transportation, earth science, and manufacturing. An ideal interpretable model should be able to interpret its predictions using human-understandable concepts (e.g., 'color' and 'shape'), conform to conditional dependencies in the real world (e.g., whether a customer's purchase is due to a discount), and handle uncertainty in data (e.g., how certain the model is about tomorrow's rainfall). Unfortunately, deep learning as a connectionist approach does not natively support these desiderata. The goal of this project is to develop a general interpreter framework for deep learning models. Interpreters under this framework can be plugged into a deep learning model and interpret its predictions using a graph of human-understandable concepts, without sacrificing the model's performance. Methods developed in this project will be applied in health monitoring to interpret models' reasoning on patient status, and in recommender systems to interpret models' recommended items for users.

This project will develop two sets of methods based on Bayesian deep learning: (1) 'Bayesian deep interpreters' that interpret deep learning models with graphical models describing the conditional dependencies leading to current predictions, and (2) 'Bayesian deep controllers' that control deep learning models' predictions by manipulating specific random variables in the graphical models attached to the controlled models. Development of such novel methods will build an intellectual and formal connection between deep learning and probabilistic graphical models, two major machine learning paradigms that have long been seen as incompatible. It will advance the state of the art in machine learning and AI by: (1) formulating a new Bayesian deep learning framework to unify deep learning and graphical models, whose synergy will significantly improve deep learning interpretability, (2) under such a principled framework, designing concrete methods that are plug-and-play and therefore do not sacrifice the deep learning models' performance (e.g., accuracy), (3) investigating what theoretical guarantees the developed methods provide, thereby laying foundations for future work by the team and the community, and (4) analyzing the trade-off between accuracy, interpretability, and controllability to provide design guidance for interpretable AI systems.
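To make the plug-and-play idea concrete, the sketch below is an illustrative toy example, not the project's actual method: a frozen black-box model is left untouched while a small Bayesian linear surrogate is fit over human-understandable concept features, so each prediction can be explained by concept weights together with posterior uncertainty. The functions black_box_predict and concept_features, and the concept names, are hypothetical placeholders.

```python
# Illustrative sketch (assumptions, not the project's method): a "plug-in"
# interpreter that queries a frozen model and fits a Bayesian linear surrogate
# over concept features, yielding concept weights with uncertainty estimates.
import numpy as np

rng = np.random.default_rng(0)

# --- Stand-in for a frozen deep model: we only query its predictions. ---
def black_box_predict(x):
    # x: (n, 3) raw inputs; returns a real-valued score per example.
    return np.tanh(x @ np.array([1.5, -2.0, 0.5]))

# --- Hypothetical human-understandable concept features of the raw input. ---
def concept_features(x):
    # e.g., "brightness", "size", "texture"-style summaries (illustrative only).
    return np.column_stack([x[:, 0], x[:, 1] ** 2, np.abs(x[:, 2])])

# --- Conjugate Bayesian linear regression surrogate. ---
def fit_bayesian_surrogate(C, y, alpha=1.0, beta=25.0):
    # Posterior over concept weights w: p(w | C, y) = N(mean, cov),
    # with prior precision alpha and observation noise precision beta.
    d = C.shape[1]
    cov = np.linalg.inv(alpha * np.eye(d) + beta * C.T @ C)
    mean = beta * cov @ C.T @ y
    return mean, cov

# Query the frozen model on some data, then interpret it via concepts.
X = rng.normal(size=(200, 3))
y = black_box_predict(X)     # black-box predictions to be explained
C = concept_features(X)      # concept representation of the same inputs

w_mean, w_cov = fit_bayesian_surrogate(C, y)
w_std = np.sqrt(np.diag(w_cov))

for name, m, s in zip(["concept_1", "concept_2", "concept_3"], w_mean, w_std):
    print(f"{name}: weight = {m:+.3f} +/- {s:.3f}")
```

Because the surrogate only consumes the frozen model's outputs, the underlying model's accuracy is unaffected; the proposed Bayesian deep interpreters generalize this idea from a linear surrogate to full graphical models over concept variables.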

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status: Finished
Effective start/end date: 10/1/21 – 9/30/24

Funding

  • National Science Foundation: $499,926.00
