LAMP: data provenance for graph based machine learning algorithms through derivative computation

Shiqing Ma, Yousra Aafer, Zhaogui Xu, Wen Chuan Lee, Juan Zhai, Yingqi Liu, Xiangyu Zhang

Research output: Contribution to conference › Paper › peer-review

14 Scopus citations

Abstract

Data provenance tracking determines the set of inputs related to a given output. It enables quality control and problem diagnosis in data engineering. Most existing techniques work by tracking program dependencies. They cannot quantitatively assess the importance of related inputs, which is critical for machine learning algorithms, where an output tends to depend on a huge set of inputs while only some of them are important. In this paper, we propose LAMP, a provenance computation system for machine learning algorithms. Inspired by automatic differentiation (AD), LAMP quantifies the importance of an input for an output by computing the partial derivative. LAMP separates the original data processing and the more expensive derivative computation into different processes to achieve cost-effectiveness. In addition, it allows quantifying importance for inputs related to discrete behavior, such as control flow selection. The evaluation on a set of real-world programs and data sets illustrates that LAMP produces more precise and succinct provenance than program-dependence-based techniques, with much less overhead. Our case studies demonstrate the potential of LAMP in problem diagnosis in data engineering.
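
The idea of ranking inputs by partial derivative can be illustrated with a minimal sketch. This is not LAMP's implementation: it is a toy forward-mode automatic differentiation using dual numbers, and the names `Dual`, `importance`, and `toy_rank`, as well as the three-edge graph and its weights, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one seeded input."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Wrap a plain number as a constant (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def importance(f, inputs):
    """Return |df/dx_i| for each input, using one forward pass per input."""
    scores = []
    for i in range(len(inputs)):
        # Seed input i with derivative 1 and every other input with 0.
        seeded = [Dual(x, 1.0 if j == i else 0.0) for j, x in enumerate(inputs)]
        scores.append(abs(f(seeded).dot))
    return scores


def toy_rank(edge_weights):
    """Hypothetical score of node C on a tiny graph with edges A->B, A->C, B->C."""
    w_ab, w_ac, w_bc = edge_weights
    score_b = 0.5 * w_ab
    return 0.5 * w_ac + score_b * w_bc


if __name__ == "__main__":
    weights = [0.2, 0.1, 4.0]  # assumed example weights for A->B, A->C, B->C
    for name, score in zip(["w_ab", "w_ac", "w_bc"], importance(toy_rank, weights)):
        print(name, score)
```

All three edge weights are program dependencies of the output, so dependence tracking alone would report them equally. The derivative magnitudes differ, and in this toy example `w_ab` receives the largest score, which is the kind of quantitative distinction the abstract describes.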

Original language: English (US)
Pages: 786-797
Number of pages: 12
DOIs
State: Published - 2017
Externally published: Yes
Event: 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2017 - Paderborn, Germany
Duration: Sep 4 2017 - Sep 8 2017

Other

Other: 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2017
Country/Territory: Germany
City: Paderborn
Period: 9/4/17 - 9/8/17

All Science Journal Classification (ASJC) codes

  • Software

Keywords

  • Data Provenance
  • Debugging
  • Machine Learning
