Computational reproducibility of scientific workflows at extreme scales

Line Pouchard, Sterling Baldwin, Todd Elsethagen, Shantenu Jha, Bibi Raju, Eric Stephan, Li Tang, Kerstin Kleese Van Dam

Research output: Contribution to journal › Article

2 Scopus citations


We propose an approach for improved reproducibility that includes capturing and relating provenance characteristics and performance metrics. We discuss two use cases: scientific reproducibility of results in the Energy Exascale Earth System Model (E3SM, previously ACME) and performance reproducibility in molecular dynamics workflows on HPC platforms. To capture and persist the provenance and performance data of these workflows, we have designed and developed the Chimbuko and ProvEn frameworks. Chimbuko captures provenance and enables detailed performance analysis of a single workflow. ProvEn is a hybrid, queryable system for storing and analyzing the provenance and performance metrics of multiple runs in workflow performance analysis campaigns. Workflow provenance and performance data output by Chimbuko can be visualized in a dynamic, multilevel visualization that provides an overview and lets users zoom in on areas of interest. Provenance and related performance data ingested into ProvEn are queryable and can be used to reproduce runs. Our provenance-based approach highlights challenges in extracting information and gaps in the information collected. It is agnostic to the type of provenance data it captures, so both the reproducibility of scientific results and that of performance can be explored with our tools.
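To make the idea of relating provenance to performance metrics across multiple runs more concrete, here is a minimal sketch in Python. It is not the Chimbuko or ProvEn API: the class names `RunRecord` and `RunStore`, the provenance fields, and the use of the coefficient of variation as a performance-reproducibility indicator are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class RunRecord:
    """Hypothetical record relating one run's provenance to its performance."""
    run_id: str
    # Provenance: what was executed, on which inputs, and where (illustrative fields).
    code_version: str
    input_hash: str
    platform: str
    # Performance metrics captured for the run, e.g. {"wall_s": 100.0}.
    metrics: dict = field(default_factory=dict)


class RunStore:
    """Toy in-memory stand-in for a queryable multi-run provenance store."""

    def __init__(self):
        self._runs = []

    def ingest(self, record):
        # Persist one run's provenance and performance data together.
        self._runs.append(record)

    def query(self, **provenance):
        # Return the runs whose provenance attributes match every given value.
        return [
            r for r in self._runs
            if all(getattr(r, key) == value for key, value in provenance.items())
        ]

    def variability(self, metric, **provenance):
        # Coefficient of variation (population std dev / mean) of one metric
        # across runs with matching provenance; a low value suggests the
        # nominally identical runs performed reproducibly.
        values = [
            r.metrics[metric]
            for r in self.query(**provenance)
            if metric in r.metrics
        ]
        if len(values) < 2:
            return None  # Not enough runs to compare.
        m = mean(values)
        return pstdev(values) / m if m else None
```

For example, ingesting two runs of the same code version and inputs on the same platform with wall-clock times of 100 s and 110 s gives a coefficient of variation of 5/105, or about 0.048. Keeping provenance and metrics in one record is what lets a query filter on "what ran" and then summarize "how it performed".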

Original language: English (US)
Pages (from-to): 763-776
Number of pages: 14
Journal: International Journal of High Performance Computing Applications
Issue number: 5
State: Published - Sep 1 2019

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture

Keywords

  • Chimbuko
  • Computational reproducibility
  • ProvEn
  • performance analysis
  • provenance
  • scientific workflows
