Computational reproducibility of scientific workflows at extreme scales

Line Pouchard, Sterling Baldwin, Todd Elsethagen, Shantenu Jha, Bibi Raju, Eric Stephan, Li Tang, Kerstin Kleese Van Dam

Research output: Contribution to journal › Article

Abstract

We propose an approach for improved reproducibility that includes capturing and relating provenance characteristics and performance metrics. We discuss two use cases: scientific reproducibility of results in the Energy Exascale Earth System Model (E3SM—previously ACME) and performance reproducibility in molecular dynamics workflows on HPC platforms. To capture and persist the provenance and performance data of these workflows, we have designed and developed the Chimbuko and ProvEn frameworks. Chimbuko captures provenance and enables detailed single workflow performance analysis. ProvEn is a hybrid, queryable system for storing and analyzing the provenance and performance metrics of multiple runs in workflow performance analysis campaigns. Workflow provenance and performance data output from Chimbuko can be visualized in a dynamic, multilevel visualization providing overview and zoom-in capabilities for areas of interest. Provenance and related performance data ingested into ProvEn is queryable and can be used to reproduce runs. Our provenance-based approach highlights challenges in extracting information and gaps in the information collected. It is agnostic to the type of provenance data it captures so that both the reproducibility of scientific results and that of performance can be explored with our tools.

Original language: English (US)
Pages (from-to): 763-776
Number of pages: 14
Journal: International Journal of High Performance Computing Applications
Volume: 33
Issue number: 5
DOI: https://doi.org/10.1177/1094342019839124
State: Published - Sep 1 2019
Externally published: Yes

Fingerprint

Scientific Workflow
Provenance
Reproducibility
Hybrid systems
Molecular dynamics
Data acquisition
Extremes
Visualization
Earth (planet)
Work Flow
Performance Metrics
Performance Analysis
Use Case
Hybrid Systems
Molecular Dynamics
Output

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture

Keywords

  • Chimbuko
  • Computational reproducibility
  • ProvEn
  • performance analysis
  • provenance
  • scientific workflows

Cite this

Pouchard, Line; Baldwin, Sterling; Elsethagen, Todd; Jha, Shantenu; Raju, Bibi; Stephan, Eric; Tang, Li; Van Dam, Kerstin Kleese. / Computational reproducibility of scientific workflows at extreme scales. In: International Journal of High Performance Computing Applications. 2019; Vol. 33, No. 5. pp. 763-776.
@article{e5be31c08c514a6daca698044c5c554d,
title = "Computational reproducibility of scientific workflows at extreme scales",
keywords = "Chimbuko, Computational reproducibility, ProvEn, performance analysis, provenance, scientific workflows",
author = "Line Pouchard and Sterling Baldwin and Todd Elsethagen and Shantenu Jha and Bibi Raju and Eric Stephan and Li Tang and {Van Dam}, {Kerstin Kleese}",
year = "2019",
month = "9",
day = "1",
doi = "10.1177/1094342019839124",
language = "English (US)",
volume = "33",
pages = "763--776",
journal = "International Journal of High Performance Computing Applications",
issn = "1094-3420",
publisher = "SAGE Publications Inc.",
number = "5",

}

Pouchard, L, Baldwin, S, Elsethagen, T, Jha, S, Raju, B, Stephan, E, Tang, L & Van Dam, KK 2019, 'Computational reproducibility of scientific workflows at extreme scales', International Journal of High Performance Computing Applications, vol. 33, no. 5, pp. 763-776. https://doi.org/10.1177/1094342019839124

Computational reproducibility of scientific workflows at extreme scales. / Pouchard, Line; Baldwin, Sterling; Elsethagen, Todd; Jha, Shantenu; Raju, Bibi; Stephan, Eric; Tang, Li; Van Dam, Kerstin Kleese.

In: International Journal of High Performance Computing Applications, Vol. 33, No. 5, 01.09.2019, p. 763-776.

TY  - JOUR
T1  - Computational reproducibility of scientific workflows at extreme scales
AU  - Pouchard, Line
AU  - Baldwin, Sterling
AU  - Elsethagen, Todd
AU  - Jha, Shantenu
AU  - Raju, Bibi
AU  - Stephan, Eric
AU  - Tang, Li
AU  - Van Dam, Kerstin Kleese
PY  - 2019/9/1
Y1  - 2019/9/1
KW  - Chimbuko
KW  - Computational reproducibility
KW  - ProvEn
KW  - performance analysis
KW  - provenance
KW  - scientific workflows
UR  - http://www.scopus.com/inward/record.url?scp=85064205253&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85064205253&partnerID=8YFLogxK
U2  - 10.1177/1094342019839124
DO  - 10.1177/1094342019839124
M3  - Article
AN  - SCOPUS:85064205253
VL  - 33
SP  - 763
EP  - 776
JO  - International Journal of High Performance Computing Applications
JF  - International Journal of High Performance Computing Applications
SN  - 1094-3420
IS  - 5
ER  - 