A Comprehensive perspective on pilot-job systems

Matteo Turilli, Mark Santcroos, Shantenu Jha

Research output: Contribution to journal › Review article

6 Citations (Scopus)

Abstract

Pilot-Job systems play an important role in supporting distributed scientific computing. They are used to execute millions of jobs on several cyberinfrastructures worldwide, consuming billions of CPU hours a year. With the increasing importance of task-level parallelism in high-performance computing, Pilot-Job systems are also witnessing an adoption beyond traditional domains. Notwithstanding the growing impact on scientific research, there is no agreement on a definition of Pilot-Job system and no clear understanding of the underlying abstraction and paradigm. Pilot-Job implementations have proliferated with no shared best practices or open interfaces and little interoperability. Ultimately, this is hindering the realization of the full impact of Pilot-Jobs by limiting their robustness, portability, and maintainability. This article offers a comprehensive analysis of Pilot-Job systems critically assessing their motivations, evolution, properties, and implementation. The three main contributions of this article are as follows: (1) an analysis of the motivations and evolution of Pilot-Job systems; (2) an outline of the Pilot abstraction, its distinguishing logical components and functionalities, its terminology, and its architecture pattern; and (3) the description of core and auxiliary properties of Pilot-Jobs systems and the analysis of six exemplar Pilot-Job implementations. Together, these contributions illustrate the Pilot paradigm, its generality, and how it helps to address some challenges in distributed scientific computing. © 2018 ACM.

Original language: English (US)
Article number: a43
Journal: ACM Computing Surveys
Volume: 51
Issue number: 2
DOI: 10.1145/3177851
State: Published - Apr 2018

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Keywords

  • Distributed applications
  • Distributed systems
  • Pilot-jobs

Cite this

Turilli, Matteo ; Santcroos, Mark ; Jha, Shantenu. / A Comprehensive perspective on pilot-job systems. In: ACM Computing Surveys. 2018 ; Vol. 51, No. 2.
@article{8dc5cd9e647f4f17bb0ca4eb15baacbc,
title = "A Comprehensive perspective on pilot-job systems",
keywords = "Distributed applications, Distributed systems, Pilot-jobs",
author = "Matteo Turilli and Mark Santcroos and Shantenu Jha",
year = "2018",
month = "4",
doi = "10.1145/3177851",
language = "English (US)",
volume = "51",
journal = "ACM Computing Surveys",
issn = "0360-0300",
publisher = "Association for Computing Machinery (ACM)",
number = "2",

}

A Comprehensive perspective on pilot-job systems. / Turilli, Matteo; Santcroos, Mark; Jha, Shantenu.

In: ACM Computing Surveys, Vol. 51, No. 2, a43, 04.2018.


TY - JOUR

T1 - A Comprehensive perspective on pilot-job systems

AU - Turilli, Matteo

AU - Santcroos, Mark

AU - Jha, Shantenu

PY - 2018/4

Y1 - 2018/4

KW - Distributed applications

KW - Distributed systems

KW - Pilot-jobs

UR - http://www.scopus.com/inward/record.url?scp=85046551837&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85046551837&partnerID=8YFLogxK

U2 - 10.1145/3177851

DO - 10.1145/3177851

M3 - Review article

AN - SCOPUS:85046551837

VL - 51

JO - ACM Computing Surveys

JF - ACM Computing Surveys

SN - 0360-0300

IS - 2

M1 - a43

ER -