A Comprehensive perspective on pilot-job systems

Matteo Turilli, Mark Santcroos, Shantenu Jha

Research output: Contribution to journalReview article

14 Scopus citations

Abstract

Pilot-Job systems play an important role in supporting distributed scientific computing. They are used to execute millions of jobs on several cyberinfrastructures worldwide, consuming billions of CPU hours a year. With the increasing importance of task-level parallelism in high-performance computing, Pilot-Job systems are also witnessing an adoption beyond traditional domains. Notwithstanding the growing impact on scientific research, there is no agreement on a definition of Pilot-Job system and no clear understanding of the underlying abstraction and paradigm. Pilot-Job implementations have proliferated with no shared best practices or open interfaces and little interoperability. Ultimately, this is hindering the realization of the full impact of Pilot-Jobs by limiting their robustness, portability, and maintainability. This article offers a comprehensive analysis of Pilot-Job systems critically assessing their motivations, evolution, properties, and implementation. The three main contributions of this article are as follows: (1) an analysis of the motivations and evolution of Pilot-Job systems; (2) an outline of the Pilot abstraction, its distinguishing logical components and functionalities, its terminology, and its architecture pattern; and (3) the description of core and auxiliary properties of Pilot-Jobs systems and the analysis of six exemplar Pilot-Job implementations. Together, these contributions illustrate the Pilot paradigm, its generality, and how it helps to address some challenges in distributed scientific computing. c 2018 ACM.

Original languageEnglish (US)
Article numbera43
JournalACM Computing Surveys
Volume51
Issue number2
DOIs
StatePublished - Apr 2018
Externally publishedYes

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Keywords

  • Distributed applications
  • Distributed systems
  • Pilot-jobs

Fingerprint Dive into the research topics of 'A Comprehensive perspective on pilot-job systems'. Together they form a unique fingerprint.

  • Cite this