SAGA BigJob: An extensible and interoperable Pilot-Job abstraction for distributed applications and systems

André Luckow, Lukasz Lacinski, Shantenu Jha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

49 Scopus citations

Abstract

The uptake of distributed infrastructures by scientific applications has been limited by the lack of extensible, pervasive and simple-to-use abstractions, which are required at multiple levels: the development, deployment and execution stages of scientific applications. The Pilot-Job abstraction has been shown to be effective in addressing many requirements of scientific applications. Specifically, Pilot-Jobs support the decoupling of workload submission from resource assignment; this results in a flexible execution model, which in turn enables the distributed scale-out of applications on multiple and possibly heterogeneous resources. Most Pilot-Job implementations, however, are tied to a specific infrastructure. In this paper, we describe the design and implementation of a SAGA-based Pilot-Job, which supports a wide range of application types and is usable over a broad range of infrastructures, i.e., it is general-purpose and extensible and, as we will argue, is also interoperable with Clouds. We discuss how the SAGA-based Pilot-Job is used for different application types and supports concurrent usage across multiple heterogeneous distributed infrastructures, including concurrent usage across Clouds and traditional Grids/Clusters. Further, we show how Pilot-Jobs can help to support dynamic execution models and thus introduce new opportunities for distributed applications. We also demonstrate, for the first time that we are aware of, the use of multiple Pilot-Job implementations to solve the same problem; specifically, we use the SAGA-based Pilot-Job on high-end resources such as the TeraGrid and the native Condor Pilot-Job (Glide-in) on Condor resources. Importantly, both are invoked via the same interface without changes at the development or deployment level; the choice between them is only an execution (run-time) decision.
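
The decoupling of workload submission from resource assignment described in the abstract can be illustrated with a toy sketch. This is not the SAGA BigJob API: the `Pilot` and `PilotManager` classes, their methods, and the slot counts are hypothetical names invented for illustration, and the resource labels are only placeholders. Tasks are queued without naming a resource, and each task is bound to whichever pilot has a free slot only when it runs.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple


@dataclass
class Pilot:
    """Hypothetical placeholder job holding `slots` on one resource."""
    resource: str
    slots: int
    log: List[str] = field(default_factory=list)

    def run(self, name: str, task: Callable[[], int]) -> int:
        # Record which task ended up on this resource, then execute it.
        self.log.append(name)
        return task()


class PilotManager:
    """Hypothetical manager: one task queue shared by all pilots."""

    def __init__(self) -> None:
        self.pilots: List[Pilot] = []
        self.queue: Deque[Tuple[str, Callable[[], int]]] = deque()

    def add_pilot(self, pilot: Pilot) -> None:
        self.pilots.append(pilot)

    def submit(self, name: str, task: Callable[[], int]) -> None:
        # Submission names no resource; binding happens later in run_all().
        self.queue.append((name, task))

    def run_all(self) -> Dict[str, int]:
        if self.queue and not any(p.slots > 0 for p in self.pilots):
            raise RuntimeError("no pilot has capacity for the queued tasks")
        results: Dict[str, int] = {}
        # Sequential simulation: in each round, every pilot pulls up to
        # `slots` tasks from the shared queue.
        while self.queue:
            for pilot in self.pilots:
                for _ in range(pilot.slots):
                    if not self.queue:
                        break
                    name, task = self.queue.popleft()
                    results[name] = pilot.run(name, task)
        return results


# Usage: two pilots on different (placeholder) resources, one task queue.
manager = PilotManager()
hpc = Pilot(resource="teragrid", slots=2)
pool = Pilot(resource="condor", slots=1)
manager.add_pilot(hpc)
manager.add_pilot(pool)
for i in range(5):
    manager.submit(f"t{i}", lambda i=i: i * i)
results = manager.run_all()
```

The submitting code never changes when a pilot is added or removed; only the set of pilots registered at run time does, which mirrors the run-time decision the abstract refers to.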

Original language: English (US)
Title of host publication: CCGrid 2010 - 10th IEEE/ACM International Conference on Cluster, Cloud, and Grid Computing
Pages: 135-144
Number of pages: 10
State: Published - 2010
Externally published: Yes
Event: 10th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2010 - Melbourne, VIC, Australia
Duration: May 17 2010 to May 20 2010

Publication series

Name: CCGrid 2010 - 10th IEEE/ACM International Conference on Cluster, Cloud, and Grid Computing

Other

Other: 10th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2010
Country/Territory: Australia
City: Melbourne, VIC
Period: 5/17/10 to 5/20/10

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Computer Science Applications
