Towards a smart, internet-scale cache service for data intensive scientific applications

Yubo Qin, Anthony Simonet, Philip E. Davis, Azita Nouri, Zhe Wang, Manish Parashar, Ivan Rodero

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Data and services provided by shared facilities, such as large-scale observing facilities, have become important enablers of scientific insights and discoveries across many science and engineering disciplines. Ensuring satisfactory quality of service can be challenging for these facilities because of their remote locations, the distributed nature of their instruments, observatories, and users, and the rapid growth of data volumes and rates. This research explores how knowledge of facility usage patterns, coupled with emerging cyberinfrastructures, can be leveraged to improve the facilities' performance, usability, and scientific impact. We propose a framework with a smart, internet-scale cache augmented with prefetching and data placement strategies to improve data delivery performance for scientific facilities. Our evaluations, which are based on the NSF Ocean Observatories Initiative, demonstrate that our framework is able to predict user requests and reduce data movement across networks by more than 56%.

Original language: English (US)
Title of host publication: ScienceCloud 2019 - Proceedings of the 10th Workshop on Scientific Cloud Computing, co-located with HPDC 2019
Publisher: Association for Computing Machinery, Inc
Pages: 11-18
Number of pages: 8
ISBN (Electronic): 9781450367585
State: Published - Jun 17 2019
Event: 10th Workshop on Scientific Cloud Computing, ScienceCloud 2019, co-located with HPDC 2019 - Phoenix, United States
Duration: Jun 25 2019 → …

Publication series

Name: ScienceCloud 2019 - Proceedings of the 10th Workshop on Scientific Cloud Computing, co-located with HPDC 2019

Conference

Conference: 10th Workshop on Scientific Cloud Computing, ScienceCloud 2019, co-located with HPDC 2019
Country/Territory: United States
City: Phoenix
Period: 6/25/19 → …

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software

Keywords

  • Cyberinfrastructure
  • Data repository
  • Distributed data sharing
  • Distributed facilities
  • Prefetching
  • Virtual data collaboratory
