TY - GEN
T1 - Towards a smart, internet-scale cache service for data intensive scientific applications
AU - Qin, Yubo
AU - Simonet, Anthony
AU - Davis, Philip E.
AU - Nouri, Azita
AU - Wang, Zhe
AU - Parashar, Manish
AU - Rodero, Ivan
PY - 2019/6/17
Y1 - 2019/6/17
N2 - Data and services provided by shared facilities, such as large-scale observing facilities, have become important enablers of scientific insights and discoveries across many science and engineering disciplines. Ensuring satisfactory quality of service can be challenging for facilities, due to their remote locations and to the distributed nature of the instruments, observatories, and users, as well as the rapid growth of data volumes and rates. This research explores how knowledge of the facilities' usage patterns, coupled with emerging cyberinfrastructures, can be leveraged to improve their performance, usability, and scientific impact. We propose a framework with a smart, internet-scale cache augmented with prefetching and data placement strategies to improve data delivery performance for scientific facilities. Our evaluations, which are based on the NSF Ocean Observatories Initiative, demonstrate that our framework can predict user requests and reduce data movement across networks by more than 56%.
AB - Data and services provided by shared facilities, such as large-scale observing facilities, have become important enablers of scientific insights and discoveries across many science and engineering disciplines. Ensuring satisfactory quality of service can be challenging for facilities, due to their remote locations and to the distributed nature of the instruments, observatories, and users, as well as the rapid growth of data volumes and rates. This research explores how knowledge of the facilities' usage patterns, coupled with emerging cyberinfrastructures, can be leveraged to improve their performance, usability, and scientific impact. We propose a framework with a smart, internet-scale cache augmented with prefetching and data placement strategies to improve data delivery performance for scientific facilities. Our evaluations, which are based on the NSF Ocean Observatories Initiative, demonstrate that our framework can predict user requests and reduce data movement across networks by more than 56%.
KW - Cyberinfrastructure
KW - Data repository
KW - Distributed data sharing
KW - Distributed facilities
KW - Prefetching
KW - Virtual data collaboratory
UR - http://www.scopus.com/inward/record.url?scp=85069231116&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85069231116&partnerID=8YFLogxK
U2 - 10.1145/3322795.3331464
DO - 10.1145/3322795.3331464
M3 - Conference contribution
T3 - ScienceCloud 2019 - Proceedings of the 10th Workshop on Scientific Cloud Computing, co-located with HPDC 2019
SP - 11
EP - 18
BT - ScienceCloud 2019 - Proceedings of the 10th Workshop on Scientific Cloud Computing, co-located with HPDC 2019
PB - Association for Computing Machinery, Inc
T2 - 10th Workshop on Scientific Cloud Computing, ScienceCloud 2019, co-located with HPDC 2019
Y2 - 25 June 2019
ER -