EAGER: Online Processing of Data in Large Facilities Using National Advanced Cyberinfrastructure

Description

Open, large-scale scientific facilities are an essential part of the science and engineering enterprise. These facilities provide shared-use infrastructure, instrumentation, and data products that are openly accessible to a broad community of researchers and educators. Current facilities produce increasing volumes of data and data products that have the potential to deliver new insights in a wide range of science and engineering domains. However, while these facilities provide reliable and pervasive access to the data and data products, users typically must download the data of interest and process them using local resources. Consequently, transforming these data and data products into insights requires local access to powerful computing, storage, and networking resources. At the same time, NSF Advanced Cyberinfrastructure (ACI) is playing an increasingly important role as an open platform for computational and data-enabled science and engineering, and it can provide the capabilities a broad user community needs to process the data held in large facilities effectively. Yet, despite clearly complementing each other, large scientific facilities and NSF ACI remain largely disconnected. As a result, users must take an active part in moving data from large facilities to local computational resources or to NSF ACI. This data-delivery mode is inefficient and limits the value the data could have if it were processed automatically. The outcome of this research can have a significant impact on the science and engineering community by improving the accessibility of data and the way scientists interact with both data sources and computational infrastructure. Bringing national ACI and large scientific facilities together will democratize access to science and increase the impact of NSF-funded infrastructure.
This is especially important for small public institutions that have limited resources and lack a high-bandwidth connection to the academic/research network. The development of human resources, including the training of students, researchers, and software professionals, as well as outreach to minorities and underrepresented groups, will be an integral part of this effort. The project uses an open repository to disseminate research papers, prototype implementations, and associated data products to the community.
The goal of this project is to explore how NSF-funded ACI, such as the Extreme Science and Engineering Discovery Environment (XSEDE), can be integrated in an automated manner with large facilities in general, and the Ocean Observatories Initiative (OOI) in particular, to support end-to-end user workflows. Specifically, we propose to enable workflows that, when triggered, seamlessly orchestrate the entire data-to-discovery pipeline. This involves executing queries on the OOI cyberinfrastructure (possibly in response to events of interest), streaming data over high-bandwidth interconnects (such as Internet2) to appropriate ACI facilities to stage it close to computing and analytics resources (e.g., XSEDE Jetstream), and then launching the modeling and analysis processes that transform the data into insights. In this way, the project will use the high-performance networks that typically connect these facilities to move data, and state-of-the-art high-performance systems to process it.
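The event-triggered, data-to-discovery pipeline described above (query the facility, stage the data near compute, launch the analysis) can be sketched as a small orchestrator. This is an illustrative sketch only, not the project's implementation: the in-memory `FACILITY` archive and `STAGING` area, the instrument name `hydrophone-1`, the threshold-based event trigger, and the functions `detect_events`, `query_facility`, `stage_near_compute`, `run_analysis`, and `run_pipeline` are all hypothetical stand-ins, and no real OOI or XSEDE API is used.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Reading:
    instrument: str
    timestamp: int  # assumed: integer seconds
    value: float


# Hypothetical in-memory stand-ins for the facility archive and the ACI staging area.
FACILITY: List[Reading] = [
    Reading("hydrophone-1", t, v)
    for t, v in enumerate([0.5, 0.25, 0.5, 6.0, 4.5, 0.25])
]
STAGING: Dict[str, List[Reading]] = {}


def detect_events(readings: List[Reading], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) timestamp windows where the value stays above threshold."""
    windows: List[Tuple[int, int]] = []
    start = last = None
    for r in readings:
        if r.value > threshold:
            if start is None:
                start = r.timestamp
            last = r.timestamp
        elif start is not None:
            windows.append((start, last))
            start = None
    if start is not None:  # close a window that runs to the end of the feed
        windows.append((start, last))
    return windows


def query_facility(instrument: str, start: int, end: int) -> List[Reading]:
    """Step 1: query the facility for the window of interest (hypothetical API)."""
    return [r for r in FACILITY if r.instrument == instrument and start <= r.timestamp <= end]


def stage_near_compute(readings: List[Reading]) -> str:
    """Step 2: copy data to a staging area next to the compute resource; return its key."""
    key = f"staging/{len(STAGING)}"
    STAGING[key] = list(readings)
    return key


def run_analysis(key: str) -> Dict[str, float]:
    """Step 3: run a trivial summary analysis on the staged data."""
    values = [r.value for r in STAGING[key]]
    return {"count": float(len(values)), "mean": mean(values), "max": max(values)}


def run_pipeline(instrument: str, threshold: float) -> List[Dict[str, float]]:
    """Run query -> stage -> analyze once per detected event, with no user in the loop."""
    feed = [r for r in FACILITY if r.instrument == instrument]
    results = []
    for start, end in detect_events(feed, threshold):
        key = stage_near_compute(query_facility(instrument, start, end))
        results.append(run_analysis(key))
    return results


if __name__ == "__main__":
    print(run_pipeline("hydrophone-1", threshold=1.0))
```

With the sample data, one event window (timestamps 3 to 4) exceeds the threshold, so the pipeline stages those two readings and returns a single summary. The point of the design is that the user supplies only the trigger condition; data movement and analysis launch happen automatically.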
Status: Active
Effective start/end date: 9/1/17 to 8/31/19

Funding

  • National Science Foundation (NSF)