Project Details
Description
Modern scientific research heavily relies on supercomputers. Supercomputing applications, such as traditional numerical simulations (HPC), data intensive applications (Big Data), and most recently, deep learning (DL) applications, are increasingly run on supercomputers to obtain timely results and explore new research methods that combine multiple application types. However, a bottleneck in their design reduces the potential performance of modern supercomputers. This project, bbThemis, addresses this problem by enabling efficient and policy-driven sharing of an intermediate storage layer known as a "burst buffer", so that more scientists and applications can leverage state-of-the-art storage techniques to significantly reduce their runtime and enhance the productivity of their research. This project will deliver substantial gains to almost every research area that uses HPC resources, leading to improved science and engineering methods and products in all fields. This research will have an immediate and significant impact on existing scientific applications and on deriving guidelines for next-generation HPC system design, deployment, and utilization. The project will also contribute to educational outcomes. In addition to students working directly on project goals, results developed in the project will be used in tutorial and training sessions at Texas Advanced Computing Center’s summer institute in deep learning and other major conferences, and in University of Illinois Urbana-Champaign student projects. The project is aligned with the National Strategic Computing Initiative (NSCI) to advance US leadership in HPC.
This project, bbThemis (https://github.com/bbThemis), leverages a suite of technologies, such as disassociation of I/O processing from control logic, time-sliced intra I/O node sharing, function interception for low overhead POSIX I/O, and metadata and data placement for optimal individual application performance. It is investigating how to best apply these technologies, by: 1) Identifying optimal burst buffer configurations for a suite of representative supercomputing applications; 2) Proposing, prototyping, and verifying different design options to address intra-node and inter-node I/O performance sharing; and 3) Designing and evaluating a set of sharing policies, such as fair sharing and priority sharing, with real applications and I/O traces. This project will dramatically increase the sharing capacity of existing burst buffers and enhance domain scientists’ productivity at a large scale. It explores various sharing policies that permit efficient sharing of I/O resources and that meet the requirements of computing centers. The results will enable the provisioning of I/O resources, where users can request specific IOPS or bandwidth for a period of time. The prototype burst buffer sharing framework will immediately increase the capacity of existing supercomputers with enhanced I/O performance. The lessons learned will guide next-generation I/O system design for large scale systems. The general improvement of HPC, Big Data, and DL applications will also increase the coherence of the hardware and software used for data analytics computing and modeling and simulation.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
| Status | Finished |
|---|---|
| Effective start/end date | 10/1/20 → 9/30/23 |
Funding
- National Science Foundation: $292,863.00
Fingerprint
Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.