A framework for flexible and scalable replica-exchange on production distributed CI

Brian K. Radak, Melissa Romanus, Emilio Gallicchio, Tai Sung Lee, Ole Weidner, Nan Jie Deng, Peng He, Wei Dai, Darrin M. York, Ronald M. Levy, Shantenu Jha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Replica exchange represents a powerful class of algorithms used for enhanced configurational and energetic sampling in a range of physical systems. Computationally, it represents a type of application with multiple scales of communication. At a fine-grained level there is often communication within a replica, which is typically an MPI process. At a coarse-grained level, replicas communicate with one another, and this communication is coarse both in how often it occurs and in the amount of data exchanged. This paper outlines a novel framework developed to support the flexible execution of large-scale replica exchange. The framework is flexible in the sense that it supports different coupling schemes between replicas and is agnostic to the specific underlying simulation, whether classical or quantum, serial or parallel. The scalability of the framework is assessed using standard simulation benchmarks. In spite of communication and coordination requirements that increase with the number of replicas, our framework supports the execution of hundreds of replicas without significant overhead. Although several specific aspects will benefit from further optimization, this first working prototype has the ability to fundamentally change the scale of replica exchange simulations possible on production distributed cyberinfrastructure such as XSEDE, as well as to support novel usage modes. This paper also represents the release of the framework to the broader biophysical simulation community and provides details on its usage.
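The coarse-grained exchange step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the textbook temperature replica exchange scheme with a Metropolis acceptance test between neighboring temperatures; the abstract does not say which coupling schemes the framework implements, and the function names, the `assignment` list, and the choice of units below are hypothetical and are not the framework's API.

```python
import math
import random

# Boltzmann constant in kcal/(mol*K); the unit system is an assumption for this sketch.
K_B = 0.0019872041


def exchange_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis probability of swapping the configurations held at two temperatures.

    energy_i is the potential energy of the configuration currently at temp_i,
    energy_j the energy of the configuration currently at temp_j.
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_j - energy_i)
    # Accept with probability min(1, exp(-delta)).
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def attempt_neighbor_swaps(energies, temps, assignment, rng, offset=0):
    """Try swaps between neighboring temperature slots (k, k+1), starting at offset.

    energies[r] is the current energy of replica r.
    assignment[k] is the replica currently running at temps[k].
    Returns a new assignment list; the input list is not modified.
    """
    assignment = list(assignment)
    for k in range(offset, len(temps) - 1, 2):
        a, b = assignment[k], assignment[k + 1]
        p = exchange_probability(energies[a], energies[b], temps[k], temps[k + 1])
        if rng.random() < p:
            assignment[k], assignment[k + 1] = b, a
    return assignment
```

In a driver loop, each replica would run its own simulation segment (the fine-grained, possibly MPI-parallel part), report a single energy, and then a call such as `attempt_neighbor_swaps(energies, temps, assignment, random.Random(0), offset=cycle % 2)` would decide the swaps. Only one number per replica crosses the replica boundary in this sketch, which is why the coarse-grained communication is small in volume.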

Original language: English (US)
Title of host publication: Proceedings of the XSEDE 2013 Conference
Subtitle of host publication: Gateway to Discovery
State: Published - 2013
Event: Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2013 - San Diego, CA, United States
Duration: Jul 22 2013 to Jul 25 2013

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2013
Country/Territory: United States
City: San Diego, CA
Period: 7/22/13 to 7/25/13

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Keywords

  • AMBER
  • Distributed computing
  • HPC
  • IMPACT
  • Large scale
  • MD
  • XSEDE resources
