Optimizing checkpoints using NVM as virtual memory

Sudarsun Kannan, Ada Gavrilovska, Karsten Schwan, Dejan Milojicic

Research output: Contribution to conference › Paper › peer-review

74 Scopus citations


Rapid checkpointing will remain key functionality for next-generation high-end machines. This paper explores the use of node-local nonvolatile memories (NVM), such as phase-change memory, to provide frequent, low-overhead checkpoints. By adapting existing multi-level checkpoint techniques, we devise new methods, termed NVM-checkpoints, that efficiently store checkpoints on both local and remote node NVM. The checkpoint frequencies are guided by failure models that capture the expected accessibility of such data after failure. To lower overheads, NVM-checkpoints reduce the NVM and interconnect bandwidth used with a novel pre-copy mechanism, which incrementally moves checkpoint data from DRAM to NVM before a local checkpoint is started. This reduces local checkpoint cost by limiting the instantaneous data volume moved at checkpoint time, thereby freeing bandwidth for use by applications. In fact, the pre-copy method can reduce peak interconnect usage by up to 46%. Since our approach treats NVM as memory rather than as a 'RAM disk', pre-copying can be generalized to directly move data to remote NVMs. This results in 40% faster application execution times compared to asynchronous approaches not using pre-copying.
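
The pre-copy idea described in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the class name `PreCopyCheckpointer`, its methods, and the use of Python dictionaries to stand in for DRAM and NVM are all assumptions made for illustration. It only shows why moving dirty data to NVM in the background shrinks the volume that must be copied at checkpoint time.

```python
class PreCopyCheckpointer:
    """Toy model of pre-copy checkpointing (hypothetical, for illustration).

    DRAM and NVM are modeled as dicts mapping an integer chunk id to bytes.
    """

    def __init__(self):
        self.dram = {}
        self.nvm = {}
        # Chunks modified since they were last copied to NVM.
        self.dirty = set()

    def write(self, chunk_id, data):
        """Application write: update DRAM and mark the chunk dirty."""
        self.dram[chunk_id] = data
        self.dirty.add(chunk_id)

    def precopy(self, max_chunks):
        """Background step: copy up to max_chunks dirty chunks to NVM.

        Returns the number of chunks moved.
        """
        batch = sorted(self.dirty)[:max_chunks]
        for chunk_id in batch:
            self.nvm[chunk_id] = self.dram[chunk_id]
            self.dirty.discard(chunk_id)
        return len(batch)

    def checkpoint(self):
        """Checkpoint: copy only the chunks that are still dirty.

        Returns the number of chunks moved at checkpoint time.
        """
        remaining = sorted(self.dirty)
        for chunk_id in remaining:
            self.nvm[chunk_id] = self.dram[chunk_id]
        self.dirty.clear()
        return len(remaining)


if __name__ == "__main__":
    ckpt = PreCopyCheckpointer()
    for i in range(8):
        ckpt.write(i, b"v1")
    ckpt.precopy(3)        # background copy between checkpoints
    ckpt.precopy(3)
    ckpt.write(0, b"v2")   # a chunk modified after pre-copy is dirty again
    print("chunks moved at checkpoint time:", ckpt.checkpoint())
```

In this run, six of the eight chunks are pre-copied and one of them is modified again afterwards, so the checkpoint itself moves three chunks instead of eight. A real system would track dirty pages through the virtual memory subsystem and would have to bound the cost of re-copying frequently modified data; the toy model ignores both.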

Original language: English (US)
Number of pages: 12
State: Published - 2013
Externally published: Yes
Event: 27th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2013 - Boston, MA, United States
Duration: May 20 2013 → May 24 2013


Other: 27th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2013
Country/Territory: United States
City: Boston, MA

All Science Journal Classification (ASJC) codes

  • Software


  • Checkpointing
  • Memory bandwidth
  • Non volatile memory (NVM)
  • PCM
  • Pre-Copy

