Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU

Bo Wu, Zhijia Zhao, Eddy Zheng Zhang, Yunlian Jiang, Xipeng Shen

Research output: Chapter in Book/Report/Conference proceedingConference contribution

41 Scopus citations

Abstract

The performance of Graphic Processing Units (GPU) is sensitive to irregular memory references. Some recent work shows the promise of data reorganization for eliminating non-coalesced memory accesses that are caused by irregular references. However, all previous studies have employed simple, heuristic methods to determine the new data layouts to create. As a result, they either do not provide any performance guarantee or are effective to only some limited scenarios. This paper contributes a fundamental study to the problem. It systematically analyzes the inherent complexity of the problem in various settings, and for the first time, proves that the problem is NP-complete. It then points out the limitations of existing techniques and reveals that in practice, the essence for designing an appropriate data reorganization algorithm can be reduced to a tradeoff among space, time, and complexity. Based on that insight, it develops two new data reorganization algorithms to overcome the limitations of previous methods. Experiments show that an assembly composed of the new algorithms and a previous algorithm can circumvent the inherent complexity in finding optimal data layouts, making it feasible to minimize non-coalesced memory accesses for a variety of irregular applications and settings that are beyond the reach of existing techniques.

Original languageEnglish (US)
Title of host publicationPPoPP 2013 - Proceedings of the 2013 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Pages57-67
Number of pages11
DOIs
StatePublished - 2013
Event18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2013 - Shenzhen, China
Duration: Feb 23 2013Feb 27 2013

Publication series

NameProceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

Other

Other18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2013
Country/TerritoryChina
CityShenzhen
Period2/23/132/27/13

All Science Journal Classification (ASJC) codes

  • Software

Keywords

  • computational complexity
  • data transformation
  • gpgpu
  • memory coalescing
  • runtime optimizations
  • thread-data remapping

Fingerprint

Dive into the research topics of 'Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU'. Together they form a unique fingerprint.

Cite this