Ld: Low-overhead GPU race detection without access monitoring

Pengcheng Li, Xiaoyu Hu, Dong Chen, Jacob Brock, Hao Luo, Eddy Z. Zhang, Chen Ding

Research output: Contribution to journal › Article › peer-review

4 Scopus citations


Data race detection has become an important problem in GPU programming. Previous designs of CPU race-checking tools are mainly task parallel and incur high overhead on GPUs due to access instrumentation, especially when monitoring the many thousands of threads routinely used by GPU programs. This article presents a novel data-parallel solution designed and optimized for the GPU architecture. It includes compiler support and a set of runtime techniques. It uses value-based checking, which detects the races reported in previous work, finds new races, and supports race-free deterministic GPU execution. More importantly, race checking is massively data parallel and does not introduce divergent branching or atomic synchronization. Its slowdown is less than 5× for over half of the tests and 10× on average, which is orders of magnitude more efficient than the cuda-memcheck tool by Nvidia and the methods that use fine-grained access instrumentation.
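The abstract does not spell out how value-based checking works, so the following is only a rough illustration of the general idea, not the article's algorithm. It assumes that each group of threads runs on a private copy of the shared data and that writes are inferred afterwards by comparing values against a snapshot, rather than by instrumenting each access. The function name `detect_write_write_races` and the two "thread group" functions are hypothetical, and the code is sequential Python rather than data-parallel GPU code.

```python
def detect_write_write_races(shared, groups):
    """Illustrative value-based check (hypothetical helper, not the paper's code).

    Each callable in `groups` stands in for a group of threads. It runs on a
    private copy of `shared`; writes are then inferred by diffing each copy
    against the original snapshot instead of monitoring individual accesses.
    """
    snapshot = list(shared)

    # Privatize: every group mutates its own copy of the data.
    copies = []
    for run_group in groups:
        private = list(snapshot)
        run_group(private)
        copies.append(private)

    races = []
    for i, original in enumerate(snapshot):
        # A group is treated as a writer of location i if its value changed.
        # A write that stores the value already present is invisible to this
        # kind of check, which is an inherent limit of comparing values.
        writers = [g for g, copy in enumerate(copies) if copy[i] != original]
        if len(writers) > 1:
            races.append((i, writers))
    return races


def group_a(mem):
    mem[0] = 1
    mem[1] = 5


def group_b(mem):
    mem[1] = 7  # same location as group_a: a write-write conflict
    mem[2] = 9


races = detect_write_write_races([0, 0, 0, 0], [group_a, group_b])
print(races)  # [(1, [0, 1])]
```

In this sketch the comparison loop does the same work for every location regardless of what the groups did, which hints at why a value comparison maps naturally onto data-parallel hardware. Whether the article's runtime is organized this way is an assumption.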

Original language: English (US)
Article number: 9
Journal: ACM Transactions on Architecture and Code Optimization
Issue number: 1
State: Published - Mar 2017

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture


Keywords

  • GPU race detection
  • Instrumentation-free
  • Low overhead
  • Value-based checking
