Engineering a high-performance GPU B-tree

Muhammad A. Awad, Saman Ashkiani, Rob Johnson, Martin Farach-Colton, John D. Owens

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Citations (Scopus)

Abstract

We engineer a GPU implementation of a B-Tree that supports concurrent queries (point, range, and successor) and updates (insertions and deletions). Our B-tree outperforms the state of the art, a GPU log-structured merge tree (LSM) and a GPU sorted array. In particular, point and range queries are significantly faster than in a GPU LSM (the GPU LSM does not implement successor queries). Furthermore, B-Tree insertions are also faster than LSM and sorted array insertions unless insertions come in batches of more than roughly 100k. Because we cache the upper levels of the tree, we achieve lookup throughput that exceeds the DRAM bandwidth of the GPU. We demonstrate that the key limiter of performance on a GPU is contention and describe the design choices that allow us to achieve this high performance.

Original languageEnglish (US)
Title of host publicationPPoPP 2019 - Proceedings of the 24th Principles and Practice of Parallel Programming
PublisherAssociation for Computing Machinery
Pages145-157
Number of pages13
ISBN (Electronic)9781450362252
DOIs
StatePublished - Feb 16 2019
Event24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019 - Washington, United States
Duration: Feb 16 2019Feb 20 2019

Publication series

NameProceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

Conference

Conference24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019
CountryUnited States
CityWashington
Period2/16/192/20/19

Fingerprint

Dynamic random access storage
Limiters
Graphics processing unit
Throughput
Bandwidth
Engineers

All Science Journal Classification (ASJC) codes

  • Software

Keywords

  • B-tree
  • Data structures
  • Dynamic
  • GPU
  • Mutable

Cite this

Awad, M. A., Ashkiani, S., Johnson, R., Farach-Colton, M., & Owens, J. D. (2019). Engineering a high-performance GPU B-tree. In PPoPP 2019 - Proceedings of the 24th Principles and Practice of Parallel Programming (pp. 145-157). (Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP). Association for Computing Machinery. https://doi.org/10.1145/3293883.3295706
Awad, Muhammad A. ; Ashkiani, Saman ; Johnson, Rob ; Farach-Colton, Martin ; Owens, John D. / Engineering a high-performance GPU B-tree. PPoPP 2019 - Proceedings of the 24th Principles and Practice of Parallel Programming. Association for Computing Machinery, 2019. pp. 145-157 (Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP).
@inproceedings{4807c1618ccc47dd9ed2e581b827f1ec,
title = "Engineering a high-performance GPU B-tree",
abstract = "We engineer a GPU implementation of a B-Tree that supports concurrent queries (point, range, and successor) and updates (insertions and deletions). Our B-tree outperforms the state of the art, a GPU log-structured merge tree (LSM) and a GPU sorted array. In particular, point and range queries are significantly faster than in a GPU LSM (the GPU LSM does not implement successor queries). Furthermore, B-Tree insertions are also faster than LSM and sorted array insertions unless insertions come in batches of more than roughly 100k. Because we cache the upper levels of the tree, we achieve lookup throughput that exceeds the DRAM bandwidth of the GPU. We demonstrate that the key limiter of performance on a GPU is contention and describe the design choices that allow us to achieve this high performance.",
keywords = "B-tree, Data structures, Dynamic, GPU, Mutable",
author = "Awad, {Muhammad A.} and Saman Ashkiani and Rob Johnson and Martin Farach-Colton and Owens, {John D.}",
year = "2019",
month = "2",
day = "16",
doi = "10.1145/3293883.3295706",
language = "English (US)",
series = "Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP",
publisher = "Association for Computing Machinery",
pages = "145--157",
booktitle = "PPoPP 2019 - Proceedings of the 24th Principles and Practice of Parallel Programming",

}

Awad, MA, Ashkiani, S, Johnson, R, Farach-Colton, M & Owens, JD 2019, Engineering a high-performance GPU B-tree. in PPoPP 2019 - Proceedings of the 24th Principles and Practice of Parallel Programming. Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP, Association for Computing Machinery, pp. 145-157, 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019, Washington, United States, 2/16/19. https://doi.org/10.1145/3293883.3295706

Engineering a high-performance GPU B-tree. / Awad, Muhammad A.; Ashkiani, Saman; Johnson, Rob; Farach-Colton, Martin; Owens, John D.

PPoPP 2019 - Proceedings of the 24th Principles and Practice of Parallel Programming. Association for Computing Machinery, 2019. p. 145-157 (Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

TY - GEN

T1 - Engineering a high-performance GPU B-tree

AU - Awad, Muhammad A.

AU - Ashkiani, Saman

AU - Johnson, Rob

AU - Farach-Colton, Martin

AU - Owens, John D.

PY - 2019/2/16

Y1 - 2019/2/16

N2 - We engineer a GPU implementation of a B-Tree that supports concurrent queries (point, range, and successor) and updates (insertions and deletions). Our B-tree outperforms the state of the art, a GPU log-structured merge tree (LSM) and a GPU sorted array. In particular, point and range queries are significantly faster than in a GPU LSM (the GPU LSM does not implement successor queries). Furthermore, B-Tree insertions are also faster than LSM and sorted array insertions unless insertions come in batches of more than roughly 100k. Because we cache the upper levels of the tree, we achieve lookup throughput that exceeds the DRAM bandwidth of the GPU. We demonstrate that the key limiter of performance on a GPU is contention and describe the design choices that allow us to achieve this high performance.

AB - We engineer a GPU implementation of a B-Tree that supports concurrent queries (point, range, and successor) and updates (insertions and deletions). Our B-tree outperforms the state of the art, a GPU log-structured merge tree (LSM) and a GPU sorted array. In particular, point and range queries are significantly faster than in a GPU LSM (the GPU LSM does not implement successor queries). Furthermore, B-Tree insertions are also faster than LSM and sorted array insertions unless insertions come in batches of more than roughly 100k. Because we cache the upper levels of the tree, we achieve lookup throughput that exceeds the DRAM bandwidth of the GPU. We demonstrate that the key limiter of performance on a GPU is contention and describe the design choices that allow us to achieve this high performance.

KW - B-tree

KW - Data structures

KW - Dynamic

KW - GPU

KW - Mutable

UR - http://www.scopus.com/inward/record.url?scp=85064230046&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85064230046&partnerID=8YFLogxK

U2 - 10.1145/3293883.3295706

DO - 10.1145/3293883.3295706

M3 - Conference contribution

AN - SCOPUS:85064230046

T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

SP - 145

EP - 157

BT - PPoPP 2019 - Proceedings of the 24th Principles and Practice of Parallel Programming

PB - Association for Computing Machinery

ER -

Awad MA, Ashkiani S, Johnson R, Farach-Colton M, Owens JD. Engineering a high-performance GPU B-tree. In PPoPP 2019 - Proceedings of the 24th Principles and Practice of Parallel Programming. Association for Computing Machinery. 2019. p. 145-157. (Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP). https://doi.org/10.1145/3293883.3295706