Architectural support for address translation on GPUs: Designing Memory Management Units for CPU/GPUs with unified address spaces

Bharath Pichai, Lisa Hsu, Abhishek Bhattacharjee

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

61 Citations (Scopus)

Abstract

The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent example, necessitates a manageable programming model to ensure widespread adoption. A key component of this is a shared unified address space between the heterogeneous units to obtain the programmability benefits of virtual memory. To this end, we explore GPU Memory Management Units (MMUs) consisting of Translation Lookaside Buffers (TLBs) and page table walkers (PTWs) in unified heterogeneous systems. We show the challenges posed by GPU warp schedulers on TLBs accessed in parallel with L1 caches, which provide many well-known programmability benefits. In response, we propose modest TLB and PTW augmentations that recover most of the performance lost by introducing L1-parallel TLB access. We also show that a little TLB-awareness can make other GPU performance enhancements (e.g., cache-conscious warp scheduling and dynamic warp formation on branch divergence) feasible in the face of cache-parallel address translation, bringing overheads in the range deemed acceptable for CPUs (10-15% of runtime). We presume this initial design leaves room for improvement but anticipate the bigger insight, that a little TLB-awareness goes a long way in GPUs, will spur further work in this area.
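As a rough illustration of the mechanism the abstract describes, a TLB backed by a page table walker, the following Python sketch models a single translation path. All names, sizes, and the eviction policy are illustrative placeholders, not details taken from the paper.

```python
# Illustrative sketch (not from the paper): the basic TLB-plus-page-table-walk
# flow an MMU performs per memory access. Page size, table layout, and
# eviction policy are simplified placeholders.

PAGE_SIZE = 4096  # 4 KiB pages, the common baseline

class SimpleMMU:
    def __init__(self, page_table, tlb_entries=64):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.tlb = {}                 # small cache of recent translations
        self.tlb_entries = tlb_entries
        self.tlb_hits = 0
        self.tlb_misses = 0           # each miss triggers a page table walk

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:           # TLB hit: translation is cached
            self.tlb_hits += 1
            pfn = self.tlb[vpn]
        else:                         # TLB miss: the page table walker resolves it
            self.tlb_misses += 1
            pfn = self.page_table[vpn]  # KeyError on an unmapped page (page fault)
            if len(self.tlb) >= self.tlb_entries:
                self.tlb.pop(next(iter(self.tlb)))  # naive FIFO-style eviction
            self.tlb[vpn] = pfn
        return pfn * PAGE_SIZE + offset
```

The paper's contribution is what happens around this path on a GPU: many warps issuing accesses in parallel stress the TLB far more than a CPU thread would, which is why the authors study TLB-aware warp scheduling and PTW augmentations.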

Original language: English (US)
Title of host publication: ASPLOS 2014 - 19th International Conference on Architectural Support for Programming Languages and Operating Systems
Pages: 743-757
Number of pages: 15
DOI: 10.1145/2541940.2541942
State: Published - Mar 14 2014
Event: 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2014 - Salt Lake City, UT, United States
Duration: Mar 1 2014 - Mar 5 2014

Publication series

Name: International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS

Other

Other: 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2014
Country: United States
City: Salt Lake City, UT
Period: 3/1/14 - 3/5/14

Fingerprint

Memory management units
Program processors
Scheduling
Graphics processing unit
Data storage equipment

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Hardware and Architecture

Keywords

  • GPUs
  • MMUs
  • TLBs
  • Unified address space

Cite this

Pichai, B., Hsu, L., & Bhattacharjee, A. (2014). Architectural support for address translation on GPUs: Designing Memory Management Units for CPU/GPUs with unified address spaces. In ASPLOS 2014 - 19th International Conference on Architectural Support for Programming Languages and Operating Systems (pp. 743-757). (International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS). https://doi.org/10.1145/2541940.2541942
Pichai, Bharath ; Hsu, Lisa ; Bhattacharjee, Abhishek. / Architectural support for address translation on GPUs : Designing Memory Management Units for CPU/GPUs with unified address spaces. ASPLOS 2014 - 19th International Conference on Architectural Support for Programming Languages and Operating Systems. 2014. pp. 743-757 (International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS).
@inproceedings{31593d9ee20f46f880e21c37481123d5,
title = "Architectural support for address translation on GPUs: Designing Memory Management Units for CPU/GPUs with unified address spaces",
abstract = "The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent example, necessitates a manageable programming model to ensure widespread adoption. A key component of this is a shared unified address space between the heterogeneous units to obtain the programmability benefits of virtual memory. To this end, we explore GPU Memory Management Units (MMUs) consisting of Translation Lookaside Buffers (TLBs) and page table walkers (PTWs) in unified heterogeneous systems. We show the challenges posed by GPU warp schedulers on TLBs accessed in parallel with L1 caches, which provide many well-known programmability benefits. In response, we propose modest TLB and PTW augmentations that recover most of the performance lost by introducing L1-parallel TLB access. We also show that a little TLB-awareness can make other GPU performance enhancements (e.g., cache-conscious warp scheduling and dynamic warp formation on branch divergence) feasible in the face of cache-parallel address translation, bringing overheads in the range deemed acceptable for CPUs (10-15{\%} of runtime). We presume this initial design leaves room for improvement but anticipate the bigger insight, that a little TLB-awareness goes a long way in GPUs, will spur further work in this area.",
keywords = "GPUs, MMUs, TLBs, Unified address space",
author = "Bharath Pichai and Lisa Hsu and Abhishek Bhattacharjee",
year = "2014",
month = "3",
day = "14",
doi = "10.1145/2541940.2541942",
language = "English (US)",
isbn = "9781450323055",
series = "International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS",
pages = "743--757",
booktitle = "ASPLOS 2014 - 19th International Conference on Architectural Support for Programming Languages and Operating Systems",

}

Pichai, B, Hsu, L & Bhattacharjee, A 2014, Architectural support for address translation on GPUs: Designing Memory Management Units for CPU/GPUs with unified address spaces. in ASPLOS 2014 - 19th International Conference on Architectural Support for Programming Languages and Operating Systems. International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS, pp. 743-757, 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2014, Salt Lake City, UT, United States, 3/1/14. https://doi.org/10.1145/2541940.2541942

Architectural support for address translation on GPUs : Designing Memory Management Units for CPU/GPUs with unified address spaces. / Pichai, Bharath; Hsu, Lisa; Bhattacharjee, Abhishek.

ASPLOS 2014 - 19th International Conference on Architectural Support for Programming Languages and Operating Systems. 2014. p. 743-757 (International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS).


TY  - GEN
T1  - Architectural support for address translation on GPUs
T2  - Designing Memory Management Units for CPU/GPUs with unified address spaces
AU  - Pichai, Bharath
AU  - Hsu, Lisa
AU  - Bhattacharjee, Abhishek
PY  - 2014/3/14
Y1  - 2014/3/14
N2  - The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent example, necessitates a manageable programming model to ensure widespread adoption. A key component of this is a shared unified address space between the heterogeneous units to obtain the programmability benefits of virtual memory. To this end, we explore GPU Memory Management Units (MMUs) consisting of Translation Lookaside Buffers (TLBs) and page table walkers (PTWs) in unified heterogeneous systems. We show the challenges posed by GPU warp schedulers on TLBs accessed in parallel with L1 caches, which provide many well-known programmability benefits. In response, we propose modest TLB and PTW augmentations that recover most of the performance lost by introducing L1-parallel TLB access. We also show that a little TLB-awareness can make other GPU performance enhancements (e.g., cache-conscious warp scheduling and dynamic warp formation on branch divergence) feasible in the face of cache-parallel address translation, bringing overheads in the range deemed acceptable for CPUs (10-15% of runtime). We presume this initial design leaves room for improvement but anticipate the bigger insight, that a little TLB-awareness goes a long way in GPUs, will spur further work in this area.
AB  - The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent example, necessitates a manageable programming model to ensure widespread adoption. A key component of this is a shared unified address space between the heterogeneous units to obtain the programmability benefits of virtual memory. To this end, we explore GPU Memory Management Units (MMUs) consisting of Translation Lookaside Buffers (TLBs) and page table walkers (PTWs) in unified heterogeneous systems. We show the challenges posed by GPU warp schedulers on TLBs accessed in parallel with L1 caches, which provide many well-known programmability benefits. In response, we propose modest TLB and PTW augmentations that recover most of the performance lost by introducing L1-parallel TLB access. We also show that a little TLB-awareness can make other GPU performance enhancements (e.g., cache-conscious warp scheduling and dynamic warp formation on branch divergence) feasible in the face of cache-parallel address translation, bringing overheads in the range deemed acceptable for CPUs (10-15% of runtime). We presume this initial design leaves room for improvement but anticipate the bigger insight, that a little TLB-awareness goes a long way in GPUs, will spur further work in this area.
KW  - GPUs
KW  - MMUs
KW  - TLBs
KW  - Unified address space
UR  - http://www.scopus.com/inward/record.url?scp=84897759661&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84897759661&partnerID=8YFLogxK
U2  - 10.1145/2541940.2541942
DO  - 10.1145/2541940.2541942
M3  - Conference contribution
AN  - SCOPUS:84897759661
SN  - 9781450323055
T3  - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP  - 743
EP  - 757
BT  - ASPLOS 2014 - 19th International Conference on Architectural Support for Programming Languages and Operating Systems
ER  -

Pichai B, Hsu L, Bhattacharjee A. Architectural support for address translation on GPUs: Designing Memory Management Units for CPU/GPUs with unified address spaces. In ASPLOS 2014 - 19th International Conference on Architectural Support for Programming Languages and Operating Systems. 2014. p. 743-757. (International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS). https://doi.org/10.1145/2541940.2541942