Locality-aware software throttling for sparse matrix operation on GPUs

Yanhao Chen, Ari B. Hayes, Chi Zhang, Eddy Z. Zhang, Timothy Salmon

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Scopus citations

Abstract

This paper tackles the cache thrashing problem caused by the non-deterministic scheduling feature of bulk synchronous parallel (BSP) execution in GPUs. In the BSP model, threads can be executed and interleaved in any order before reaching a barrier synchronization point, which requires the entire working set to be in cache for maximum data reuse over time. However, it is not always possible to fit all the data in cache at once. Thus, we propose a locality-aware software throttling framework that throttles the number of active execution tasks, prevents cache thrashing, and enhances data reuse over time. Our locality-aware software throttling framework focuses on an important class of applications that operate on sparse matrices (graphs). These applications come from the domains of linear algebra, graph processing, machine learning and scientific simulation. Evaluated on over 200 real sparse matrices and graphs that suffer from cache thrashing in the Florida sparse matrix collection, our technique achieves an average of 2.01X speedup, a maximum of 6.45X speedup, and a maximum performance loss ≤5%.
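The throttling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' framework, whose details the abstract does not give. It assumes a sparse matrix-vector product in coordinate (COO) form, treats each nonzero as one task, and measures cache capacity as a count of distinct vector elements. The names `throttle_batches`, `spmv_throttled`, and `capacity` are hypothetical. Tasks are greedily grouped into consecutive batches whose working set fits the assumed capacity, and the batches run one after another.

```python
def throttle_batches(rows, cols, capacity):
    """Greedily split nonzeros (tasks) into consecutive batches.

    The working set of a batch is modeled as the number of distinct
    input-vector entries (columns) plus distinct output-vector entries
    (rows) it touches. `capacity` is an assumed cache budget in elements.
    """
    batches, current = [], []
    touched_x, touched_y = set(), set()
    for k, (r, c) in enumerate(zip(rows, cols)):
        # Count how many new elements this task would add to the working set.
        new = (c not in touched_x) + (r not in touched_y)
        if current and len(touched_x) + len(touched_y) + new > capacity:
            # Close the batch before the working set exceeds the budget.
            batches.append(current)
            current, touched_x, touched_y = [], set(), set()
        current.append(k)
        touched_x.add(c)
        touched_y.add(r)
    if current:
        batches.append(current)
    return batches


def spmv_throttled(rows, cols, vals, x, n_rows, capacity):
    """Compute y = A @ x one batch at a time."""
    y = [0.0] * n_rows
    for batch in throttle_batches(rows, cols, capacity):
        # On a GPU, each batch could be a separate kernel launch, so only
        # the tasks of one batch are active at any time (an assumption of
        # this sketch, simulated here with a sequential loop).
        for k in batch:
            y[rows[k]] += vals[k] * x[cols[k]]
    return y


# Small 3x3 example with five nonzeros.
rows = [0, 0, 1, 2, 2]
cols = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 1.0, 1.0]
batches = throttle_batches(rows, cols, capacity=4)
y = spmv_throttled(rows, cols, vals, x, n_rows=3, capacity=4)
```

The result is the same as an unthrottled product; only the order and grouping of tasks changes. A real implementation would also need to weigh the cost of extra kernel launches against the gain in data reuse, which this sketch ignores.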

Original language: English (US)
Title of host publication: Proceedings of the 2018 USENIX Annual Technical Conference, USENIX ATC 2018
Publisher: USENIX Association
Pages: 413-425
Number of pages: 13
ISBN (Electronic): 9781939133021
State: Published - Jan 1 2020
Event: 2018 USENIX Annual Technical Conference, USENIX ATC 2018 - Boston, United States
Duration: Jul 11 2018 to Jul 13 2018

Publication series

Name: Proceedings of the 2018 USENIX Annual Technical Conference, USENIX ATC 2018

Conference

Conference: 2018 USENIX Annual Technical Conference, USENIX ATC 2018
Country/Territory: United States
City: Boston
Period: 7/11/18 to 7/13/18

All Science Journal Classification (ASJC) codes

  • Computer Science(all)
