TY - JOUR
T1 - Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil
AU - You, Yang
AU - Fu, Haohuan
AU - Song, Shuaiwen Leon
AU - Dehnavi, Maryam Mehri
AU - Gan, Lin
AU - Huang, Xiaomeng
AU - Yang, Guangwen
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China (grant numbers 61303003 and 41374113) and the National High-tech R&D (863) Program of China (grant number 2013AA01A208).
Publisher Copyright:
© The Author(s) 2014.
PY - 2014/8/1
Y1 - 2014/8/1
N2 - Wave propagation forward modeling is a widely used computational method in oil and gas exploration. The iterative stencil loops in such problems have broad applications in scientific computing. However, executing such loops can be highly time-consuming, which greatly limits their performance and power efficiency. In this paper, we accelerate the forward-modeling technique on the latest multi-core and many-core architectures: Intel® Sandy Bridge CPUs, NVIDIA Fermi C2070 GPUs, NVIDIA Kepler K20x GPUs, and the Intel® Xeon Phi (MIC) co-processor. For the GPU platforms, we propose two parallel strategies to explore the performance optimization opportunities for our stencil kernels. For Sandy Bridge CPUs and MIC, we also employ various optimization techniques in order to achieve the best performance. Although our stencil, with 114 component variables, poses several major challenges for performance optimization, and its low ratio of computation to memory access makes it difficult to fully exploit the evaluated architectures, we achieve performance efficiencies ranging from 4.730% to 20.02% of the theoretical peak. We also conduct a cross-platform performance and power analysis (focusing on the Kepler GPU and MIC); the results may help users select the most suitable accelerators for their target applications.
AB - Wave propagation forward modeling is a widely used computational method in oil and gas exploration. The iterative stencil loops in such problems have broad applications in scientific computing. However, executing such loops can be highly time-consuming, which greatly limits their performance and power efficiency. In this paper, we accelerate the forward-modeling technique on the latest multi-core and many-core architectures: Intel® Sandy Bridge CPUs, NVIDIA Fermi C2070 GPUs, NVIDIA Kepler K20x GPUs, and the Intel® Xeon Phi (MIC) co-processor. For the GPU platforms, we propose two parallel strategies to explore the performance optimization opportunities for our stencil kernels. For Sandy Bridge CPUs and MIC, we also employ various optimization techniques in order to achieve the best performance. Although our stencil, with 114 component variables, poses several major challenges for performance optimization, and its low ratio of computation to memory access makes it difficult to fully exploit the evaluated architectures, we achieve performance efficiencies ranging from 4.730% to 20.02% of the theoretical peak. We also conduct a cross-platform performance and power analysis (focusing on the Kepler GPU and MIC); the results may help users select the most suitable accelerators for their target applications.
KW - 3D wave forward modeling
KW - Complex stencil
KW - Intel Xeon Phi
KW - Kepler GPU
KW - Optimization techniques
KW - Performance power analysis
UR - http://www.scopus.com/inward/record.url?scp=84907207482&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84907207482&partnerID=8YFLogxK
U2 - 10.1177/1094342014524807
DO - 10.1177/1094342014524807
M3 - Article
AN - SCOPUS:84907207482
SN - 1094-3420
VL - 28
SP - 301
EP - 318
JO - International Journal of High Performance Computing Applications
JF - International Journal of High Performance Computing Applications
IS - 3
ER -