Architectural requirements and scalability of the NAS parallel benchmarks

Frederick C. Wong, Richard P. Martin, Remzi H. Arpaci-Dusseau, David E. Culler

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

68 Scopus citations


We present a study of the architectural requirements and scalability of the NAS Parallel Benchmarks. Through direct measurements and simulations, we identify the factors that affect the scalability of the benchmark codes on two relevant and distinct platforms: a cluster of workstations and a ccNUMA SGI Origin 2000. We find that the benefit of increased global cache size is pronounced in certain applications and often offsets the communication cost. By constructing the working set profile of the benchmarks, we are able to visualize the improvement of computational efficiency under constant-problem-size scaling. We also find that, while the Origin MPI has better point-to-point performance, the cluster MPI layer is more scalable with communication load. However, communication performance within the applications is often much lower than what would be achieved by micro-benchmarks. We show that the communication protocols used by the MPI runtime library strongly influence communication performance in applications, and that the benchmark codes have a wide spectrum of communication requirements.
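The working-set effect mentioned in the abstract can be illustrated with a toy model. The sketch below is not the authors' methodology: it assumes a fully associative LRU cache, a hypothetical access pattern of repeated sequential sweeps over each processor's share of a fixed-size data set, and made-up sizes (4096 data lines, 512 cache lines per processor). The function names are illustrative only. It shows how, under constant-problem-size scaling, the per-processor miss rate can drop sharply once the local share fits in cache.

```python
from collections import OrderedDict


def lru_miss_rate(trace, cache_lines):
    """Fraction of references that miss in a fully associative LRU cache."""
    cache = OrderedDict()
    misses = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)        # mark as most recently used
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict least recently used
    return misses / len(trace)


def sweep_trace(total_lines, procs, sweeps=4):
    """Hypothetical trace: one processor sweeps its share `sweeps` times."""
    local = total_lines // procs           # constant total problem size
    return [i for _ in range(sweeps) for i in range(local)]


def working_set_profile(total_lines, cache_lines, proc_counts):
    """Per-processor miss rate for each processor count."""
    return {
        p: lru_miss_rate(sweep_trace(total_lines, p), cache_lines)
        for p in proc_counts
    }


# Assumed sizes, chosen only for illustration.
profile = working_set_profile(
    total_lines=4096, cache_lines=512, proc_counts=[1, 2, 4, 8, 16]
)
for procs, rate in profile.items():
    print(f"{procs:2d} processors: miss rate {rate:.2f}")
```

In this toy setting the miss rate stays at 1.00 up to 4 processors, because a cyclic sweep larger than the cache thrashes under LRU, and falls to 0.25 (cold misses only) at 8 and 16 processors, where each processor's share fits in its cache. Real benchmark codes have richer access patterns, so the knee in a measured profile would be less abrupt.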

Original language: English (US)
Title of host publication: ACM/IEEE SC 1999 Conference, SC 1999
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 1
ISBN (Electronic): 1581130910, 9781581130911
State: Published - 1999
Externally published: Yes
Event: 1999 ACM/IEEE Conference on Supercomputing, SC 1999 - Portland, United States
Duration: Nov 13 1999 to Nov 19 1999

Publication series

Name: ACM/IEEE SC 1999 Conference, SC 1999



All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

