Cascaded execution: Speeding up unparallelized execution on shared-memory multiprocessors

Ruth E. Anderson, Thu D. Nguyen, John Zahorjan

Research output: Contribution to journal › Conference article › peer-review

2 Scopus citations


Both inherently sequential code and limitations of analysis techniques prevent full parallelization of many applications by parallelizing compilers. Amdahl's Law tells us that as parallelization becomes increasingly effective, any unparallelized loop becomes an increasingly dominant performance bottleneck. We present a technique for speeding up the execution of unparallelized loops by cascading their sequential execution across multiple processors: only a single processor executes the loop body at any one time, and each processor executes only a portion of the loop body before passing control to another. Cascaded execution allows otherwise idle processors to optimize their memory state for the eventual execution of their next portion of the loop, resulting in significantly reduced overall loop body execution times. We evaluate cascaded execution using loop nests from wave5, a SPEC95fp benchmark application, and a synthetic benchmark. Running on a PC with 4 Pentium Pro processors and an SGI Power Onyx with 8 R10000 processors, we observe overall speedups of 1.35 and 1.7, respectively, for the wave5 loops we examined, and speedups as high as 4.5 for individual loops. Our extrapolated results using the synthetic benchmark show a potential for speedups as large as 16 on future machines.
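
The control structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cascaded_run`, the running-sum loop body, the chunk size, and the use of Python threads are all assumptions chosen for illustration. Each worker owns every `num_workers`-th chunk of the loop; while another worker holds control it stages its next chunk's operands (a stand-in for the memory-state optimization the abstract mentions), and it runs the loop body only once control is passed to it. Python threads will not reproduce the cache effects or the reported speedups; the sketch shows only the hand-off pattern.

```python
import threading


def cascaded_run(data, num_workers=4, chunk=8):
    """Hypothetical sketch: compute a running sum (a loop-carried
    dependence) by cascading chunks of the loop across worker threads.
    Only one worker executes the loop body at any one time."""
    n = len(data)
    num_chunks = (n + chunk - 1) // chunk
    out = [0] * n
    # "turn" is the index of the chunk currently allowed to execute;
    # "carry" is the value handed from one chunk to the next.
    state = {"carry": 0, "turn": 0}
    cond = threading.Condition()

    def worker(wid):
        # Worker wid owns chunks wid, wid + num_workers, ...
        for c in range(wid, num_chunks, num_workers):
            lo, hi = c * chunk, min((c + 1) * chunk, n)
            # Helper phase: stage this chunk's operands while another
            # worker holds control (stand-in for prefetching).
            staged = data[lo:hi]
            # Wait until control is passed to this chunk.
            with cond:
                cond.wait_for(lambda: state["turn"] == c)
            # Execution phase: sequential loop body for this chunk.
            acc = state["carry"]
            for i, x in enumerate(staged):
                acc += x
                out[lo + i] = acc
            # Pass control (and the carried value) to the next chunk.
            with cond:
                state["carry"] = acc
                state["turn"] = c + 1
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Calling `cascaded_run([1, 2, 3, 4, 5], num_workers=2, chunk=2)` yields the same result as running the loop sequentially, because the `turn` counter enforces the original chunk order. On real hardware the benefit would come from the helper phase overlapping with another processor's execution phase, which this sketch only mimics.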

Original language: English (US)
Pages (from-to): 714-719
Number of pages: 6
Journal: Proceedings of the International Parallel Processing Symposium, IPPS
State: Published - 1999
Externally published: Yes
Event: Proceedings of the 1999 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing - San Juan
Duration: Apr 12 1999 → Apr 16 1999

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
