TY - GEN
T1 - On the parallelization of blocked LU factorization algorithms on distributed memory architectures
AU - Von Laszewski, Gregor
AU - Parashar, Manish
AU - Mohamed, A. Gaber
AU - Fox, Geoffrey C.
N1 - Funding Information:
This research is sponsored by DARPA under contract #DABT63-91-K-0005. The content of the information does not necessarily reflect the position or the policy of the Government and no official endorsement should be inferred. Use of the Intel iPSC/860 was provided by the Center for Research on Parallel Computation under NSF Cooperative Agreement Nos. CCR-8809615 and CDA-8619893 with support from the Keck Foundation.
PY - 1992/12/1
Y1 - 1992/12/1
N2 - Solutions to systems of linear equations, and specifically the LU factorization of matrices, form the computational core of many scientific and engineering applications. In this paper, we present the parallelization of blocked algorithms for LU factorization. We isolate problems inherent to sequential blocked algorithms and provide approaches to overcome them on distributed memory architectures. The performance of the parallelized versions of three blocked algorithms suited to column-oriented Fortran is compared. Experiments are performed on the iPSC/860 Hypercube. Our study shows that it is not intuitively clear which algorithm performs best on a given architecture; the answer depends on the problem size and the number of available processors.
AB - Solutions to systems of linear equations, and specifically the LU factorization of matrices, form the computational core of many scientific and engineering applications. In this paper, we present the parallelization of blocked algorithms for LU factorization. We isolate problems inherent to sequential blocked algorithms and provide approaches to overcome them on distributed memory architectures. The performance of the parallelized versions of three blocked algorithms suited to column-oriented Fortran is compared. Experiments are performed on the iPSC/860 Hypercube. Our study shows that it is not intuitively clear which algorithm performs best on a given architecture; the answer depends on the problem size and the number of available processors.
UR - http://www.scopus.com/inward/record.url?scp=0039821550&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0039821550&partnerID=8YFLogxK
U2 - 10.1109/superc.1992.236696
DO - 10.1109/superc.1992.236696
M3 - Conference contribution
AN - SCOPUS:0039821550
T3 - Proceedings of the International Conference on Supercomputing
SP - 170
EP - 179
BT - Proceedings of the 1992 ACM/IEEE conference on Supercomputing, Supercomputing 1992
A2 - Werner, Robert
PB - Association for Computing Machinery
T2 - 1992 ACM/IEEE conference on Supercomputing, Supercomputing 1992
Y2 - 16 November 1992 through 20 November 1992
ER -