A routine for diagonalizing real symmetric matrices in parallel was developed from a project code written for the 1994 Institute on Parallel Computing. The algorithm employed is essentially a hybrid of Jacobi methods. Problem sizes ranging from N = 2 to 8192 have been solved.
The parallel Jacobi method described by Brent and Luk assumes the existence of a two-dimensional grid of (N/2)^2 processors to resolve an NxN matrix; i.e. the input system is mapped onto a collection of 2x2 elements. A variation of this method featuring virtualized elements was implemented first.
Performance and scaling of the virtualized Brent-Luk method did not work well for larger problems on the T3D. Essentially, the grain size associated with the algorithm was found to be too small. Time spent fetching data points could not be offset by cache reuse.
This led us to work on a generalization of the Brent-Luk work in which the grain size was made a function of N and the processor count. As the processor count ranges from 1 to N/2, our macro-element Jacobi method transitions from the serial Jacobi method to the Brent-Luk parallel model. Good performance and scaling are observed for a fairly wide range of element dimensions.
The parallel Jacobi (PJAC) library and related documentation have recently been made available on the C90:
/usr/local/lib/libpjac.a man rspjacComplete documentation and examples are available on our World Wide Web server (http://www.psc.edu/general/software/packages/pjac/pjac.html). A paper describing this work was presented at the Fall 1994 Cray User Group Conference in Tours, France. Copies may be obtained by writing to the authors (oneal@psc.edu and rreddy@psc.edu).