CCST Logo
resources | software | research | pubs | information MCS Logo

PRISM Project: Parallel Symmetric Eigensolver


Prism Logo


Christian Bischof, William L. George, Steve Huss-Lederman , Anna Tsao , Thomas Turnbull, Robert van de Geijn, Yuan-Jye Jason Wu

Introduction

Finding eigenvalues and eigenvectors is an important technique in many areas. For example, a classic problem in physics is to find the frequencies of a spring (oscillator) from the eigenvalues of the mathematical description. In cutting-edge research on atomic structures, the determination of eigenvalues and eigenvectors often plays a central role (see example at Pacific Northwest National Laboratory). As a result, mathematical software to solve this problem has become one of the important computational kernels in linear algebra. One of the early libraries for linear algebra was EISPACK, which solved a variety of types of eigenproblems.

A key class of problems is the solution of the symmetric eigenproblem. This problem arises in many applications, since the eigenvalues are always real numbers and physically observable values are real. (This is in contrast to the nonsymmetric eigenproblem where the eigenvalues are typically complex numbers.) Since before the advent of EISPACK, the method of choice for solving the symmetric eigenproblem has been the QR algorithm. It has the very desirable property that it is provably stable, that is, it always generates the correct answer. Furthermore, Fortran libraries exist that are efficient on everything from workstations to vector supercomputers. The continued importance of this algorithm is shown by its inclusion in the recent LAPACK project, which updated libraries for linear algebra.

Recently, a number of the important applications that utilize the symmetric eigenproblem have turned to parallel computers to allow for the solution of larger and more realistic problems. The difficulty encountered is that the QR algorithm has proven difficult to run efficiently on parallel computers. This arises from necessary steps in the algorithm that have proven to be sequential bottlenecks and therefore resisted effective use of parallelism. As a result, the QR algorithm is currently efficient only up to a few processors. This feature makes it unsuitable for the massively parallel computers of interest in solving many computational problems.

Several research groups are attempting to develop libraries for solving the symmetric eigenproblem that would be efficient on massively parallel computers. Our effort, the PRISM (Parallel Research on Invariant Subspace Methods) project is investigating a new mathematical algorithm based on the work of Auslander and Tsao. Our approach has the same desirable property as the QR algorithm in that it accurately finds the solution in all known cases. Furthermore, unlike the QR algorithm, it is based on computational kernels that allow for efficient implementations on massively parallel computers. As a result, the PRISM symmetric eigensolver has shown to be a viable option as a key library for the solution of large, cutting-edge scientific problems.

Method

One method to readily achieve an efficient library that runs on a wide range of machines is to base the algorithm on key kernels that can easily run well across many parallel computers. This technique is commonly used in linear algebra. The current BLAS (Basic Linear Algebra Subroutines) provide a set of routines that run well on a wide range of sequential computers. The popular LAPACK library is based on these routines. As a result, LAPACK achieves good performance across such diverse machines as RISC workstations and vector supercomputers. The PRISM project is taking a similar approach by using a method that has matrix multiplication as the key computational kernel. Matrix multiplication is a key BLAS routine which was used in the sequential LAPACK library. Our group and others have shown that this kernel can also be made to run efficiently across a wide range of parallel computers.  Since over 90% of our work for large problems is done with matrix multiplication, the PRISM eigensolver runs very efficiently in parallel when layered on top of these efficient matrix multiplication routines.

The PRISM algorithm is able to utilize matrix multiplication by using the mathematical fact that applying a polynomial to matrix changes its eigenvalues but not its invariant subspaces, i.e., eigenvectors. By appropriate choice of a polynomial, the eigenvalues can be modified to yield a matrix which is significantly easier to solve than the original problem. Specifically, we use the first incomplete Beta function (3x^2 / 2x^3) which can be used to divide the eigenvalues into two distinct groups. Most of the work in applying a polynomial to a matrix results from taking powers of the matrix which requires matrix multiplication. This yields a parallel algorithm that is dominated by the efficient matrix multiplication kernel.

Once the two groups of eigenvalues are formed, the problem can be divided into two parts based on these groups. This is accomplished by finding an orthogonal transformation to represent these two groups. This is done in parallel by a new rank-revealing tridiagonalization technique developed by the PRISM project. Once the problem is divided into two parts, each part can then be solved independently. Further dividing the groups into more subgroups yields many independent parts to solve. This leads to the second reason why the PRISM eigensolver is well suited to parallel computing: each one of the independent parts can be solved separately so that the parallel hardware can be effectively utilized.

Status

Since the beginning of the PRISM project, a primary goal has been to supply software to application users. To meet the needs of a diverse set of users necessitates that the PRISM library run on the full range of parallel computers. To achieve this goal, the initial PRISM software utilized the Chameleon communication package by Bill Gropp and Barry Smith from Argonne National Laboratory. This package allows programs to be written in a machine-independent manner while still providing very fast performance on all the parallel machines.

A recent development in the parallel computing community is the availability of MPI (Message Passing Interface). This standard is now supported by all the major suppliers of MPPs (Massively Parallel Processors) as well as public domain versions such as the one provided by Argonne National Laboratory and Mississippi State University. As a result, using MPI to achieve portable message-passing code is now the option of choice. In the past year, all the PRISM software has been converted to use MPI. As a result, our eigensolver library now runs on all platforms of interest to the application user. This includes MPPs (IBM SP2, Cray T3D, Intel Paragon, ...) to Network of Workstations (NOWs). The public domain PRISM library comes with verification software to assure the end user that the library is working properly on their particular computer. The user guide and the latest version of PRISM software are available.

Performance

The PRISM symmetric eigensolver has been run on a wide variety of parallel machines. The results in the figure below show the total performance of the PRISM eigensolver in GFLOPS (Billion of Floating Point Operations per Second) on the Argonne SP and Caltech Paragon. The points on the same line represent a constant size problem per node. For example, a local dimension of 400 on 64 processors (8x8 grid) solves a matrix of order 3200 whereas on 100 processors solves a matrix of order 4000. The figure shows that as the local dimension grows, the performance per node is substantially larger for the SP than the Paragon. For example, for local dimension of 400 on 100 processors, the SP achieves over 6 GFLOPS whereas the Paragon achieves only slighter greater than half this performance. This is consistent with the fact that the SP uses RS6000 compute nodes whose peak rating is faster than that of the Paragon's i860 processor. The PRISM eigensolver, which is based on matrix multiplication, is able to effectively utilize these processors to achieve performance that is a substantial fraction of the total peak speed of the machine.

Plot of results

Though GFLOPS is one important measure of performance, it is not necessarily the most important one for parallel eigensolvers. This is due to the fact that different techniques need to perform different amounts of work. As a result, an eigensolver library can achieve a higher GFLOPS rating while still taking longer to solver a problem because it does more work. The important concern of most application users is the total time to solution. The figure below shows the total time to solution as a function of the number of processors. Results are shown for several matrix sizes on the Argonne SP. As can be seen, it is important to have a large enough problem to be able to effectively utilize a large number of processors. But for applications with large matrices, the SP can significantly reduce the time to solution. The figure shows that a matrix of order

Plot of results

4800 can be solved in about 36 minutes. More complete results are also available in the user guide.

Acknowledgments

This work was partially supported by ARPA grant DM28E04120 and P-95006, and DOE grant W-31-109-Eng-38. Access to the Intel Paragon(s) system operated by Caltech on behalf of the Concurrent Supercomputing Consortium was provided by NSF.

[ Account Request | Quad | Denali | Yukon | Tundra | ADSM | Announcements | CCST | MCS ]
Last updated on January 31, 2000
webmaster@mcs.anl.gov