
SP Parallel Programming Workshop
Parallel Math Libraries
- Although this tutorial is largely independent of the other MHPCC
tutorials, familiarity with the concepts of parallel programming and
message passing presented in those tutorials will be helpful.
Overview
- Libraries can ease the entry into parallel programming: several
libraries provide parallel versions of existing serial routines
- example interfaces to LAPACK LU factor and solve routines:
dgetrf(N, N, A, lda, pivots, ierror)
dgetrs(trans, N, nrhs, A, lda, pivots, b, ldb, ierror)
- interfaces to corresponding ScaLAPACK routines:
pdgetrf(N, N, A, ia, ja, desc_A, pivots, ierror)
pdgetrs(trans, N, nrhs, A, ia, ja, desc_A, pivots, b, ib, jb, desc_b, ierror)
Note: converting a serial program that calls LAPACK into a parallel
program using ScaLAPACK requires additional work to manage the
processors (set up a processor grid) and to distribute the data.
- Several parallel math libraries are now available; they will expand
in functionality over time, and others will become available:
- ScaLAPACK (Scalable LAPACK)
- PESSL (IBM Parallel ESSL)
- NAG PVM Library
- PIM (Parallel Iterative Methods)
- PETSc (Portable, Extensible Toolkit for Scientific Computation)
- Some currently available functions are:
- linear and nonlinear matrix operations including parallelized BLAS
- direct and iterative linear solvers
- non-linear solvers
- eigensystem and singular value analyses
- FFT (1- to 3-dimensional)
- random number generators
- This tutorial presents an overview of the libraries currently
available. It does not cover the details of using them, but some
examples and pointers to more information are provided.
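As a concrete reference point for the dgetrf/dgetrs calls in the overview, the sketch below shows in pure Python what an LU factor-and-solve pair computes: partial pivoting, the factors stored in place of A, and forward and back substitution. It is an illustration only, assuming a small dense matrix held as a list of rows; `lu_factor` and `lu_solve` are hypothetical names, the pivot indices are 0-based (LAPACK's are 1-based), no BLAS is used, and where dgetrf reports an exactly singular factor through `ierror`, this sketch raises an exception.

```python
def lu_factor(a):
    """In-place LU factorization with partial pivoting: P*A = L*U.

    a is a square matrix stored as a list of row lists. On return the
    strict lower triangle holds the multipliers of the unit lower
    triangular L and the upper triangle holds U. piv[k] is the row that
    was swapped with row k at step k (0-based).
    """
    n = len(a)
    piv = []
    for k in range(n):
        # Partial pivoting: choose the row with the largest |a[i][k]|, i >= k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        piv.append(p)
        a[k], a[p] = a[p], a[k]          # swap whole rows, including stored L
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]           # multiplier l_ik
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return piv


def lu_solve(a, piv, b):
    """Solve A*x = b using the factors produced by lu_factor."""
    n = len(a)
    x = list(b)
    for k, p in enumerate(piv):          # apply the row interchanges: P*b
        x[k], x[p] = x[p], x[k]
    for i in range(n):                   # forward substitution: L*y = P*b
        for j in range(i):
            x[i] -= a[i][j] * x[j]
    for i in reversed(range(n)):         # back substitution: U*x = y
        for j in range(i + 1, n):
            x[i] -= a[i][j] * x[j]
        x[i] /= a[i][i]
    return x


A = [[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]]
piv = lu_factor(A)                       # plays the role of dgetrf
x = lu_solve(A, piv, [5.0, -2.0, 9.0])   # plays the role of dgetrs, trans='N'
```

For this system the solution is x = (1, 1, 2). The ScaLAPACK pair performs the same arithmetic, but each process holds only its own blocks of A, described by `desc_A`.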
Linear Algebra for Dense Systems:
The BLAS and LAPACK
- BLAS (Basic Linear Algebra Subprograms) - a de facto standard set of
Fortran 77 computational kernels for the basic operations commonly used
by linear algebra routines
- 3 Levels of BLAS
- Level 1 - scalar and vector operations
e.g. z = ax + y, x and y are vectors and a is scalar
- Level 2 - matrix-vector operations
e.g. z = aAx + by, x and y are vectors, a and b are scalars and A
is a matrix
- Level 3 - matrix-matrix operations
e.g. D = aAB + bC, a and b are scalars, and A, B and C are matrices
- The Level 1 BLAS were the basis for the LINPACK library of linear
algebra routines; their specification was intended to:
- be the basic building blocks to aid in the design and development
of numerical linear algebra software
- improve readability of code
- promote modular code that is more robust and maintainable
- promote efficiency (by use of best algorithms and by encouraging
vendors to implement on their machines)
- improve portability through de facto standardization, so that machine
dependencies can be "hidden" from higher-level code
- Practical experience on high performance vector, hierarchical memory
and shared memory machines indicated that a coarser level of granularity
was needed to obtain the higher performance offered by these machines.
- The Level 2 and 3 BLAS improve efficiency on high performance machines
by increasing granularity (the ratio of floating point operations to
memory references)
BLAS Level | BLAS routine     | Ops    | Mem Refs | Ratio
1          | SAXPY z=ax+y     | 2n     | 3n       | 2:3
2          | SGEMV z=aAx+by   | 2n**2  | n**2     | 2:1
3          | SGEMM D=aAB+bC   | 2n**3  | 4n**2    | n:2
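The three BLAS levels and the granularity figures in the table can be illustrated with a minimal pure-Python sketch. The function names echo the BLAS routines, but they are hypothetical stand-ins, not the BLAS calling sequences (the real SAXPY, for instance, takes a vector length and strides and overwrites y in place). `granularity` simply evaluates the leading-order counts from the table.

```python
from fractions import Fraction


# Level 1: z = a*x + y  (vector-vector, O(n) work)
def axpy(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]


# Level 2: z = a*A*x + b*y  (matrix-vector, O(n**2) work)
def gemv(a, A, x, b, y):
    return [a * sum(aij * xj for aij, xj in zip(row, x)) + b * yi
            for row, yi in zip(A, y)]


# Level 3: D = a*A*B + b*C  (matrix-matrix, O(n**3) work)
def gemm(a, A, B, b, C):
    inner = len(B)
    return [[a * sum(A[i][p] * B[p][j] for p in range(inner)) + b * C[i][j]
             for j in range(len(B[0]))]
            for i in range(len(A))]


# Leading-order (ops, memory references) for n x n operands, as in the table.
COUNTS = {
    1: lambda n: (2 * n, 3 * n),          # SAXPY
    2: lambda n: (2 * n**2, n**2),        # SGEMV
    3: lambda n: (2 * n**3, 4 * n**2),    # SGEMM
}


def granularity(level, n):
    """Ratio of floating-point operations to memory references."""
    ops, refs = COUNTS[level](n)
    return Fraction(ops, refs)
```

For n = 1000 the ratio is 2/3 for Level 1, 2 for Level 2 and 500 for Level 3: only the Level 3 ratio grows with n, so each value fetched from memory is reused in many more operations.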
- LAPACK (Linear Algebra Package)
- successor to LINPACK and EISPACK incorporating the functionality
of both
- portability addressed by use of BLAS as the building blocks
- efficiency gained by using level 3 and level 2 BLAS as much as
possible
- LAPACK Routines - designed for dense or banded matrices, not for
general sparse matrices
- Systems of linear equations
- Linear least squares problems
- Eigenvalue and singular value problems, including generalized problems
- Matrix factorizations: LU, Cholesky, QR, QZ, SVD, GSVD, Schur
and generalized Schur
- Condition and error estimates
- Target Machines
- Vector processors
- Shared-memory multi-processors
- Workstations and PCs
- LAPACK routines exploit the level 2 and level 3 BLAS via block
or partitioned algorithms
- Block algorithms partition matrices into blocks and perform the work
on each of the blocks, rather than looking at an element at a time
- Block algorithms increase efficiency by
- control of data movement between levels of memory (cache/vector
registers, main memory, disk memory)
- increase in reuse of data held in lowest level of memory (cache or
vector registers)
- maximization of amount of data held in low-level memory
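A block algorithm can be sketched with matrix multiplication: the loops run over nb x nb blocks, and each block update reuses the same small blocks of A, B and C many times while they would be resident in fast memory. The sketch below assumes row-list storage and a hypothetical block-size parameter `nb`; it shows only the loop structure (pure Python gains no cache benefit from it), and LAPACK applies the same idea to factorizations rather than to a plain product.

```python
def blocked_matmul(A, B, nb):
    """Return C = A*B computed block by block with block size nb."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, nb):               # block row of C
        for jj in range(0, m, nb):           # block column of C
            for kk in range(0, k, nb):       # block of the inner dimension
                # C[ii:ii+nb, jj:jj+nb] += A[ii:ii+nb, kk:kk+nb] * B[kk:kk+nb, jj:jj+nb]
                for i in range(ii, min(ii + nb, n)):
                    Ci, Ai = C[i], A[i]
                    for p in range(kk, min(kk + nb, k)):
                        aip, Bp = Ai[p], B[p]
                        for j in range(jj, min(jj + nb, m)):
                            Ci[j] += aip * Bp[j]
    return C
```

Any block size gives the same result; choosing `nb` is a trade-off between data reuse and how many blocks fit in the fast level of memory.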
Linear Algebra for Dense Systems:
The BLACS, the PBLAS and ScaLAPACK
- ScaLAPACK (Scalable LAPACK) - a port of LAPACK to distributed-memory
environments, designed to:
- maintain performance (including scalability)
- retain portability
- stay as close as possible to LAPACK in calling sequence, storage, etc.
- promote modularity via set of linear algebra tools (BLAS, BLACS and
PBLAS)
- use LAPACK algorithms when possible
- Performance issues
- data movement
- between vector registers/cache, memory and disk as for LAPACK
- between memory of distributed processors
- controlled via block algorithms that use Level 2 and 3 BLAS
(as in LAPACK)
- load balancing, which keeps all processors busy, achieved via a 2-D
block-cyclic data distribution
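The 2-D block-cyclic distribution deals blocks of consecutive rows round-robin to the process rows of the grid, and, independently, blocks of consecutive columns to the process columns. A minimal sketch of the index mapping follows, assuming 0-based indices and that the first block is owned by process 0; the function names are hypothetical.

```python
def block_cyclic_owner(i, nb, nprocs):
    """Map 0-based global index i to (owning process, 0-based local index).

    Blocks of nb consecutive indices are dealt out round-robin to nprocs
    processes; this sketch assumes block 0 belongs to process 0.
    """
    block = i // nb                  # global block that contains i
    proc = block % nprocs            # round-robin assignment of blocks
    local_block = block // nprocs    # blocks this process holds before it
    return proc, local_block * nb + i % nb


def owner_2d(i, j, mb, nb, nprow, npcol):
    """2-D block-cyclic: rows over nprow process rows, columns over npcol."""
    prow, li = block_cyclic_owner(i, mb, nprow)
    pcol, lj = block_cyclic_owner(j, nb, npcol)
    return (prow, pcol), (li, lj)
```

With nb = 2 and 3 processes, indices 0 to 11 are owned by processes 0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2. As an LU factorization proceeds down and to the right, the remaining submatrix is still spread over every process, which is the load-balancing benefit of this layout.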
BLACS (Basic Linear Algebra Communication Subprograms) - a
set of kernel communication routines for message-passing parallel
linear algebra, designed to:
- Aid design and coding
- Promote efficiency by identifying frequently-occurring operations
of linear algebra
- operations are matrix based - readily transmit sub-matrices of a
larger matrix
- include point-to-point communication, broadcasts, reduce operations
and support routines
- context driven - safety mechanism to ensure that messages sent in one
context cannot be received in another
- Ensure portability via standardization of kernels
- Optimized versions of the BLACS are available in several formats for
various machines
- The BLACS are not a general-purpose communication library (such
libraries are usually vector based rather than matrix based)
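Because BLACS operations are matrix based, a message can describe a submatrix directly by its shape and the leading dimension of the array that holds it, rather than as a flat vector. For example, the BLACS send routine DGESD2D takes the context, the submatrix dimensions M and N, the array A and its leading dimension LDA, and the destination process coordinates. The sketch below shows only the addressing arithmetic behind that, assuming Fortran column-major storage in a flat Python list; `pack_submatrix` is a hypothetical helper, not part of the BLACS.

```python
def pack_submatrix(a, lda, m, n, offset=0):
    """Copy an m x n submatrix out of column-major storage.

    a holds a matrix in Fortran (column-major) order with leading
    dimension lda. Element (i, j) of the submatrix whose first element
    is a[offset] lives at a[offset + i + j*lda]. The result lists the
    m*n entries in column-major order, i.e. a contiguous message buffer.
    """
    if m > lda:
        raise ValueError("m must not exceed lda")
    return [a[offset + i + j * lda] for j in range(n) for i in range(m)]
```

For a 4 x 3 array (lda = 4), the 2 x 2 submatrix starting at row 1, column 1 (0-based) begins at offset 1 + 1*4 = 5; the caller never has to copy the submatrix into a separate vector first.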
PBLAS (Parallel Basic Linear Algebra Subprograms)
- Functionality similar to BLAS - distributed forms of vector-vector,
matrix-vector and matrix-matrix operations
- Simplify parallelization of dense linear algebra code - building
blocks of parallel linear algebra
- Can be implemented on top of the BLAS
- Together with the BLACS and the BLAS, the PBLAS form the portability
layer for ScaLAPACK: machine dependencies are confined to this level,
which consists of de facto standards
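A distributed Level 1 operation such as a dot product splits naturally into local BLAS work plus communication: each process computes a partial sum over the entries it owns, and a global sum combines the partials. The sketch below simulates this sequentially, assuming a simple block distribution and hypothetical function names; it illustrates the structure only and is not the PBLAS interface (PBLAS routines such as PDDOT take global indices and array descriptors).

```python
def block_distribute(x, nprocs):
    """Split x into nprocs contiguous pieces, one per simulated process."""
    size = -(-len(x) // nprocs)          # ceiling division
    return [x[p * size:(p + 1) * size] for p in range(nprocs)]


def local_dot(x_local, y_local):
    """Serial (Level 1 BLAS-like) dot product on one process's entries."""
    return sum(a * b for a, b in zip(x_local, y_local))


def distributed_dot(x, y, nprocs):
    """Simulated parallel dot: local partial sums, then a global sum."""
    xs, ys = block_distribute(x, nprocs), block_distribute(y, nprocs)
    partials = [local_dot(xp, yp) for xp, yp in zip(xs, ys)]  # one per process
    return sum(partials)                 # stands in for the global reduction
```

In a real run only the final reduction involves communication; the partial sums are ordinary serial BLAS calls on local data.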
Sparse Linear Algebra:
The Sparse BLAS
The sparse BLAS are extensions to the BLAS
- goals similar to those of BLAS:
- portability via de facto standardization
- efficiency via tuned implementations for various machines
- improved program clarity, modularity and maintainability
- compressed storage of sparse vectors (only the non-zero entries and
their indices are stored)
- sparse vector operations were chosen by examining existing sparse
libraries for commonly used operations
- existing BLAS routines fall into three groups:
- routines that perform non-vector operations (e.g. Givens rotations)
- routines that already operate correctly on compressed storage vectors
(2-norm, sum of absolute values, constant times vector, etc.)
- routines requiring sparse extensions
A set of sparse BLAS and a model implementation have recently been defined;
no libraries based on them are available yet.
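The compressed storage idea can be shown with a sparse vector held as two arrays, its non-zero values and their indices. The sketch below, with hypothetical function names, implements a sparse-dense dot product, a sparse AXPY that touches only the indexed entries of y, and the gather and scatter operations that move data between the compressed and dense forms. It follows the spirit of the sparse BLAS, not its exact calling sequences.

```python
def gather(y, indx):
    """Compressed copy of the entries of dense y at positions indx."""
    return [y[i] for i in indx]


def scatter(x_val, indx, y):
    """Write compressed values back into dense y at positions indx."""
    for v, i in zip(x_val, indx):
        y[i] = v


def sparse_dot(x_val, indx, y):
    """Dot product of a compressed sparse x with a dense y."""
    return sum(v * y[i] for v, i in zip(x_val, indx))


def sparse_axpy(a, x_val, indx, y):
    """y := a*x + y for compressed sparse x; only y[indx] is touched."""
    for v, i in zip(x_val, indx):
        y[i] += a * v
```

A routine such as the sum of absolute values can be applied directly to `x_val`, whereas the dot product and AXPY need the index array, which is the distinction drawn in the list above.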
Sparse Linear Algebra:
The PIM Library
PIM (Parallel Iterative Methods) is a library of FORTRAN 77 routines
to solve systems of linear equations on parallel machines using
iterative techniques.
Main goals:
- allow user freedom to choose matrix storage, access and partitioning
methods
- support portability across variety of parallel architectures and
programming environments
Mechanism to achieve these goals: PIM hides the details of the following
operations, for which the user supplies the routines:
- matrix-vector product
- preconditioning step
- inner products and vector norms
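The PIM approach of leaving the matrix-vector product, the preconditioner and the inner products to the user can be sketched with the preconditioned conjugate gradient method. The solver below only calls the functions it is handed, so the same code works whatever the matrix storage or data partitioning; in a parallel setting the user's inner-product routine would also perform the global sum across processors. This is a hypothetical Python interface assuming a symmetric positive definite matrix, not PIM's Fortran calling sequence, and the Jacobi (diagonal) preconditioner in the usage lines is just one simple choice.

```python
import math


def pcg(matvec, precond, dot, b, x, tol=1e-10, maxit=1000):
    """Preconditioned conjugate gradients using only user-supplied operations."""
    q = matvec(x)
    r = [bi - qi for bi, qi in zip(b, q)]       # initial residual b - A*x
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    for it in range(maxit):
        if math.sqrt(dot(r, r)) < tol:           # converged: return iterate
            return x, it
        q = matvec(p)
        alpha = rz / dot(p, q)                   # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]  # new direction
        rz = rz_new
    return x, maxit


# User-supplied routines for a small dense example (hypothetical data).
A = [[4.0, 1.0], [1.0, 3.0]]


def matvec(v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in A]


def precond(r):
    return [r[i] / A[i][i] for i in range(len(r))]   # Jacobi preconditioner


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


x, its = pcg(matvec, precond, dot, [1.0, 2.0], [0.0, 0.0])
```

Here the exact solution is x = (1/11, 7/11). Swapping in a distributed `matvec` and a `dot` that performs a global sum would parallelize the solver without changing `pcg` itself.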
PIM components:
- Conjugate-Gradients
- Bi-Conjugate-Gradients
- Conjugate-Gradients squared
- stabilized Bi-Conjugate-Gradients
- restarted stabilized Bi-Conjugate-Gradients
- restarted generalized minimal residual
- restarted generalized conjugate residual
- normal equation solvers
- quasi-minimal residual with coupled recurrences
- transpose-free quasi-minimal residual
- Chebyshev acceleration
Other Parallel Libraries:
PESSL
PESSL (Parallel Engineering and Scientific Subroutine Library) is
IBM's parallel analogue of its serial library ESSL.
PESSL components:
- ESSL for core computational routines
- BLAS - tuned for RS6000 processor
- suite of common numerical routines including a subset of LAPACK
- BLACS
communication is MPL based; PESSL routines can be used in MPL
programs or in MPI programs compiled with the MPICH library
- Subset of level 2 and 3 PBLAS
- Subset of ScaLAPACK
- Fourier transforms in 2 and 3 dimensions
- Uniform random number generator
Other Parallel Libraries:
NAG PVM Library
The NAG Numerical PVM Library is the parallel analogue of the NAG Fortran
Library.
NAG library components:
- NAG Fortran libraries for core computational routines
- BLAS - the Fortran library contains untuned BLAS; tuned BLAS
should be substituted when available
- suite of numerical routines including most of LAPACK
- BLACS
communication is PVM based
- level 2 and 3 PBLAS
- Subset of ScaLAPACK
- Quadrature
- Unconstrained minimization
- Sparse linear algebra
- Uniform random number generator
Other Parallel Libraries:
PETSc
PETSc (Portable, Extensible Toolkit for Scientific Computation) is
designed for both uniprocessor and parallel scientific computing:
- especially intended for large-scale problems modeled by partial
differential equations
- contains a suite of data structures and routines (e.g. linear and
nonlinear equation solvers)
- uses MPI for message passing
- provides interfaces to other libraries such as LAPACK and the BLAS
PETSc components:
- Vectors - set of level 1 BLAS-like serial and parallel vector routines
- Matrices - set of routines and data structures for manipulating
parallel sparse matrices
- Krylov Solvers - set of Krylov subspace iterative solvers, with or without
preconditioners
- Preconditioners - a variety of preconditioners
- SLES (Simplified Linear Equation Solver) - high level interface to
linear equation solvers
- SNES (Simplified Nonlinear Equation Solver) - routines to solve systems
of nonlinear equations or unconstrained minimization problems
Parallel Libraries at the MHPCC
General guidelines:
- Use LAPACK in favor of LINPACK
- Use tuned BLAS for the architecture you are running on when
available; on the RS/6000s, link with the ESSL library
- The PESSL library can be expected to give the best performance when
it contains the routine you need
Where they are:
- the public domain packages are under /source/pd/math (LAPACK,
ScaLAPACK, PIM, PETSc)
- ESSL, PESSL: the libraries are in /usr/lib
- NAG libraries: /source/vendorcode/nag
- Fortran Library (serial): /source/vendorcode/nag/nagfl16df
- PVM Library: /source/vendorcode/nag/nagfd01df
Example Programs
LU Factor and Solve
DESCRIPTION:
These programs demonstrate the use of the LU factor and solve routines
to solve a dense system of linear equations. A serial example calling
the LAPACK routines is given together with a parallel example that
uses the ScaLAPACK library routines. The programs are written in
Fortran 90.
References, Acknowledgments, WWW Resources
References and Acknowledgments
- Notes from the "Workshop on the Use of Public Domain Software and
Software Standards", in particular the contributions of Jack Dongarra
and Sven Hammarling, conducted at the MHPCC.
- Workshop material developed by the Albuquerque Resource Center.
- Freeman, T. L. and Phillips, C., "Parallel Numerical Algorithms",
Prentice Hall, New York
- Golub, Gene and Ortega, James M., "Scientific Computing: An Introduction
with Parallel Computing", Academic Press, Boston
© Copyright 1995
Maui High Performance Computing Center. All rights reserved.
Documents located on the Maui High Performance Computing Center's WWW server
are copyrighted by the MHPCC. Educational institutions are encouraged to
reproduce and distribute these materials for educational use as long as
credit and notification are provided. Please retain this copyright notice
and include this statement with any copies that you make. Also, the MHPCC
requests that you send notification of their use to help@mail.mhpcc.edu.
Commercial use of these materials is prohibited without prior written
permission.
Revised: 03 July 1996 blaise@mhpcc.edu