Readme for SuperLUbench

Table of Contents

SuperLUbench
TODOS
Reproducible Benchmarks on Taurus
References

SuperLUbench

Benchmark the libraries SuperLU, SuperLU_MT and SuperLU_DIST [1] for solving large sparse linear systems of form Ax=b.

SuperLU is used in many applications around the software world. The search on comparisons or benchmarking with respect to the three flavors of SuperLU gave nothing. Thus, I built a benchmark for comparison up on the examples that are shipped with SuperLU. This code might help to decide whether to use SuperLU, SuperLU_MT or SuperLU_Dist to solve a given linear system (or class of systems) on a specific system. Important metrics for this decision are memory footprint and time to solution.

On SuperLU

For those readers which are not familiar with SuperLU, here is a quote on SuperLU from its author Xiaoye Sherry Li:

"SuperLU is a general purpose library for the direct solution of large, sparse, nonsymmetric systems of linear equations. The library is written in C and is callable from either C or Fortran program. It uses MPI, OpenMP and CUDA to support various forms of parallelism. It supports both real and complex datatypes, both single and double precision, and 64-bit integer indexing. The library routines performs an LU decomposition with partial pivoting and triangular system solves through forward and back substitution. The LU factorization routines can handle non-square matrices but the triangular solves are performed only for square matrices. The matrix columns may be preordered (before factorization) either through library or user supplied routines.[…]" [1]

Table 1. General View on SuperLU

Library	Target Architecture	Parallelization
SuperLU	Sequential Systems	-
SuperLU_MT	Shared-Memory Systems	OpenMP or PThreads
SuperLU_DIST	Highly Parallel Distributed-Memory Systems and hybrid Systems with GPUs (not covered here)s	MPI+OpenMP+CUDA

More detailed information can be found on the website of SuperLU and within the Users' Guide.

Benchmarks

The three benchmarks and the used driver routines to solve the system Ax=b are given in the following table.

Table 2. Benchmarks and Driver Routines

Library	Benchmark	Driver Routine
SuperLU	dlinsolx	`dgssvx()`
SuperLU_MT	pdlinsolx	`pdgssvx()`
SuperLU_DIST	pddrive_ABglobal	`pdgssvx_ABglobal()`

The benchmarks are build up on examples shipped with SuperLU, SuperLU_MT and SuperLU_DIST, respectively. The basic steps are:

Parse command line arguments
Read in linear system and known solution from files
Allocate memory and create SuperLU data structures
For i = 1, …, R
- Start timing
- Compute solution of Ax=b (LU factorization and triangular solve)
- Finish timing
- Compare solution with known solution
- Output statistics
Clean up, i.e, free memory

Configure and Make

Building the three benchmarks of SuperLUbench is quite easy:

Step: Adopt the settings in config.inc to the specific needs and system.
Step: Call make all from top level directory to compile all three benchmarks in one rush.

Moreover, there is a Makefile for each benchmark in the corresponding directory (src/SuperLU*). All Makefiles depend on the values defined in config.inc.

Usage

Examples

Once build, the benchmarks can be feed with the provided linear system in the example directory.

SuperLU

cd src/SuperLU/
./dlinsolx -A ../../example/A.mtx -b ../../example/b.mtx -x ../../example/x.mtx

SuperLU_MT

cd src/SuperLU_MT/
export OMP_NUM_THREADS=2
./pdlinsolx -p 2 -A ../../example/A.mtx -b ../../example/b.mtx -x ../../example/x.mtx

SuperLU_DIST

cd src/SuperLU_DIST/
export OMP_NUM_THREADS=1
mpirun -np 2 ./pddrive_ABglobal -r 2 -c 1 -A ../../example/A.mtx -b ../../example/b.mtx -x ../../example/x.mtx

For further information on controlling the parallelism, we refer to section Control Parallelism.

Arguments and Options

All three benchmarks share these command line arguments:

-A  <FILE>     Matrix A
-b  <FILE>     Right hand side vector b
-x  <FILE>     Known solution vector x to verify the computed result
-R  <int>      Number of repetitions to solve Ax=b
-v             Print additional information and statistics

The parallelism of SuperLU_MT is controlled via:

-p  <int>      Number of threads to use

If SuperLU_MT was built with OpenMP support, it is necessary to set the environment variable OMP_NUM_THREADS equal to the value of -p!

The parallelism of SuperLU_DIST is controlled via:

-r  <int>      Process rows
-c  <int>      Process columns

These two values control the partitioning of the matrices among the MPI processes and must be provided by the user. See section Control Parallelism for more information on this options.

Input Formats

Although there are multiple popular matrix formats (e.g., Harwell-Boeing, Triplet), only the Matrix Market format [2] is supported at the moment.

Control Parallelism

SuperLU_MT: The number of threads N used by the solver method pdgssvx() is controlled via the command line argument -p N. This holds for both cases: Whether SuperLU_MT library was build with support of PThreads or OpenMP. Oversubscription, i.e., number of threads > available physical CPUs, can be (but not necessarly needs to be) a drawback with respect to performance. The optimal number of threads with respect to time to solution depends on the specific linear system and the computing system.

SuperLU_DIST: Since SuperLU_DIST uses MPI, the benchmark pddrive_ABglobal needs to be invoked with mpirun -np NP, where NP is the number of MPI processes to use. Furthermore, the matrix A is decomposed in a block fashion way. Todo Explain this and what is -r and -c. Additionally, it might by crucial for performance to specify the environment variable OMP_NUM_THREADS, since SuperLU_DIST can also make use of threading. Oversubscription, i.e., sum of MPI processes and threads > available physical CPUs, can be a drawback with respect to performance! The default value for OMP_NUM_THREADS is implementation depend (, e.g., libgomp: one thread per CPU is used).

SuperLU FAQ gives the following advice in order to choose values for -r and -c: "For best performance, the process grid should be arranged as square as possible. When square grid is not possible, it is better to set the row dimension (nprow) slightly smaller than the column dimension (npcol). For example, the following are good options: 2x3, 2x4, 4x4, 4x8, etc." [3]

Collection of Linear Systems

The repository SuperLUbench-Collection [4] provides a collection of sparse linear system for benchmarking purposes.

Additional Information for Developers

There are some thoughts I want to dump for future references:

The used functions dgssvx(), pdgssvx() and pdgssvx_ABglobal() within SuperLUbench may overwrite the structures holding A and b. Thus, the values are stored in additional arrays and restored before iteratively calling the solve functions.

In SuperLU and SuperLU_MT it seems to be sufficient to recreate the arrays double *a and double *rhsb.

Additional to this two arrays, the array asub (holding the row indices) needs to be recreated in SuperLU_DIST since pdgssvx_ABglobal() may overwrite it. I’m not sure, if this also holds for the methods dgssvx() and pdgssvx(), respectively. Since I have not seen any issues with that, I did not investigate further.

TODOS

Some functions of SuperLU, SuperLU_MT and SuperLU_DIST output information via printf to stdout and stderr. For instance, the methods dinf_norm_error() and dinf_norm_error_dist() print the norm value directly to stdout. From my point of view, the better way would be to return the norm value so that it can be used for further processing.

Reproducible Benchmarks on Taurus

This is the interactive way:

// Go to Haswell node
$ srun --pty --nodes 1 --ntasks 1 --cpus-per-task 12 --time=02:00:00 --mem-per-cpu=2500 -p interactive --account=p_paradom zsh
$ cd /scratch/mflehmig/PARADOM/

// Clone Repositories
$ git clone https://github.com/mflehmig/SuperLUbench-Collection.git
$ git clone https://github.com/mflehmig/SuperLUbench.git
$ cd SuperLUbench/
$ ln -s ../SuperLUBench-Collection mtx

$ source load_modules.sh
$ ln -s config_taurus.inc config.inc
$ make -j

References

[1] Xiaoye Sherry Li, Website of SuperLU
[2] US National Institute of Standards and Technology, Matrix Market Format
[3] Xiaoye Sherry Li, SuperLU FAQ
[4] Martin Schroschk, SuperLUbench-Collection

Name		Name	Last commit message	Last commit date
Latest commit History 63 Commits
doc		doc
example		example
include		include
src		src
.gitignore		.gitignore
CMakeCache.txt		CMakeCache.txt
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
Makefile		Makefile
README.adoc		README.adoc
cmake_install.cmake		cmake_install.cmake
config.inc		config.inc
config_taurus.inc		config_taurus.inc
link.sh		link.sh
load_modules.sh		load_modules.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Readme for SuperLUbench

SuperLUbench

On SuperLU

Benchmarks

Configure and Make

Usage

Examples

Arguments and Options

Input Formats

Control Parallelism

Collection of Linear Systems

Additional Information for Developers

TODOS

Reproducible Benchmarks on Taurus

References

About

Releases

Packages

Languages

License

mflehmig/SuperLUbench

Folders and files

Latest commit

History

Repository files navigation

Readme for SuperLUbench

SuperLUbench

On SuperLU

Benchmarks

Configure and Make

Usage

Examples

Arguments and Options

Input Formats

Control Parallelism

Collection of Linear Systems

Additional Information for Developers

TODOS

Reproducible Benchmarks on Taurus

References

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages