CMake causes slow eigenvalue decomposition (dsyev, dspgv, etc.) #4931
Comments
Try adding the -DCMAKE_C_FLAGS="-O2" option when using CMake.
@XiWeiGu This option may be useful, but the CMake build still seems to be much slower compared to that of […]. For the GitHub Actions run, the times of dsyev/dsyevx/dspgv did not decrease after adding […]. On my computer, these functions are about 20% faster when using CMake, but still 4x-5x slower than that of […]
As far as I can tell, all the build options available in […]
Possible solution
See also #4931 (comment). I think it is somewhat confusing that […]
@martin-frbg Reply of #4931 (comment)
For clarification, for the benchmark on 16 cores @ Ryzen 7945HX, OpenBLAS is compiled with option […] Perhaps you saw the code on […]
You are mostly correct. Performance turbulence is significant.
More on #4931 (comment) @martin-frbg After some elementary profiling, the computational bottleneck for dsyev/dsyevx/dspgv is indeed […] I found that using optimized Fortran compiler flags will largely accelerate […]
@XiWeiGu Your suggestion is useful! (though in another way 😂) I tried to
Results updated and available from […]
CMake has its own ideas about optimization levels - namely that the user should specify CMAKE_BUILD_TYPE (where "Release" corresponds to -O3 for all languages involved). MKL will usually be faster for LAPACK functions as OpenBLAS […]
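A minimal sketch of the suggestion above, assuming CMake 3.13 or newer (for -S/-B) and reusing the two options from the issue's own cmake command. The function names configure_release and build_release and the build directory name are illustrative only, not part of OpenBLAS or CMake:

```shell
#!/usr/bin/env bash
# Sketch: configure and build an out-of-tree Release build with CMake.
# Function names and directory names are placeholders (assumptions).

# Configure: pass the build type explicitly so CMake applies its
# Release optimization flags to every enabled language.
configure_release() {
  local src_dir="$1" build_dir="$2"
  cmake -S "$src_dir" -B "$build_dir" \
    -DCMAKE_BUILD_TYPE=Release \
    -DBUILD_SHARED_LIBS=1 \
    -DNO_AFFINITY=1
}

# Build: drive the generated build system through CMake itself.
build_release() {
  local build_dir="$1"
  cmake --build "$build_dir" --parallel
}

# Usage (from the OpenBLAS source tree):
#   configure_release . build && build_release build
```

CMAKE_BUILD_TYPE only takes effect with single-configuration generators such as Makefiles or Ninja; multi-configuration generators select the configuration at build time instead.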
Ah that's a clear explanation and a good practice. Thanks!
Glad to hear your issue has been resolved.
Hi devs!
Problem description
When compiled by cmake (instead of make), some eigenvalue decomposition functions can be extremely slow. For dsyev, this can be 6x slower; for test_dsyevd, this can be 2x slower.
This problem can be easily resolved by using make instead of cmake, but it seems too confusing and suspicious to me. It may well be that I missed something about how to build OpenBLAS, and I hope for any suggestions or thoughts on this.
Timings Evidence
16 cores @ Ryzen 7945HX (Zen4)
Using pthreads for multithreading.
Both cmake and make have the same openblas_get_config output.
openblas_get_config: OpenBLAS 0.3.28 NO_AFFINITY COOPERLAKE MAX_THREADS=16
Compile command:
cmake: cmake .. -DBUILD_SHARED_LIBS=1 -DNO_AFFINITY=1
make: make CC=gcc FC=gfortran NUM_THREADS=16
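Since both builds report the same openblas_get_config output, that string alone may not reveal differences in compiler optimization flags. As a rough diagnostic sketch, the build type and flag entries that CMake recorded for a build directory could be listed as follows. The function name show_cache_flags is hypothetical, and the sketch assumes the build directory contains a CMakeCache.txt written by a single-configuration generator:

```shell
#!/usr/bin/env bash
# Sketch: print the build type and the C/Fortran flag entries stored in
# a CMake cache. show_cache_flags is an illustrative name (assumption).
# Argument: path to a CMake build directory.
show_cache_flags() {
  local cache="$1/CMakeCache.txt"
  # Cache entries have the form NAME:TYPE=VALUE.
  grep -E '^CMAKE_(BUILD_TYPE|(C|Fortran)_FLAGS[A-Z_]*):' "$cache"
}

# Usage:
#   show_cache_flags build
```

An empty value after CMAKE_BUILD_TYPE:STRING= would suggest that no build type was selected when the directory was configured.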
All problems are 2048 x 2048. For function dspgvx, we only need the first 512 eigenvalues and eigenvectors; for other cases, all eigenvalues and eigenvectors are required.
Timings were measured for both the cmake and make builds. All results are available from https://github.com/ajz34/issue_openblas_dsyev/tree/fc75b82593224be7c7b0673991cce5b72d24be8a
To avoid environment variable pollution, I also tried on a GitHub Actions machine.
It seems to show a similar problem, where functions like dsyev can be much slower with cmake than with make.
The configuration for GitHub Actions uses USE_OPENMP=1, and the OpenBLAS version is 0.3.27.
https://github.com/ajz34/issue_openblas_dsyev/actions/runs/11269032870/job/31336861677