Slow performance of function dsyev and dsyevx (not fully paralleled) #4758
Comments
The LAPACK included in OpenBLAS is almost completely copied from the reference implementation, also known as "netlib" LAPACK (https://github.com/Reference-LAPACK/lapack), which is not optimized for speed and is not parallelized, except for a few functions that can use OpenMP parallelism if available. Only a handful of functions (such as getrf/potrf) have been reimplemented in OpenBLAS; for everything else, the only performance advantage over the reference implementation comes from using the optimized BLAS functions.
@martin-frbg
Well, it might be a useful coincidence if the bottleneck in DSYEV turned out to be the (D)LASR function too, but I have not checked.
To the best of my knowledge, higher-level packages such as numpy try to use SYEVD. As you note, it requires a lot of memory (quadratic in the matrix dimension, to be precise), but it relies on BLAS3 building blocks, which OpenBLAS optimizes and parallelizes. LAPACK uses the X suffix to denote expert drivers. I don't know whether every routine follows this convention, but all the ones I can think of do. So SYEVX is the expert driver for SYEV. SYEV indeed uses LASR under the hood, which, if implemented as in reference-LAPACK, has terrible performance.
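To make the memory trade-off concrete, here is a minimal Python sketch that tabulates the minimum workspace sizes stated in the reference LAPACK routine documentation when eigenvectors are requested (JOBZ='V', N > 1). The function name `min_workspace` is hypothetical and only for illustration. These are documented minimums, so a workspace query (LWORK=-1) may well return larger, better-performing sizes.

```python
def min_workspace(n: int) -> dict[str, tuple[int, int]]:
    """Return (LWORK, LIWORK) minimums per driver for JOBZ='V' and n > 1.

    LWORK counts double-precision elements, LIWORK counts integer elements.
    Values follow the reference LAPACK documentation of each routine.
    """
    if n <= 1:
        raise ValueError("this sketch only covers n > 1")
    return {
        "dsyev": (max(1, 3 * n - 1), 0),          # no integer workspace
        "dsyevx": (8 * n, 5 * n),
        "dsyevr": (max(1, 26 * n), max(1, 10 * n)),
        "dsyevd": (1 + 6 * n + 2 * n * n, 3 + 5 * n),  # quadratic in n
    }


if __name__ == "__main__":
    for n in (1_000, 10_000):
        print(f"n = {n}")
        for name, (lwork, liwork) in min_workspace(n).items():
            print(f"  {name:7s} LWORK >= {lwork:>12,d}  LIWORK >= {liwork:>8,d}")
```

Only the dsyevd entry grows with the square of n; the other three grow linearly, which matches the observation that dsyevd trades memory for speed.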
@angsch That's correct and helpful. Numpy uses SYEVD as its default backend (related code). Scipy uses different default backends: SYEVR for the standard eigenproblem and SYGVD for the generalized eigenproblem (related code).
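For anyone who wants to try the different drivers without writing Fortran or C, a rough sketch follows. It assumes a SciPy version in which `scipy.linalg.eigh` accepts the `driver` keyword (with values such as "ev", "evd", "evr", "evx" for the standard problem). The helper name `compare_drivers`, the matrix size, and the random test matrix are my own choices for illustration, and the timings depend entirely on the BLAS/LAPACK build that SciPy is linked against.

```python
import time

import numpy as np
from scipy.linalg import eigh


def compare_drivers(n: int = 200, seed: int = 0) -> dict:
    """Solve one random symmetric eigenproblem with each LAPACK driver.

    Returns {driver: (eigenvalues, elapsed_seconds)}.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    a = (a + a.T) / 2  # symmetrize so the problem is a valid SYEV* input

    results = {}
    for driver in ("ev", "evd", "evr", "evx"):
        start = time.perf_counter()
        w, _ = eigh(a, driver=driver)  # eigenvalues in ascending order
        results[driver] = (w, time.perf_counter() - start)
    return results


if __name__ == "__main__":
    for driver, (w, elapsed) in compare_drivers(n=1000).items():
        print(f"{driver:4s} {elapsed:8.3f} s  smallest eigenvalue {w[0]: .6f}")
```

All four drivers should return the same eigenvalues up to rounding, so the comparison is only about run time and memory use.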
Nice, I didn't know that Scipy uses SYEVR.
I see that it is difficult to pick a routine if you do not want to read up on the algorithms. However, in my opinion, the algorithms in LAPACK are complementary. The main differentiators are speed, workspace, and accuracy. There is not a single best algorithm for all symmetric eigenvalue problems. To perhaps conclude this discussion, let me add my two cents on the statement in the very first comment.
I strongly oppose having OpenBLAS redirect calls to SYEV to either SYEVR or SYEVD. If you call SYEV, you should get what the name SYEV promises. This is why we have LAPACK as the open source reference. One great thing about LAPACK as a standard is that no matter what package you link against (reference-LAPACK, OpenBLAS, vendor-provided packages like MKL), you know what you will get. It's great for portability. The only existing exceptions in LAPACK are wrapper routines that explicitly state that you should not make any assumptions about what routine is used under the hood, to give optimized libraries more freedom. GEQR is an example. Problems where algorithms deliver different accuracy or could even fail do not really fit into this concept. I consider eigenproblems to fall into this category.
Hello developers!
I found that the functions dsyev and dsyevx seem not to be fully parallelized when OpenBLAS is built with TARGET=ZEN USE_64BITINT=1 DYNAMIC_ARCH=1 NO_CBLAS=0 NO_LAPACK=0 NO_LAPACKE=0 NO_AFFINITY=1 USE_OPENMP=1.
Preliminary testing on an Intel CPU may also show a similar problem.
I'm not sure whether this is a problem with the make configuration, or whether OpenBLAS has not yet implemented fully parallel versions of dsyev and dsyevx.
I hope to hear any thoughts or advice, and thanks in advance!
I guess that dsyevr and dsyevd could be better replacements for dsyev. dsyevd is the fastest but consumes more memory, while dsyevr uses much less temporary memory.
So additionally, as a programmer not very familiar with low-level BLAS/LAPACK, I wonder whether it is common to use dsyevr and dsyevd as eigensolvers instead of dsyev. If so, this may not be such an important issue.
Benchmark results (16 cores @ Ryzen 7945HX)
[Table comparing dsyev, dsyevd, dsyevr, and dsyevx; values not preserved]
A reproduction of this issue is available in GitHub Actions CI (2 physical cores @ EPYC 7763): https://github.com/ajz34/issue_openblas_dsyev/actions/runs/9578584638/job/26409147609
For the scripts used on the 16-core Ryzen 7945HX, also see https://github.com/ajz34/issue_openblas_dsyev/tree/16-cores-Ryzen-7945HX.