Memory allocation #4665
Comments
It appears this buffer is allocated as a precaution in case execution follows the single-threaded path. However, if OpenBLAS determines that multi-threading is necessary, the exec_threads function (see source) uses the buffers already allocated during the adjust_thread_buffers phase. When an OpenBLAS call requires 16 buffers for execution, an additional buffer (making it 17) is allocated unnecessarily, wasting one buffer.

@martin-frbg Another concern is that OpenBLAS allocates a number of buffers equal to the maximum possible number of threads per BLAS call, which is generally the number of CPUs on the system. This approach is static and often wastes significant memory, as many buffers remain unused during smaller BLAS calls. Moreover, this fixed allocation strategy limits scalability, making it challenging to support a higher |
Sometimes it's not just a matter of waste. #4662 fixed the When I tested Linpack with OpenBLAS on multiple NUMA nodes, enabling |
PR #4577
In blas_thread_init, memory is allocated for blas_cpu_number threads using the adjust_thread_buffers interface. However, when calling interfaces like gemm, memory allocation is still performed in the main thread:
This would lead to an additional buffer being allocated, deviating from the logic of the code before the modification.