This repository has been archived by the owner on May 14, 2024. It is now read-only.
Hello, I am writing an FFT algorithm in OpenCL and have found a pretty nasty bug in the ROCm OpenCL implementation. The problem revolves around the following kernel and its `l2` variable:
This kernel is launched with a simple global range of 1, so there is no parallelism at all: a single CU, a single SE, a single wavefront. However, the above kernel produces incorrect results.
I know for sure this is an optimization bug, as forcibly printing `l2` during execution makes the kernel produce correct results. Furthermore, adding `-cl-opt-disable` to the build program options also resolves the issue! Once again, this cannot be due to concurrency issues, as the kernel is launched with a global range of 1.
Setting `-WB, -simplifycfg-sink-common=0`, as mentioned in the DarkTable issue, does not resolve the issue. Setting the optimization level to anything above `-O0` produces incorrect results.
Please also see: ROCm/ROCm-OpenCL-Runtime#115
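For anyone who wants to repeat the comparison between build options, here is a minimal sketch of how the options string could be switched between runs. The helper name `fft_build_options` is hypothetical and is not taken from the attached project. `-cl-opt-disable` is a standard OpenCL build option, while the `-O<level>` form is simply the one referred to in this report.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the attached project): fills buf with the
 * build-options string for one test run.
 *   opt_level <  0  ->  "-cl-opt-disable" (optimizations off)
 *   opt_level >= 0  ->  "-O<level>", the form mentioned in this report */
static const char *fft_build_options(int opt_level, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    if (opt_level < 0)
        snprintf(buf, cap, "-cl-opt-disable");
    else
        snprintf(buf, cap, "-O%d", opt_level);
    return buf;
}

/* Intended use with the OpenCL host API (shown as a comment so that this
 * block compiles without the OpenCL headers):
 *
 *   char opts[32];
 *   cl_int err = clBuildProgram(program, 1, &device,
 *                               fft_build_options(-1, opts, sizeof opts),
 *                               NULL, NULL);
 */
```

Building the same program once with `fft_build_options(-1, ...)` and once with `fft_build_options(2, ...)` and comparing the outputs should be enough to show whether the optimizer is involved.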
I have attached a standalone project with an ard-ocl target, whose source can be found in the oclfft folder. The tests folder contains several test cases for ard-ocl, which use Boost as the unit test framework. The tests use the FFT function shown in a previous comment on this issue, and its output is incorrect when compared against FFTW. The kernel is launched sequentially, i.e. with a dimension of 1. When the same kernel code is run on the CPU instead of through ROCm and OpenCL, the results are correct.
This standalone project makes it possible to isolate the optimization bug and to test whether the output is correct.
perf-engineering-project-3d31331f3aa00dc5d800af6e2b2210fcf104234b.tar.gz
FFTW, Boost, and CMake are required to run the standalone app.
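To make the "compared against FFTW" check concrete, here is a rough sketch of the kind of comparison involved. It is not the code from the attached tests: the function names, the interleaved real/imaginary float layout, and the tolerance parameter are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Largest absolute element-wise difference between two float buffers.
 * Returns NaN as soon as a NaN difference is seen, so that a broken
 * output can never pass as "close enough". */
static float max_abs_diff(const float *got, const float *ref, size_t n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = got[i] - ref[i];
        if (d != d)              /* only true for NaN */
            return d;
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}

/* Hypothetical check: got and ref each hold n_complex values stored as
 * interleaved (re, im) floats, e.g. the kernel output and a reference
 * result. Returns 1 when every component is within tol, otherwise 0. */
static int fft_matches_reference(const float *got, const float *ref,
                                 size_t n_complex, float tol)
{
    assert(got != NULL && ref != NULL);
    return max_abs_diff(got, ref, 2 * n_complex) <= tol;
}
```

With a check like this, the optimized GPU build is expected to fail against the reference, while the CPU run and the `-cl-opt-disable` build should pass.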