Half factorization #1712
Generally LGTM. I have a question regarding atomics and HIP. The latest ROCm documentation lists support for fp16 atomic operations: https://rocm.docs.amd.com/en/latest/reference/precision-support.html#atomic-operations-support, but TBH I can't figure out exactly which operations they mean by that. Did you try anything in that regard?
PairTypenameNameGenerator);


TYPED_TEST(ParIlut, KernelThresholdSelectIsEquivalentToRef)
{
    using value_type = typename TestFixture::value_type;
Many of the tests here are missing `SKIP_HALF` if compiling for HIP.
We do not support `compute_l_u_factors` in HIP, but the other kernels still work with half precision in HIP.
I see what you mean now.
cuda/solver/common_trs_kernels.cuh
Outdated
@@ -212,13 +212,15 @@ struct CudaSolveStruct : gko::solver::SolveStruct {

    size_type work_size{};

    // TODO: In nullptr is considered nullptr_t not casted to const
    // it does not work in cuda110/100 images
nit:
- // it does not work in cuda110/100 images
+ // Explicitly cast `nullptr` to `const ValueType*` to prevent compiler issues with cuda 10/11
I think it is more of a host-compiler issue, because the call goes through our binding with a specific type first.
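To illustrate the point about the binding, here is a minimal host-only sketch of why a bare `nullptr` can fail before any CUDA code is involved. The names `buffer_binding` and `call_with_null_values` are hypothetical stand-ins, not Ginkgo's actual bindings; the sketch only assumes that the binding is overloaded per value type.

```cpp
#include <cassert>

// Hypothetical stand-ins for type-specific bindings (not Ginkgo's real API):
// one overload per value type, as a vendor-library wrapper might provide.
inline int buffer_binding(const float*) { return 1; }
inline int buffer_binding(const double*) { return 2; }

template <typename ValueType>
int call_with_null_values()
{
    // buffer_binding(nullptr) would not compile here: nullptr has type
    // std::nullptr_t, which converts equally well to both pointer types,
    // so the host compiler reports an ambiguous overload.
    return buffer_binding(static_cast<const ValueType*>(nullptr));
}
```

With the explicit cast, `call_with_null_values<float>()` selects the `const float*` overload and `call_with_null_values<double>()` selects the `const double*` one.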
Error: PR already merged!
This PR adds half-precision support to the factorizations.
HIP currently does not support atomics on 16-bit types.
NVHPC 23.3 seems to mishandle index assignment on a custom class under optimization when IndexType is long. We set the index explicitly through a volatile to work around it. NVHPC 24.1 seems to have fixed this issue.
https://godbolt.org/z/srYhGndKn
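For readers unfamiliar with this kind of workaround, the following is a rough sketch of what setting an index through a volatile can look like. The class `indexed_entry` and the helper `set_index` are hypothetical and only illustrate the general pattern; the actual change in this PR and the reproducer behind the link above may look different.

```cpp
#include <cassert>

// Hypothetical custom class holding an index and a value.
template <typename IndexType>
struct indexed_entry {
    IndexType index;
    double value;
};

template <typename IndexType>
void set_index(indexed_entry<IndexType>& entry, IndexType new_index)
{
    // Accesses to a volatile object are observable behavior, so the
    // compiler must perform this store and the following load as written
    // instead of folding them into a (possibly miscompiled) shortcut.
    volatile IndexType pinned = new_index;
    entry.index = pinned;
}
```

On a conforming compiler the volatile detour changes nothing semantically: after `set_index(e, 42L)` on an `indexed_entry<long>`, `e.index` is 42 and `e.value` is untouched. Its only purpose is to constrain the optimizer.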
TODO: