Half factorization #1712

yhmtsai · 2024-10-25T17:17:50Z

This PR adds half-precision support to the factorizations.

HIP does not currently support atomic operations on 16-bit types.
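For context, a common way to emulate a missing 16-bit atomic is a compare-and-swap loop on the enclosing 32-bit word. The sketch below shows that technique on the host with `std::atomic`. It is not taken from this PR. Assumptions: `atomic_add_u16` is a hypothetical helper, the two 16-bit lanes are addressed by bit shift (lane 0 is the low half), and wrapping integer addition stands in for fp16 addition, which a real kernel would perform after converting the lane value to float.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Ginkgo or HIP API): atomically add `val` to one
// 16-bit lane of a 32-bit word and return the lane's previous value.
// lane 0 is the low 16 bits, lane 1 the high 16 bits.
// Wrapping integer addition stands in for fp16 addition here.
inline std::uint16_t atomic_add_u16(std::atomic<std::uint32_t>& word, int lane,
                                    std::uint16_t val)
{
    assert(lane == 0 || lane == 1);
    const unsigned shift = 16u * static_cast<unsigned>(lane);
    const std::uint32_t mask = std::uint32_t{0xFFFFu} << shift;
    std::uint32_t old = word.load();
    std::uint32_t desired;
    std::uint16_t old_lane;
    do {
        // Extract the current lane, update it, and splice it back in
        // without touching the neighboring lane.
        old_lane = static_cast<std::uint16_t>((old & mask) >> shift);
        const auto new_lane = static_cast<std::uint16_t>(old_lane + val);
        desired = (old & ~mask) |
                  (static_cast<std::uint32_t>(new_lane) << shift);
        // On failure, compare_exchange_weak reloads `old` and we retry.
    } while (!word.compare_exchange_weak(old, desired));
    return old_lane;
}
```

The loop only succeeds when no other thread changed either lane in between, which is what makes the narrow update atomic at the cost of possible retries under contention.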

NVHPC 23.3 appears to miscompile the assignment of an index to a member of a custom class when optimizations are enabled and IndexType is long. We work around this by setting the index explicitly through a volatile variable. NVHPC 24.1 seems to have fixed this issue.
https://godbolt.org/z/srYhGndKn
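The description only says the index is set explicitly through `volatile`; the affected class and the reproducer are behind the godbolt link. A minimal sketch of that pattern, assuming a hypothetical `matrix_entry` class and `assign_index` helper (neither is Ginkgo code):

```cpp
// Hypothetical stand-in for the "custom class" mentioned in the description.
template <typename IndexType>
struct matrix_entry {
    IndexType row;
    IndexType col;
};

// Hypothetical illustration of the workaround: the indices are routed through
// volatile locals, which forces an actual store and reload of each value. This
// is intended to keep the optimizer (NVHPC 23.3 in the report) from
// mis-optimizing the member assignment.
template <typename IndexType>
void assign_index(matrix_entry<IndexType>& entry, IndexType row, IndexType col)
{
    volatile IndexType row_v = row;
    volatile IndexType col_v = col;
    entry.row = row_v;
    entry.col = col_v;
}
```

Since `volatile` also blocks legitimate optimizations, a workaround like this is best limited to the affected compiler version.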

TODO:

Add the fix for the triangular solve with half precision.

MarcelKoch

Generally LGTM. I have a question regarding atomics and HIP. The latest ROCm documentation lists support for fp16 atomic operations: https://rocm.docs.amd.com/en/latest/reference/precision-support.html#atomic-operations-support, but TBH I can't figure out exactly which operations they mean by that. Did you try anything in that regard?

MarcelKoch · 2024-11-11T11:37:35Z

test/factorization/par_ilut_kernels.cpp

                 PairTypenameNameGenerator);


 TYPED_TEST(ParIlut, KernelThresholdSelectIsEquivalentToRef)
 {
+    using value_type = typename TestFixture::value_type;


Many of the tests here are missing SKIP_HALF when compiling for HIP.

We do not support compute_l_u_factors in HIP, but the other kernels still work with half precision in HIP.

I see what you mean now.
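Per the thread above, only compute_l_u_factors lacks a half-precision HIP implementation, so a skip should be scoped to that kernel rather than to every half test. A minimal sketch of such a compile-time predicate, assuming a placeholder `half_stub` type and a hypothetical `backend` enum and `should_skip` helper; it does not reproduce Ginkgo's actual SKIP_HALF macro.

```cpp
#include <type_traits>

// Hypothetical 16-bit floating-point placeholder; not Ginkgo's half type.
struct half_stub {
    unsigned short bits;
};

// Hypothetical list of backends a test may be compiled for.
enum class backend { reference, cuda, hip };

// A typed test is skipped only when all three hold: we target HIP, the kernel
// has no half-precision HIP implementation (e.g. compute_l_u_factors), and the
// test is instantiated with the half type.
template <typename ValueType>
constexpr bool should_skip(backend target, bool hip_half_unsupported)
{
    return target == backend::hip && hip_half_unsupported &&
           std::is_same<ValueType, half_stub>::value;
}
```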

MarcelKoch · 2024-11-11T14:45:16Z

cuda/solver/common_trs_kernels.cuh

@@ -212,13 +212,15 @@ struct CudaSolveStruct : gko::solver::SolveStruct {

        size_type work_size{};

+        // TODO: In nullptr is considered nullptr_t not casted to const
+        // it does not work in cuda110/100 images


nit:

Suggested change:

-        // it does not work in cuda110/100 images
+        // Explicitly cast `nullptr` to `const ValueType*` to prevent compiler issues with cuda 10/11

I think it is more of a host-compiler issue, because the call first goes through our binding with a specific type.
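The relevant point for the cast is that template argument deduction turns a bare `nullptr` into `std::nullptr_t`, so a binding that expects `const ValueType*` never sees that type unless the caller casts. A minimal sketch, assuming a hypothetical `arrives_as_typed_pointer` helper that stands in for the binding; it shows the standard deduction rule and does not reproduce the CUDA 10/11 failure itself.

```cpp
#include <type_traits>

// Hypothetical stand-in for a templated binding that forwards a pointer to a
// typed backend call. It reports whether the argument arrived with the
// expected `const ValueType*` type.
template <typename ValueType, typename Arg>
constexpr bool arrives_as_typed_pointer(Arg)
{
    return std::is_same<Arg, const ValueType*>::value;
}

// A bare nullptr is deduced as std::nullptr_t, not as a pointer type.
static_assert(!arrives_as_typed_pointer<float>(nullptr), "");
// After an explicit cast, the binding sees the intended pointer type.
static_assert(
    arrives_as_typed_pointer<float>(static_cast<const float*>(nullptr)), "");
```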

cuda/solver/common_trs_kernels.cuh

hip/components/memory.hip.hpp

reference/factorization/par_ilut_kernels.cpp

test/factorization/lu_kernels.cpp


ginkgo-bot · 2024-12-03T01:37:33Z

Error: PR already merged!

yhmtsai added the 1:ST:WIP This PR is a work in progress. Not ready for review. label Oct 25, 2024

yhmtsai self-assigned this Oct 25, 2024

yhmtsai mentioned this pull request Oct 25, 2024

Half preconditioner, multigrid, log, and reorder #1713

Merged

yhmtsai force-pushed the half_factorization branch from 3db59fd to cd9677a October 28, 2024 16:12

yhmtsai force-pushed the half_solver branch from e962cb2 to 9a15695 October 28, 2024 16:12

yhmtsai force-pushed the half_factorization branch from cd9677a to 5e5cd03 October 28, 2024 17:19

yhmtsai force-pushed the half_solver branch from 9a15695 to 1d7f1d1 October 28, 2024 17:19

yhmtsai force-pushed the half_factorization branch from 5e5cd03 to c276034 October 29, 2024 09:17

yhmtsai force-pushed the half_solver branch from 1d7f1d1 to 1038d78 October 29, 2024 09:17

yhmtsai force-pushed the half_factorization branch from c276034 to bbefde6 October 29, 2024 18:21

yhmtsai force-pushed the half_solver branch from 1038d78 to 1959026 October 29, 2024 18:21

yhmtsai mentioned this pull request Oct 30, 2024

Half precision support #1257

Closed

12 tasks

yhmtsai added this to the Ginkgo 1.9.0 milestone Oct 30, 2024

yhmtsai force-pushed the half_solver branch from 1959026 to ac679c2 November 4, 2024 14:24

yhmtsai force-pushed the half_factorization branch from bbefde6 to 72d9d50 November 4, 2024 14:24

yhmtsai force-pushed the half_solver branch from ac679c2 to eda6a77 November 4, 2024 18:15

yhmtsai force-pushed the half_factorization branch from 72d9d50 to 88967e6 November 4, 2024 18:15

yhmtsai added 1:ST:ready-for-review This PR is ready for review and removed 1:ST:WIP This PR is a work in progress. Not ready for review. labels Nov 5, 2024

yhmtsai force-pushed the half_factorization branch from 88967e6 to e667ec0 November 5, 2024 18:03

yhmtsai force-pushed the half_solver branch 2 times, most recently from 50ae4c1 to bba40e0 November 7, 2024 14:40

yhmtsai force-pushed the half_factorization branch from e667ec0 to c32201d November 7, 2024 14:40

MarcelKoch self-requested a review November 11, 2024 11:25

MarcelKoch requested changes Nov 11, 2024

yhmtsai force-pushed the half_solver branch from c7cc05e to a09d80a November 27, 2024 18:42

yhmtsai force-pushed the half_factorization branch from f758d41 to 4b54a9d November 27, 2024 18:42

yhmtsai force-pushed the half_solver branch from a09d80a to 8f64d67 November 28, 2024 18:11

yhmtsai force-pushed the half_factorization branch from 4b54a9d to f8a9b1c November 28, 2024 18:11

yhmtsai force-pushed the half_solver branch from 8f64d67 to d25a25a November 28, 2024 19:53

yhmtsai force-pushed the half_factorization branch from f8a9b1c to 29239b3 November 28, 2024 19:53

yhmtsai force-pushed the half_solver branch from d25a25a to f052a55 November 29, 2024 15:22

yhmtsai force-pushed the half_factorization branch 2 times, most recently from d66627a to b58712a November 30, 2024 01:30

yhmtsai force-pushed the half_solver branch 2 times, most recently from 3a98c11 to ac216bc November 30, 2024 18:36

yhmtsai force-pushed the half_factorization branch from b58712a to 53a1d80 November 30, 2024 18:36

yhmtsai added 1:ST:ready-to-merge This PR is ready to merge. 1:ST:skip-full-test and removed 1:ST:ready-for-review This PR is ready for review labels Dec 3, 2024

yhmtsai force-pushed the half_solver branch from ac216bc to 6ad32f2 December 3, 2024 01:17

Base automatically changed from half_solver to develop December 3, 2024 01:22

yhmtsai added 9 commits December 3, 2024 02:24

triangular and direct solver

c09aae4

workaround for half precision of load/store by using single precision in shared memory

6dfbec3

delete the current unusable half memory op on shared memory

a2382ec

direct and tri config dispatch

aa3471b

factorization

66e0741

factorization config dispatch

ac32d0f

cmake cuda test with cuda arch and fix is_finite

8fda6c4

figure out factorization test

83fde1f

change the diagonal to reduce random on parilut/parict

e0e42b0
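Commit 6dfbec3 works around half-precision loads/stores in shared memory by using single precision instead. One way to express that idea is a type mapping that widens only the half type for shared-memory buffers. This is a sketch, assuming a placeholder `half_stub` type and a hypothetical `shared_memory_type` trait; it is not the code from the commit.

```cpp
#include <type_traits>

// Hypothetical 16-bit placeholder type; not Ginkgo's half implementation.
struct half_stub {
    unsigned short bits;
};

// Maps a value type to the type actually used for shared-memory buffers.
// By default the value type is stored unchanged.
template <typename ValueType>
struct shared_memory_type {
    using type = ValueType;
};

// Half-precision values are widened to single precision, so the kernel only
// issues float loads/stores on shared memory and converts at the boundaries.
template <>
struct shared_memory_type<half_stub> {
    using type = float;
};

template <typename ValueType>
using shared_memory_type_t = typename shared_memory_type<ValueType>::type;
```

The trade-off is twice the shared-memory footprint for half, in exchange for avoiding 16-bit shared-memory operations.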

yhmtsai force-pushed the half_factorization branch from 53a1d80 to e0e42b0 December 3, 2024 01:24

yhmtsai merged commit 304755d into develop Dec 3, 2024
7 of 11 checks passed

yhmtsai deleted the half_factorization branch December 3, 2024 01:26
