[QST] When to use MainloopSm90TmaGmmaWarpSpecializedFP8? #2001

ginowu · 2024-12-19T08:46:00Z

Hi there,
In sm90_mma_tma_gmma_ss_warpspecialized_fp8.hpp, mma_promotion_interval=4 means that the partial sum of every 4 MMAs is added to the final accumulator. My question is: how does this non-default behavior improve FP8 accuracy? Could you also share some best practices for using this specialized implementation, such as how to set mma_promotion_interval according to the activation input range?

Thanks!
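
To make the question concrete, here is a minimal host-side sketch of the periodic-promotion idea described above. It assumes the MMA accumulation path keeps fewer significant bits than a plain FP32 add; that is an assumption for illustration, not a statement about the actual hardware. The helpers `truncate_to_bits` and `accumulate`, and the `bits` parameter, are hypothetical names and are not part of CUTLASS.

```cpp
#include <cmath>
#include <vector>

// Keep only `bits` significant bits of x (truncating). This mimics a
// HYPOTHETICAL reduced-precision accumulator; it is not a hardware model.
inline float truncate_to_bits(float x, int bits) {
  if (x == 0.0f) return 0.0f;
  int exp = 0;
  float mant = std::frexp(x, &exp);        // x = mant * 2^exp, |mant| in [0.5, 1)
  float scaled = std::ldexp(mant, bits);   // move `bits` significant bits above the point
  return std::ldexp(std::trunc(scaled), exp - bits);
}

// Each entry of `mma_results` stands for the contribution of one MMA step.
// Every `interval` steps the low-precision partial sum is added to a full
// FP32 accumulator and reset. interval == 0 means "never promote early".
inline float accumulate(const std::vector<float>& mma_results, int interval, int bits) {
  float main_accum = 0.0f;  // full FP32 accumulator
  float temp_accum = 0.0f;  // reduced-precision partial accumulator (assumption)
  int since_promotion = 0;
  for (float r : mma_results) {
    temp_accum = truncate_to_bits(temp_accum + r, bits);
    if (interval > 0 && ++since_promotion == interval) {
      main_accum += temp_accum;  // promotion: ordinary FP32 add
      temp_accum = 0.0f;
      since_promotion = 0;
    }
  }
  return main_accum + temp_accum;  // flush whatever is left
}
```

For example, with 4096 contributions of 1.0f and bits = 8, `accumulate(v, 0, 8)` stalls at 256 because the partial sum can no longer absorb small addends, while `accumulate(v, 4, 8)` returns 4096. In this toy model a smaller interval keeps the partial sum small relative to each new addend, at the cost of more FP32 adds.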

ginowu · 2024-12-19T09:10:15Z

By the way, all the other collective mainloop specializations under include/cutlass/gemm/collective/ have the mma_promotion_interval member in Arguments. I understand that this makes a uniform mainloop argument layout possible on the host side. So is the absence of mma_promotion_interval from the file below unintended?

include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized.hpp
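
To illustrate what I mean by uniform host-side arguments, here is a small sketch. The types `MockFp8Mainloop` and `MockDefaultMainloop` and the helper `make_mainloop_args` are hypothetical stand-ins written for this post, not the real CUTLASS definitions; the real Arguments structs carry more fields.

```cpp
#include <cstdint>

// Hypothetical stand-ins; NOT the real CUTLASS types.
struct MockFp8Mainloop {
  struct Arguments {
    const void* ptr_A = nullptr;
    const void* ptr_B = nullptr;
    std::uint32_t mma_promotion_interval = 4;  // consumed by this mainloop
  };
};

struct MockDefaultMainloop {
  struct Arguments {
    const void* ptr_A = nullptr;
    const void* ptr_B = nullptr;
    std::uint32_t mma_promotion_interval = 4;  // kept only so the layout matches
  };
};

// Generic host code can fill Arguments the same way for every mainloop,
// as long as each one exposes the same member names.
template <class Mainloop>
typename Mainloop::Arguments make_mainloop_args(const void* a, const void* b,
                                                std::uint32_t interval) {
  typename Mainloop::Arguments args;
  args.ptr_A = a;
  args.ptr_B = b;
  args.mma_promotion_interval = interval;  // fails to compile if the member is missing
  return args;
}
```

A mainloop whose Arguments lacks the member would break this kind of generic helper at compile time, which is why the omission looked unexpected to me.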

ginowu added the "? - Needs Triage" and "question" labels on Dec 19, 2024
