Hi, there
In sm90_mma_tma_gmma_ss_warpspecialized_fp8.hpp, mma_promotion_interval=4 means that the partial sum of every 4 MMAs is added to the final accumulator. My question is: how does this non-default behavior improve FP8 accuracy? Could you also share some best practices for using this specialized implementation, for example how to set mma_promotion_interval according to the activation input range?
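To make the question concrete, here is a toy host-side sketch of how I picture the promotion. It is not CUTLASS code. It assumes the MMA accumulator keeps fewer mantissa bits than FP32, which is my guess and is not stated in the header. The names `truncate_mantissa`, `accumulate`, and `keep_bits` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of a reduced-precision accumulator: keep only the top
// `keep_bits` of the 23 explicit FP32 mantissa bits and truncate the rest.
float truncate_mantissa(float x, int keep_bits) {
    assert(keep_bits >= 0 && keep_bits <= 23);
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    const std::uint32_t mask = ~((1u << (23 - keep_bits)) - 1u);
    bits &= mask;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}

// Sum `products` in a low-precision partial accumulator. After every
// `interval` additions the partial sum is added ("promoted") into a full
// FP32 accumulator and the partial accumulator is reset to zero.
float accumulate(const std::vector<float>& products, int interval, int keep_bits) {
    assert(interval > 0);
    float main_acc = 0.0f;  // full FP32 accumulator
    float partial = 0.0f;   // assumed low-precision accumulator
    int count = 0;
    for (float p : products) {
        partial = truncate_mantissa(partial + p, keep_bits);
        if (++count == interval) {
            main_acc += partial;
            partial = 0.0f;
            count = 0;
        }
    }
    // Flush whatever is left when the length is not a multiple of `interval`.
    return main_acc + partial;
}
```

In this toy model, with 4096 products equal to 1.0f and `keep_bits = 10`, `accumulate(products, 4, 10)` returns 4096, while `accumulate(products, 4096, 10)` stalls at 2048 because the partial sum can no longer absorb an increment of 1. Is that the effect the promotion interval is meant to limit?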
Thanks!
By the way, all the collective mainloop specializations under include/cutlass/gemm/collective/ have the "mma_promotion_interval" member in Arguments. I understand that this makes a uniform mainloop argument structure possible on the host side. So is the absence of mma_promotion_interval in the file below unexpected?