[QST] When to use MainloopSm90TmaGmmaWarpSpecializedFP8? #2001

Open
ginowu opened this issue Dec 19, 2024 · 1 comment

Comments

ginowu commented Dec 19, 2024

Hi there,
In sm90_mma_tma_gmma_ss_warpspecialized_fp8.hpp, mma_promotion_interval=4 means that the sum of every 4 MMA results is added to the final accumulator. My question is: how does this non-default behavior improve FP8 accuracy? Also, could you share some best practices for using this specialized implementation, such as how to set mma_promotion_interval according to the activation input range?

Thanks!
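
To make the question concrete, here is a minimal host-side sketch of the accumulation pattern described above, not the CUTLASS implementation. It assumes the MMA accumulator effectively keeps fewer significant bits than FP32, and compares a single narrow accumulator against one whose partial sum is added to a full FP32 accumulator every `interval` steps. The names `truncate_to_bits` and `compare`, the truncation rule, and the bit width passed in are all hypothetical choices for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of a narrow accumulator: keep only `bits` significant
// binary digits of x, truncating toward zero. This is an assumption made
// for illustration, not the actual hardware rounding rule.
float truncate_to_bits(float x, int bits) {
  if (x == 0.0f) return 0.0f;
  int exp = 0;
  float m = std::frexp(x, &exp);         // x = m * 2^exp, 0.5 <= |m| < 1
  float scale = std::ldexp(1.0f, bits);  // 2^bits
  return std::ldexp(std::trunc(m * scale) / scale, exp);
}

struct Errors {
  double single_accumulator;  // |reference - narrow-only sum|
  double promoted;            // |reference - periodically promoted sum|
};

// Adds `value` k times in two ways and reports each absolute error
// against a double-precision reference sum.
Errors compare(int k, float value, int bits, int interval) {
  assert(interval > 0);
  double reference = 0.0;
  float narrow = 0.0f;    // every addition is truncated to `bits`
  float partial = 0.0f;   // narrow partial sum, reset on each promotion
  float main_acc = 0.0f;  // full FP32 accumulator
  for (int i = 0; i < k; ++i) {
    reference += static_cast<double>(value);
    narrow = truncate_to_bits(narrow + value, bits);
    partial = truncate_to_bits(partial + value, bits);
    if ((i + 1) % interval == 0) {
      main_acc += partial;  // "promotion": add the partial sum in FP32
      partial = 0.0f;
    }
  }
  main_acc += partial;  // flush the remainder if k is not a multiple of interval
  return {std::fabs(reference - narrow), std::fabs(reference - main_acc)};
}
```

Under this model, a call such as `compare(4096, 0.01f, 12, 4)` shows the mechanism: the single narrow accumulator stops growing once the spacing between its representable values exceeds the addend, while the promoted variant keeps each partial sum small enough to retain its low-order bits. A smaller interval means more FP32 additions, so the trade-off is accuracy against extra work in the mainloop.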

ginowu commented Dec 19, 2024

By the way, all the collective mainloop specializations under include/cutlass/gemm/collective/ have the mma_promotion_interval member in Arguments. I understand that this makes it possible to use a uniform mainloop argument structure on the host side. Is it therefore unintended that mma_promotion_interval is missing from the file below?

include/cutlass/gemm/collective/sm90_mma_array_tma_gmma_ss_warpspecialized.hpp
