[QST] why kElementsPerAccess > 1 is not permanent in default_mma_sm80_core.h #2011

Closed
danielhua23 opened this issue Dec 23, 2024 · 3 comments


danielhua23 commented Dec 23, 2024

What is your question?
Dear cutlass team,

I recently found that the kElementsPerAccess passed to ThreadMap is not always greater than 1; with some layouts, kElementsPerAccess = 1. I am wondering why it was designed this way. Is it because we cannot guarantee that the runtime problem size (for example, an input of shape [57, 35]) is divisible by 128 / sizeof_bits<Element>::value, so kElementsPerAccess has to be set to 1?
The code is at https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/threadblock/default_mma_core_sm80.h#L1864, where kElementsPerAccess > 1, and https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/threadblock/default_mma_core_sm80.h#L2003, where kElementsPerAccess = 1.

Thanks a ton for your time!

@danielhua23

cc @jackkosaian and @hwu36

danielhua23 changed the title from "[QST] why kElementsPerAccess > 1only when A and B are interleaved layout in default_mma_sm80_core.h" to "[QST] why kElementsPerAccess > 1 is not permanent in default_mma_sm80_core.h" on Dec 23, 2024

hwu36 commented Dec 23, 2024

Yes, it is related to alignment. can_implement at the kernel level checks this. Usually, we want the alignment of SIMT kernels to be 1 to match cuBLAS behavior. For tensor core kernels, we want the alignment to be as large as the problem size allows.
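To make the alignment argument concrete, here is a minimal sketch of the arithmetic, not CUTLASS code. The helpers `max_alignment` and `can_use_alignment` are hypothetical names introduced for illustration, and the sketch assumes the only constraint is that the contiguous extent is divisible by the number of elements per access (a real kernel also depends on pointer alignment and leading dimensions).

```cpp
// Hypothetical helper (not part of CUTLASS): largest power-of-two number of
// elements per access, capped at one 128-bit vector, that divides `extent`.
constexpr int max_alignment(int extent, int element_bits) {
  int align = 128 / element_bits;  // elements that fit in one 128-bit access
  if (align < 1) {
    align = 1;                     // elements wider than 128 bits: one per access
  }
  while (align > 1 && extent % align != 0) {
    align /= 2;                    // fall back to a narrower vector width
  }
  return align;
}

// Hypothetical stand-in for the kind of divisibility test an alignment check
// performs: a vectorized access is only valid if it never straddles the extent.
constexpr bool can_use_alignment(int extent, int alignment) {
  return extent % alignment == 0;
}

// Compile-time spot checks (require C++14 for the constexpr loop).
static_assert(max_alignment(35, 16) == 1, "35 half elements: scalar access only");
static_assert(max_alignment(64, 16) == 8, "64 half elements: full 128-bit access");
```

With 16-bit elements, a contiguous extent of 35 (as in the [57, 35] example) is not divisible by 8, 4, or 2, so only an alignment of 1 works, while an extent of 64 allows the full 8 elements per access.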

@danielhua23

Thanks for your answer, I got your point!
