danielhua23 changed the title from "[QST] why kElementsPerAccess > 1 only when A and B are interleaved layout in default_mma_sm80_core.h" to "[QST] why kElementsPerAccess > 1 is not permanent in default_mma_sm80_core.h" on Dec 23, 2024
Yes, it is related to alignment. can_implement at the kernel level checks this. Usually, we want the alignment of SIMT kernels to be 1 to match cuBLAS behavior. For tensor core kernels, we want the alignment to be as large as the problem size allows.
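To make the alignment point concrete, here is a minimal sketch in plain C++ with no CUTLASS dependency. The helper names (`elements_per_128b_access`, `extent_allows_alignment`, `largest_usable_alignment`) are hypothetical and are not CUTLASS APIs. The divisibility rule is an assumption about the kind of check can_implement performs, not a copy of its implementation. It uses the [57, 35] shape from the question with a 16-bit element type, assuming a row-major layout so that 35 is the contiguous extent.

```cpp
// Hypothetical helper (not a CUTLASS API): how many elements of the given
// bit width fit into one 128-bit vectorized memory access.
constexpr int elements_per_128b_access(int element_bits) {
  return 128 / element_bits;
}

// Hypothetical stand-in for an alignment check: a vectorized access of
// `alignment` elements is only safe if the contiguous extent is a
// multiple of `alignment`. This rule is an assumption for illustration.
constexpr bool extent_allows_alignment(int contiguous_extent, int alignment) {
  return alignment > 0 && contiguous_extent % alignment == 0;
}

// Largest power-of-two alignment, no larger than max_alignment, that
// evenly divides the contiguous extent. Falls back to 1 (scalar access).
constexpr int largest_usable_alignment(int contiguous_extent, int max_alignment) {
  int alignment = max_alignment;
  while (alignment > 1 && contiguous_extent % alignment != 0) {
    alignment /= 2;
  }
  return alignment;
}

// A 16-bit element type allows 8 elements per 128-bit access.
static_assert(elements_per_128b_access(16) == 8, "128 / 16 == 8");

// A contiguous extent of 35 is not a multiple of 8, so an alignment of 8
// cannot be used and the only usable alignment is 1.
static_assert(!extent_allows_alignment(35, 8), "35 % 8 != 0");
static_assert(largest_usable_alignment(35, 8) == 1, "35 is odd");

// An extent of 64 keeps the full 8-element access.
static_assert(largest_usable_alignment(64, 8) == 8, "64 % 8 == 0");
```

Under these assumptions, an alignment of 1 always passes the check, which is consistent with the reply's point that alignment-1 kernels accept any problem size while larger alignments depend on the shape.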
What is your question?
Dear cutlass team,
I recently found that the kElementsPerAccess passed into ThreadMap is not always greater than 1: with some layouts, kElementsPerAccess = 1. I am wondering why it was designed this way. Is it because we cannot guarantee that the runtime problem size (for example, an input of shape [57, 35]) is divisible by 128 / sizeof_bits<Element>::value, so kElementsPerAccess has to be set to 1? The code is located at https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/threadblock/default_mma_core_sm80.h#L1864, where kElementsPerAccess > 1, and at https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/threadblock/default_mma_core_sm80.h#L2003, where kElementsPerAccess = 1.
Thanks a ton for your time!