
[QST] Two questions about warp specialized ping-pong kernel #1992

Closed
danielhua23 opened this issue Dec 17, 2024 · 3 comments
Comments

@danielhua23

danielhua23 commented Dec 17, 2024

What is your question?
Dear cutlass team,
I would like to ask two questions that came up while studying the warp-specialized ping-pong kernel. Below is a screenshot from the "Speaking Tensor Cores" slides; I understand why the circled and framed parts work the way they do, but my questions are:

  1. In the slide, each Tensor Core op starts before all of the TMA loads have finished, i.e. the MMAs overlap with in-flight loads. Which part of the code implements this? I guess it is:
auto barrier_token = pipeline.consumer_try_wait(smem_pipe_read);
pipeline.consumer_wait(smem_pipe_read, barrier_token);
int read_stage = smem_pipe_read.index();
warpgroup_arrive();

If not, could you please point me to the right code?

  2. As the picture shows, the data from the TMA loads is fed alternately into consumer 1 and consumer 2. How is this routing to the two consumers controlled? By the mainloop_pipeline_consumer_state of each consumer warp group?

Thanks a ton for your time!
[Image: slide from the "Speaking Tensor Cores" deck showing TMA loads alternately feeding consumer warp groups 1 and 2]

@danielhua23
Author

cc @jackkosaian @thakkarV

@thakkarV
Collaborator

Yes and yes

@danielhua23
Author

thanks~
