What is your question?
Dear CUTLASS team,
I have two questions that came up while studying the warp-specialized ping-pong kernel. Below is the screenshot from the "Speaking Tensor Cores" slides. I understand why the kernel is structured the way I have circled and framed it. My questions are:
First: in the slide, each Tensor Core op always starts before each TMA load is done. Which part of the code implements this? My guess is:
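```cpp
// Poll, then block, until the TMA load for the current stage has arrived in shared memory
auto barrier_token = pipeline.consumer_try_wait(smem_pipe_read);
pipeline.consumer_wait(smem_pipe_read, barrier_token);
// Shared-memory stage (buffer slot) to read for this k-tile
int read_stage = smem_pipe_read.index();
// Fence before issuing the wgmma instructions
warpgroup_arrive();
```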
If not, could you please point me to the right code?
Second: as the picture shows, the data from the TMA loads is fed alternately to consumer 1 and consumer 2. How is each load routed to the right consumer? Is it controlled by the mainloop_pipeline_consumer_state of each consumer warp group?
Thanks a ton for your time!