Adds distributed row gatherer #1589
base: neighborhood-communicator
One issue that I have is the constructor. It takes a
If I can't come up with anything better, I guess I will use that.
Do we need to have the
    send_sizes.data(), send_offsets.data(), type, recv_ptr,
    recv_sizes.data(), recv_offsets.data(), type);
coll_comm
    ->i_all_to_all_v(use_host_buffer ? exec->get_master() : exec, send_ptr,
Is there any difference between using all_to_all_v and i_all_to_all_v? I assume all_to_all_v also gets the updated interface.
all_to_all_v is a blocking call, while i_all_to_all_v is non-blocking. Right now the collective_communicator only provides the non-blocking interface, since it is more general.
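To illustrate why the non-blocking interface is the more general one, here is a minimal sketch using only the C++ standard library. It is an analogy, not Ginkgo or MPI code: `toy_i_sum` and `toy_sum` are hypothetical names, and a `std::future` stands in for the request object. The blocking variant can always be written as the non-blocking call followed by a wait, while the reverse is not possible.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a non-blocking collective: it starts the work on
// another thread and returns a handle immediately. The caller must keep
// `data` alive and unmodified until the handle has been waited on.
std::future<int> toy_i_sum(const std::vector<int>& data)
{
    return std::async(std::launch::async, [&data] {
        return std::accumulate(data.begin(), data.end(), 0);
    });
}

// The blocking variant is the non-blocking one followed by a wait.
int toy_sum(const std::vector<int>& data)
{
    auto pending = toy_i_sum(data);
    assert(pending.valid());
    return pending.get();  // get() waits for completion
}
```

A caller that wants blocking behaviour simply waits right away, which is why exposing only the non-blocking call loses nothing.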
 * auto x = matrix::Dense<double>::create(...);
 *
 * auto future = rg->apply_async(b, x);
 * // do some computation that doesn't modify b, or access x
I think it accesses x, but it is unclear when x will be accessed before the wait.
I guess this is just meant to say that you can't expect any meaningful data when accessing x before the wait has completed.
I think I got it wrong.
Is the comment here meant to describe what the user can safely do after the call, or the behavior of apply_async?
My comment assumed that it describes the behavior of apply_async, because apply_async definitely accesses x.
If it describes what the user may do between the async call and the wait, then it is correct.
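The usage rule discussed in this thread can be sketched with the standard library as follows. This is an analogy under stated assumptions, not the row gatherer itself: `toy_apply_async` and `gather_and_sum` are hypothetical, and the "communication" is just a copy on another thread. The point is that between the async call and the wait the caller must not modify `b` or touch `x`, and that `x` only holds meaningful data after the wait.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical stand-in for apply_async: fills x from b on another thread.
// The operation itself accesses x; the caller must not until it has waited.
std::future<void> toy_apply_async(const std::vector<double>& b,
                                  std::vector<double>& x)
{
    return std::async(std::launch::async, [&b, &x] { x = b; });
}

double gather_and_sum(const std::vector<double>& b, std::vector<double>& x)
{
    auto future = toy_apply_async(b, x);

    // Independent work that neither modifies b nor accesses x.
    double local = 0.0;
    for (int i = 0; i < 100; ++i) {
        local += i;
    }

    future.wait();
    // Only now is it safe to read x.
    assert(x.size() == b.size());
    for (double v : x) {
        local += v;
    }
    return local;
}
```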
workspace.set_executor(mpi_exec);
if (send_size_in_bytes > workspace.get_size()) {
    workspace.resize_and_reset(sizeof(ValueType) *
                               send_size[0] * send_size[1]);
}
Combine them to assign the workspace directly?
Combine how? Do you mean like
workspace = array<char>(mpi_exec, sizeof(ValueType) * send_size[0] * send_size[1]);
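One difference between the two forms is worth spelling out: the guarded resize only reallocates when the buffer is too small, while a direct assignment would allocate on every call. A minimal sketch of the grow-only pattern, using `std::vector<char>` as a stand-in for the array type (the helper name `ensure_workspace` is hypothetical and not part of this PR):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: grows the byte buffer only if it is too small.
// Returns true if a reallocation happened, false if the buffer was reused.
inline bool ensure_workspace(std::vector<char>& workspace,
                             std::size_t required_bytes)
{
    if (required_bytes > workspace.size()) {
        // Old contents are discarded, similar in spirit to a resize-and-reset.
        workspace.assign(required_bytes, 0);
        assert(workspace.size() >= required_bytes);
        return true;
    }
    return false;
}
```

With this pattern, repeated calls with the same or a smaller send size reuse the existing allocation.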
req = coll_comm_->i_all_to_all_v(
    mpi_exec, send_ptr, type.get(), recv_ptr, type.get());
send_buffer might be on the host, but recv_ptr (x_local) might be on the device.
I have a check above to ensure that the memory space of the recv buffer is accessible from the MPI executor. So if GPU-aware MPI is used, it should work (even if the send buffer is on the host and the recv buffer is on the device, or vice versa). Otherwise an exception will be thrown.
Signed-off-by: Marcel Koch <[email protected]>
Signed-off-by: Marcel Koch <[email protected]>
- only allocate if necessary
- synchronize correct executor
Co-authored-by: Pratik Nayak <[email protected]>
- split tests into core and backend part
- fix formatting
- fix openmpi pre 4.1.x macro
Co-authored-by: Pratik Nayak <[email protected]>
Co-authored-by: Yu-Hsiang M. Tsai <[email protected]>
Signed-off-by: Marcel Koch <[email protected]>
This PR adds a distributed row gatherer. This operator essentially provides the communication required in our matrix apply.
Besides the normal apply (which is blocking), it also provides two asynchronous calls. One version has an additional workspace parameter, which is used as the send buffer. This version can be called multiple times without restrictions, as long as a different workspace is used for each call. The other version doesn't have a workspace parameter and instead uses an internal buffer. As a consequence, it can only be called a second time once the request of the previous call has been waited on. Otherwise, it will throw.
This is the second part of splitting up #1546.
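The contract of the two asynchronous variants described above can be sketched as follows. This is a toy model, not the implementation in this PR: `ToyRowGatherer` and `ToyRequest` are hypothetical, the "communication" is a plain copy, and the in-flight tracking via a shared flag is an assumption made only to show when the internal-buffer variant throws.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical request handle: wait() clears the shared "in flight" flag.
class ToyRequest {
public:
    explicit ToyRequest(std::shared_ptr<bool> in_flight)
        : in_flight_(std::move(in_flight))
    {
        assert(in_flight_ != nullptr);
    }
    void wait() { *in_flight_ = false; }

private:
    std::shared_ptr<bool> in_flight_;
};

class ToyRowGatherer {
public:
    // Variant with a caller-provided workspace as send buffer. Calls are
    // independent of each other as long as each one uses its own workspace.
    ToyRequest apply_async(const std::vector<double>& b,
                           std::vector<double>& x,
                           std::vector<double>& workspace)
    {
        workspace = b;  // pack the send buffer
        x = workspace;  // stand-in for the actual communication
        return ToyRequest(std::make_shared<bool>(true));
    }

    // Variant with an internal send buffer. It throws if the request of the
    // previous call has not been waited on yet.
    ToyRequest apply_async(const std::vector<double>& b,
                           std::vector<double>& x)
    {
        if (*in_flight_) {
            throw std::runtime_error(
                "previous request has not been waited on");
        }
        *in_flight_ = true;
        buffer_ = b;
        x = buffer_;
        return ToyRequest(in_flight_);
    }

private:
    std::shared_ptr<bool> in_flight_ = std::make_shared<bool>(false);
    std::vector<double> buffer_;
};
```

In this model, two overlapping calls are fine with separate workspaces, while the internal-buffer variant rejects a second call until `wait()` has been called on the first request.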
It also introduces some intermediate changes, which could be extracted out beforehand:
- a type-erased DenseCache
- making detail::run easier to use (now part of "Use index_map in distributed::matrix" #1544)
PR Stack: