Describe the suggestion
Add MPI awareness to Omniperf
Justification
Adding MPI awareness is something we've been meaning to address, and it is highly requested by users. If Omniperf is MPI-aware, we can also begin to implement clever ways to reduce the computational load through distributed counter collection (multi-GPU scenarios).
We've been holding off on this because we wanted to do it right. This seems like an appropriate opportunity to tackle implementation.
Implementation
A brute-force approach would be to run our full set of application replays (~14x) on each node. The profiling side of this method is straightforward, but post-processing could introduce some issues.
Alternatively, we could split the replay runs up across nodes, assuming the same kernels are launched on each.
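The rank-splitting alternative could be sketched as a simple partitioning of counter-collection passes across ranks. This is a minimal illustration only — the function and variable names are hypothetical and not part of Omniperf's current API; a real implementation would dispatch via mpi4py or the job launcher.

```python
def partition_replay_passes(counter_groups, num_ranks):
    """Round-robin assignment of counter-collection passes (replay runs)
    to MPI ranks, so each rank profiles only a subset of counter groups.

    Assumes every rank launches the same kernels, so partial results
    can be merged in post-processing. Hypothetical sketch, not
    Omniperf's actual implementation.
    """
    assignments = {rank: [] for rank in range(num_ranks)}
    for i, group in enumerate(counter_groups):
        assignments[i % num_ranks].append(group)
    return assignments


# Example: ~14 replay passes spread over 4 ranks instead of
# every node repeating all 14.
groups = [f"counter_group_{i}" for i in range(14)]
plan = partition_replay_passes(groups, num_ranks=4)
```

With 14 passes and 4 ranks, each rank runs only 3–4 replays rather than all 14, which is the computational-load reduction described above.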
Additional Notes
A potential gotcha to consider is the number of MPI ranks we advertise as supported. Launching hundreds of ranks introduces data-processing difficulties due to the sheer volume of raw data generated.
Originally posted by @coleramos425 in #153 (comment)