Describe the suggestion
Add MPI awareness to Omniperf
Justification
Adding MPI awareness is something we've been meaning to address, and it is highly requested by users. If Omniperf is MPI-aware, we can also begin to implement clever ways to reduce the computational load through distributed counter collection (multi-GPU scenarios).
We've been holding off on this because we wanted to do it right. This seems like an appropriate opportunity to tackle implementation.
Implementation
A brute-force approach would be to run our full set of application replays (~14x) on each node. The profiling side of this method is straightforward, but post-processing could introduce some issues.
Alternatively, we could split the replay runs up across nodes, assuming the same kernels are launched on each.
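The rank-splitting alternative could be sketched as a simple partitioning of counter-collection passes across ranks. This is a minimal illustration only — the function and variable names are hypothetical and not part of Omniperf's current API; a real implementation would dispatch via mpi4py or the job launcher.

```python
def partition_replay_passes(counter_groups, num_ranks):
    """Round-robin assignment of counter-collection passes (replay runs)
    to MPI ranks, so each rank profiles only a subset of counter groups.

    Assumes every rank launches the same kernels, so partial results
    can be merged in post-processing. Hypothetical sketch, not
    Omniperf's actual implementation.
    """
    assignments = {rank: [] for rank in range(num_ranks)}
    for i, group in enumerate(counter_groups):
        assignments[i % num_ranks].append(group)
    return assignments


# Example: ~14 replay passes spread over 4 ranks instead of
# every node repeating all 14.
groups = [f"counter_group_{i}" for i in range(14)]
plan = partition_replay_passes(groups, num_ranks=4)
```

With 14 passes and 4 ranks, each rank runs only 3–4 replays rather than all 14, which is the computational-load reduction described above.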
Additional Notes
A potential gotcha to consider is the number of MPI ranks we advertise as supported. Launching hundreds of ranks introduces data-processing difficulties due to the sheer volume of raw data generated.
Originally posted by @coleramos425 in #153 (comment)