Description

Model I/O is slow, which is especially noticeable when dealing with ensemble output from LETKF. This is because all state I/O is currently done in serial on PE 0. ... not the most efficient way.
Solution
I/O needs to be parallelized. There are two ways this could be done:
Use the tiled I/O capabilities of FMS. (This is not preferable because files in the GDAS workflow are not tiled, and I doubt @guillaumevernieres wants to put in mppnccombine executable calls in the workflow for each output file)
Otherwise, since FMS does not have parallel I/O capabilities, we'll need to use direct netCDF and MPI scatter/gather calls.
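To make the second option concrete, here is a minimal sketch of the gather-then-write / read-then-scatter pattern, with ranks simulated as plain lists. All names (`gather_and_write`, `read_and_scatter`) are illustrative; a real implementation would use MPI_Gatherv/MPI_Scatterv and netCDF calls rather than pickle.

```python
# Sketch of serial I/O via MPI-style gather/scatter: the root "PE"
# collects every rank's slab, does one serial write, and on read hands
# each rank its slab back. Pickle stands in for netCDF; lists stand in
# for MPI ranks.
import pickle
import tempfile
import os

def gather_and_write(local_slabs, path):
    """Root collects every rank's slab and does one serial write."""
    global_state = [x for slab in local_slabs for x in slab]  # mock MPI_Gatherv
    with open(path, "wb") as f:
        pickle.dump(global_state, f)                          # stand-in for a netCDF write

def read_and_scatter(path, nranks):
    """Root does one serial read, then hands each rank its slab."""
    with open(path, "rb") as f:
        global_state = pickle.load(f)                         # stand-in for a netCDF read
    n = len(global_state)
    counts = [n // nranks + (1 if r < n % nranks else 0) for r in range(nranks)]
    slabs, start = [], 0
    for c in counts:                                          # mock MPI_Scatterv
        slabs.append(global_state[start:start + c])
        start += c
    return slabs

if __name__ == "__main__":
    slabs = [[0, 1, 2], [3, 4], [5, 6]]
    path = os.path.join(tempfile.mkdtemp(), "state.bin")
    gather_and_write(slabs, path)
    print(read_and_scatter(path, 3))  # round-trips the per-rank slabs
```

This is the same serial bottleneck as today, just written explicitly; the options below are about spreading or hiding that cost.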
Assuming we use our own netCDF calls for soca I/O, there are several things of varying complexity/craziness we could try:
Parallel netCDF, where each PE or a pool of PEs does the I/O. This should perform well if tuned correctly. However, I'm not a fan of parallel netCDF I/O because it works best only if the file chunking and underlying filesystem are set up correctly, which they rarely are.
Serial netCDF, but with some crazy asynchronous stuff in the background. Instead of all I/O taking place on PE 0 (as is currently done), the file I/O is done round-robin on different PEs. This by itself wouldn't give any speedup, but the state read or write could be done asynchronously, with locks placed around all other state functions so they wait for pending asynchronous I/O to finish. The appealing thing is that no change would be needed in oops to allow ensemble I/O with one file per PE. Also, I have experience implementing this in past versions of LETKF I've worked on.
Add a "parallel ensemble state read/write" set of functions to oops so that a model interface could handle its own mpi scatter/gather then do one-file-per-PE I/O in parallel. I don't feel like touching oops, so no.
I'll probably go with option 2 (asynchronous serial netCDF), but we'll see.