Description

Model I/O is slow, which is especially noticeable when dealing with ensemble output from LETKF. This is because all state I/O is currently done in serial on PE 0. ... not the most efficient way.
Solution
I/O needs to be parallelized. There are two ways this could be done:
Use the tiled I/O capabilities of FMS. (This is not preferable because files in the GDAS workflow are not tiled, and I doubt @guillaumevernieres wants to put in mppnccombine executable calls in the workflow for each output file)
Otherwise, since FMS does not have parallel I/O capabilities, we'll need to use direct netCDF and MPI scatter/gather calls.
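To make the second option concrete, here is a minimal sketch of the gather-then-write / read-then-scatter pattern, with ranks simulated as plain lists. All names (`gather_and_write`, `read_and_scatter`) are illustrative; a real implementation would use MPI_Gatherv/MPI_Scatterv and netCDF calls rather than pickle.

```python
# Sketch of serial I/O via MPI-style gather/scatter: the root "PE"
# collects every rank's slab, does one serial write, and on read hands
# each rank its slab back. Pickle stands in for netCDF; lists stand in
# for MPI ranks.
import pickle
import tempfile
import os

def gather_and_write(local_slabs, path):
    """Root collects every rank's slab and does one serial write."""
    global_state = [x for slab in local_slabs for x in slab]  # mock MPI_Gatherv
    with open(path, "wb") as f:
        pickle.dump(global_state, f)                          # stand-in for a netCDF write

def read_and_scatter(path, nranks):
    """Root does one serial read, then hands each rank its slab."""
    with open(path, "rb") as f:
        global_state = pickle.load(f)                         # stand-in for a netCDF read
    n = len(global_state)
    counts = [n // nranks + (1 if r < n % nranks else 0) for r in range(nranks)]
    slabs, start = [], 0
    for c in counts:                                          # mock MPI_Scatterv
        slabs.append(global_state[start:start + c])
        start += c
    return slabs

if __name__ == "__main__":
    slabs = [[0, 1, 2], [3, 4], [5, 6]]
    path = os.path.join(tempfile.mkdtemp(), "state.bin")
    gather_and_write(slabs, path)
    print(read_and_scatter(path, 3))  # round-trips the per-rank slabs
```

This is the same serial bottleneck as today, just written explicitly; the options below are about spreading or hiding that cost.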
Assuming we use our own netCDF calls for soca I/O, there are several things of varying complexity/craziness we could try:
Parallel netCDF, where each PE or a pool of PEs does the I/O. This should perform well if tuned correctly. However, I'm not a fan of parallel netCDF I/O because it works best only if the file chunking and underlying filesystem are set up correctly, which they rarely are.
Serial netCDF, but with some crazy asynchronous stuff in the background. Instead of all I/O taking place on PE 0 (as is currently done), the file I/O is done round-robin on different PEs. This by itself wouldn't give any speedup, but the state read or write could be done asynchronously, with locks placed around all other state functions so they wait for pending asynchronous I/O to finish. The appealing thing is that no change would be needed in oops to allow ensemble I/O with one file per PE. Also, I have experience implementing this in past versions of LETKF I've worked on.
Add a "parallel ensemble state read/write" set of functions to oops so that a model interface could handle its own mpi scatter/gather then do one-file-per-PE I/O in parallel. I don't feel like touching oops, so no.
I'll probably go with option 2 (asynchronous serial netCDF), but we'll see.