Completely redo model I/O to parallelize it #1125

Open
travissluka opened this issue Jan 21, 2025 · 0 comments
Assignees: travissluka
Labels: SOCA Sea-ice, Ocean, and Coupled Assimilation

Description

Model I/O is slow, which is especially noticeable when dealing with ensemble output from LETKF. This is because all state I/O is currently done in serial on PE 0, which is not the most efficient way to do it.

Solution

I/O needs to be parallelized. There are two ways this could be done:

  1. Use the tiled I/O capabilities of FMS. (This is not preferable because files in the GDAS workflow are not tiled, and I doubt @guillaumevernieres wants to add mppnccombine executable calls to the workflow for each output file.)
  2. Since FMS otherwise has no parallel I/O capabilities, use direct netCDF and MPI scatter/gather calls. (A rough sketch of this pattern follows this list.)
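For option 2, the basic pattern is an MPI gather onto a writer PE followed by a plain serial netCDF write. A minimal sketch, with a hypothetical 1-D decomposition and made-up field/dimension names, and error checking omitted:

```cpp
// Sketch only: gather a 1-D-decomposed field onto a writer PE, then write
// it with serial netCDF-C. Names and decomposition are hypothetical.
#include <mpi.h>
#include <netcdf.h>
#include <vector>

void gatherAndWrite(const std::vector<double>& localField, size_t nGlobal,
                    int writerRank, MPI_Comm comm, const char* path) {
  int rank, size;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);

  // the writer PE needs every PE's local size to build the Gatherv layout
  int nLocal = static_cast<int>(localField.size());
  std::vector<int> counts(size), displs(size);
  MPI_Gather(&nLocal, 1, MPI_INT, counts.data(), 1, MPI_INT, writerRank, comm);

  std::vector<double> global;
  if (rank == writerRank) {
    global.resize(nGlobal);
    for (int p = 0, off = 0; p < size; ++p) { displs[p] = off; off += counts[p]; }
  }
  MPI_Gatherv(localField.data(), nLocal, MPI_DOUBLE,
              global.data(), counts.data(), displs.data(), MPI_DOUBLE,
              writerRank, comm);

  // only the writer PE touches the file; the netCDF calls stay serial
  if (rank == writerRank) {
    int ncid, dimid, varid;
    nc_create(path, NC_NETCDF4 | NC_CLOBBER, &ncid);
    nc_def_dim(ncid, "n", nGlobal, &dimid);
    nc_def_var(ncid, "field", NC_DOUBLE, 1, &dimid, &varid);
    nc_enddef(ncid);
    nc_put_var_double(ncid, varid, global.data());
    nc_close(ncid);
  }
}
```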

Assuming we use our own netCDF calls for SOCA I/O, there are several options of varying complexity/craziness we could try:

  1. Parallel netCDF, where each PE or a pool of PEs does the I/O. This should perform well if tuned correctly, but I'm not a fan of parallel netCDF I/O because it works best only when the file chunking and the underlying filesystem are set up correctly, and they rarely are. (See the first sketch below.)
  2. Serial netCDF, but with some crazy asynchronous stuff in the background. Instead of all I/O taking place on PE 0 (as is currently done), the file I/O is done round-robin on different PEs. This by itself wouldn't give any speedup, but the state read or write could be done asynchronously, with locks placed around all other state functions that wait for pending asynchronous I/O to finish. The appealing thing is that no change to oops would be needed to allow ensemble I/O with one file per PE. Also, I have experience implementing this in past versions of LETKF I've worked on. (See the second sketch below.)
  3. Add a "parallel ensemble state read/write" set of functions to oops so that a model interface could handle its own MPI scatter/gather and then do one-file-per-PE I/O in parallel. I don't feel like touching oops, so no.
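For concreteness, here's a minimal sketch of what option 1 could look like using the netCDF-C parallel API. This assumes a netCDF build with parallel HDF5 support; the names and the 1-D decomposition are made up for illustration:

```cpp
// Sketch of option 1: every PE writes its own hyperslab of one shared file.
// Performance depends heavily on chunking and filesystem configuration.
#include <mpi.h>
#include <netcdf.h>
#include <netcdf_par.h>

void parallelWrite(const double* localField, size_t offset, size_t nLocal,
                   size_t nGlobal, MPI_Comm comm, const char* path) {
  int ncid, dimid, varid;
  nc_create_par(path, NC_NETCDF4 | NC_CLOBBER, comm, MPI_INFO_NULL, &ncid);
  nc_def_dim(ncid, "n", nGlobal, &dimid);
  nc_def_var(ncid, "field", NC_DOUBLE, 1, &dimid, &varid);
  nc_enddef(ncid);

  // collective access generally tunes better than independent access
  nc_var_par_access(ncid, varid, NC_COLLECTIVE);

  const size_t start[1] = {offset};
  const size_t count[1] = {nLocal};
  nc_put_vara_double(ncid, varid, start, count, localField);
  nc_close(ncid);
}
```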

I'll probably go with number 2, but we'll see.
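Since option 2 is the likely direction, here's a rough sketch of the locking pattern it implies, using std::async. The class and method names are hypothetical, and a real implementation would need to be careful about mixing MPI calls with background threads:

```cpp
// Sketch of the option-2 pattern: a write is launched in the background on
// the PE that owns the file, and every other state operation waits on any
// pending I/O before touching the data. Names are hypothetical.
#include <future>
#include <string>

class StateIO {
 public:
  // kick off the file write in a background thread and return immediately
  void writeAsync(const std::string& path) {
    waitForPendingIO();  // allow at most one outstanding write
    pending_ = std::async(std::launch::async,
                          [this, path] { serialNetcdfWrite(path); });
  }

  // the "lock": every other state method calls this before using the data
  void waitForPendingIO() {
    if (pending_.valid()) pending_.wait();
  }

 private:
  void serialNetcdfWrite(const std::string& path);  // plain serial netCDF
  std::future<void> pending_;
};
```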
