-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reimplementation of bridge/multifile feature for data_override in main #1408
Reimplementation of bridge/multifile feature for data_override in main #1408
Conversation
- Raphael Dussin implementation of bridge/multifile feature for data_override - This commit is basically Raphs branch raphaeldussin:feature_data_override_multifile merged with the main branch of FMS in PR NOAA-GFDL#1214 - Due to restructure of time_interp and data_override dirs in FMS2023.03 I could not (did not know how to) do this merge via git and had to extract the mods from that PR and apply them to the main branch. - Does not change answers.
@raphaeldussin this is what I have for the "merge" of your branch to main. I made this PR just so you and others could review what is involved, how you use it is up to you. |
This PR did not have a merge conflict yesterday! Is the conflict because of the keywords change for "data" ? |
thanks a lot @nikizadehgfdl ! I'll have a look tonight. The "data" keyword was one of the requested changes in my PR since it is also a Fortran construct. |
@raphaeldussin Thanks for implementing this and thanks @nikizadehgfdl for solving the merge conflicts (sorry you have to do it again) The code updates look good to me. |
Regarding the performance at initialization, I found no extra cost. I've shared with @nikizadehgfdl my xml and job output so he could confirm my findings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have missed it, but Is it required that all files in the multi-file list have the same spatial dimensions? If it is required, is there a check to ensure?
data_entry%multifile = .true. | ||
index_1col = index(data_entry%file_name, ":") | ||
index_2col = index(data_entry%file_name(index_1col+1:), ":") | ||
if (index_2col .eq. 0) call mpp_error(FATAL, "data_override: must provide either 1 or 3 forcing files") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we limited to 1 or 3 files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 files poses the question of "should it go before or after?". Also most of the time we need to provide a duplicate of the very first record of the timeserie for time interpolation (unless first record falls exactly at midnight of Jan 1st) at the beginning of the run (and similar for the end) so we always end up padding at the beginning and end of all years
data_entry%multifile = .true. | ||
index_1col = index(data_entry%file_name, ":") | ||
index_2col = index(data_entry%file_name(index_1col+1:), ":") | ||
data_entry%prev_file_name = data_entry%file_name(1:index_1col-1) | ||
data_entry%next_file_name = data_entry%file_name(index_1col+index_2col+1:) | ||
! once previous/next files are filled in, overwrite current | ||
data_entry%file_name = data_entry%file_name(index_1col+1:index_1col+index_2col-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other code blocks for parsing the multiple files has a test for 1 or 3 forcing files. Does this code block need the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could test for 2 occurences of the :
separator between files to assure there are either 1 or 3 files but not 2. This would bulletproof the code I agree, good catch!
index_1col = index(data_entry%file_name, ":") | ||
index_2col = index(data_entry%file_name(index_1col+1:), ":") | ||
data_entry%multifile = .true. | ||
if (index_2col .eq. 0) call mpp_error(FATAL, "data_override: must provide either 1 or 3 forcing files") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above comment regarding the number of files allowed.
I looked at the clocks for the same 1 month ocean-ice experiment forced with JRA data with and without the multi-file-bridge data_override (apples to apples). The Ocean and Ice performance were the same. The nice surprise was that the data-override/flux-exchange actually benefited from the multi-file scheme and the whole model sped up by ~6%.
I don't fully understand this since the test was done for the month of October which only one forcing file is involved. |
- githubCI complains: 1143 | #include "time_interp_external2_bridge_r8.fh" Fatal Error: time_interp_external2_bridge_r4.fh: No such file or directory
I am done with the merging and fixing the CI tests. The one that fails (Test coupler build) has nothing to do with this PR. |
@nikizadehgfdl many thanks for this! What is the preferred method here? I can work off your branch and PR to it if that's ok with you? Then this PR can be merged in main and we abandon my original PR. sounds good to you? |
…ile_updated revised PR, adresses comments
@raphaeldussin made the mods per reviews of PR #1214 which is going to be closed and replaced by this one. @bensonr @thomas-robinson @rem1776 , please see if your concerns have been addressed. Thanks. |
@raphaeldussin I implemented this in the yaml version of the data table and updated the documentation in nikizadehgfdl#3 Let me know if you are okay with it or would like any changes. I think this is the last thing left to do. |
multi file data override yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Any new variables need to be documented with doxygen-style comments
- There are a lot of new big
if
blocks in data_override.inc. I stopped adding the comment that they should be documented and labeled because there are so many. Please label the large if blocks to make reading the code easier in the future and add comments to explain what the if block is doing. I think it would also be helpful to add to the function/subroutine descriptions - Is it possible to put the
if
blocks into a function and then call the function? Are these if blocks copy-and-pastes? They seem similar, and a function or subroutine would help reduce the size of this code. - I may have missed it, but I don't think that the term "bridge" is ever defined or discussed anywhere in the code documentation. "Bridge" is used extensively through the documentation, so I think there needs to be a clear, longer explaination of what is happening and what "bridge" means.
character(len=512) :: prev_file_name !< name of netCDF data file for previous segment | ||
character(len=512) :: next_file_name !< name of netCDF data file for next segment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these be allocatable
? If they aren't always used, then this is a waste of memory.
type(time_type) :: first_record, last_record | ||
character(len=512) :: filename !< file containing source data | ||
character(len=512) :: prevfilename !< file containing source data for bridge | ||
character(len=512) :: nextfilename !< file containing source data for bridge | ||
character(len=128) :: fieldname !< fieldname used in the data file | ||
integer :: index1 !< field index in data_table | ||
integer :: dims(4), prev_dims(4), next_dims(4) | ||
integer :: id_time !< index for time interp in override array | ||
integer :: id_time_prev=-1 !index for time interp in override array | ||
integer :: id_time_next=-1 !index for time interp in override array | ||
integer :: curr_position !< position of the field currently processed in override_array | ||
integer :: i | ||
real(FMS_DATA_OVERRIDE_KIND_) :: factor | ||
logical :: multifile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these new variables need doxygen documentation
@@ -695,8 +781,66 @@ subroutine DATA_OVERRIDE_0D_(gridname,fieldname_code,data_out,time,override,data | |||
id_time = override_array(curr_position)%t_index | |||
endif !if curr_position < 0 | |||
|
|||
|
|||
if (multifile) then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if
-block should have documentation
@@ -695,8 +781,66 @@ subroutine DATA_OVERRIDE_0D_(gridname,fieldname_code,data_out,time,override,data | |||
id_time = override_array(curr_position)%t_index | |||
endif !if curr_position < 0 | |||
|
|||
|
|||
if (multifile) then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Labeling this if block would make the code easier to read
first_record = data_table(index1)%time_records(1) | ||
last_record = data_table(index1)%time_records(dims(4)) | ||
|
||
if (multifile) then ! bridging between files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
labeling this if block would make the code easier to read
is_in=is_in,ie_in=ie_in,js_in=js_in,je_in=je_in,window_id=window_id) | ||
return_data = return_data*factor | ||
|
||
if (multifile) then ! bridging between files, see previous blocks for comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if statement should be labeled and documented
mask_out =mask_out, & | ||
is_in=is_in,ie_in=ie_in,js_in=js_in,je_in=je_in,window_id=window_id) | ||
|
||
if (multifile) then ! bridging between files, see previous blocks for comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if statement should be labeled and documented
integer, intent(in) :: index1, index2 | ||
type(time_type), intent(in) :: time | ||
real(FMS_TI_KIND_), dimension(:,:), intent(inout) :: data_in | ||
integer, intent(in), optional :: interp | ||
logical, intent(in), optional :: verbose | ||
type(horiz_interp_type),intent(in), optional :: horz_interp | ||
logical, dimension(:,:), intent(out), optional :: mask_out !< set to true where output data is valid | ||
integer, intent(in), optional :: is_in, ie_in, js_in, je_in | ||
integer, intent(in), optional :: window_id | ||
|
||
real(FMS_TI_KIND_), dimension(size(data_in,1), size(data_in,2), 1) :: data_out | ||
logical, dimension(size(data_in,1), size(data_in,2), 1) :: mask3d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a new subroutine, so these variables should all be documented in doxygen-style
integer, intent(in) :: index1,index2 !< index of external field from previous call | ||
!! to init_external_field | ||
type(time_type), intent(in) :: time !< target time for data | ||
real(FMS_TI_KIND_), dimension(:,:,:), intent(inout) :: time_data !< global or local data array | ||
integer, intent(in), optional :: interp | ||
logical, intent(in), optional :: verbose !< flag for debugging | ||
type(horiz_interp_type), intent(in), optional :: horz_interp | ||
logical, dimension(:,:,:), intent(out), optional :: mask_out !< set to true where output data is valid | ||
integer, intent(in), optional :: is_in, ie_in, js_in, je_in | ||
integer, intent(in), optional :: window_id | ||
|
||
type(time_type) :: time1, time2 | ||
integer :: dims1(4), dims2(4) | ||
integer :: first_rec, last_rec | ||
integer :: nx, ny, nz, interp_method, t1, t2 | ||
integer :: i1, i2, isc, iec, jsc, jec, mod_time | ||
integer :: isc1, iec1, jsc1, jec1, isc2, iec2, jsc2, jec2 | ||
integer :: yy, mm, dd, hh, min, ss | ||
character(len=256) :: err_msg, filename | ||
|
||
integer :: isw, iew, jsw, jew, nxw, nyw | ||
! these are boundaries of the updated portion of the "data" argument | ||
! they are calculated using sizes of the "data" and isc,iec,jsc,jsc | ||
! fileds from respective input field, to center the updated portion | ||
! in the output array | ||
|
||
real(FMS_TI_KIND_) :: w1,w2 | ||
logical :: verb | ||
character(len=16) :: message1, message2 | ||
integer, parameter :: kindl = FMS_TI_KIND_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These variables need doxygen-style documentation
integer, intent(in) :: index1, index2 | ||
type(time_type), intent(in) :: time | ||
real(FMS_TI_KIND_), intent(inout) :: time_data | ||
logical, intent(in), optional :: verbose | ||
|
||
integer :: t1, t2 | ||
integer :: i1, i2, mod_time | ||
integer :: yy, mm, dd, hh, min, ss | ||
character(len=256) :: err_msg, filename | ||
type(time_type) :: time1, time2 | ||
integer :: dims1(4), dims2(4) | ||
|
||
real(FMS_TI_KIND_) :: w1,w2 | ||
logical :: verb | ||
integer, parameter :: kindl = FMS_TI_KIND_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These variables need doxygen-style documentation
Is it possible to add a unit test to test the functionality? |
time1 = loaded_fields(index1)%time(t1) | ||
time2 = loaded_fields(index2)%time(t2) | ||
w2= (time - time1) // (time2 - time1) | ||
w1 = 1.0-w2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1.0_kindl
time1 = loaded_fields(index1)%time(t1) | ||
time2 = loaded_fields(index2)%time(t2) | ||
w2= (time - time1) // (time2 - time1) | ||
w1 = 1.0-w2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1.0_kindl
…ile_updated correction from PR review
@thomas-robinson @mlee03 I think the changes address most of your concerns. Additionally:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your updates. I think this code is much easier to read now.
endif if_time2 | ||
else ! standard behavior | ||
if ((time<first_record) .or. (time>last_record)) then | ||
call mpp_error(WARNING, & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This WARNING is swamping the stdout for regression tests that were/are working fine. There is something wrong with this logic.
Description
This updates enables using multiple data source files for data_override for one field. This is necessary for example when the data is contained in annual files (JRA or CORE datasets) and model time is at the start or end of one model year. Data_override needs the data at the end or start of the adjacent year to interpolate. Currently this is a limitation of data_override and the source data have to be specially prepared for GFDL models by a process called padding (prepend the last few time levels of the previous year and append the first few levels of the next year).
This updates enables data_override to work without such padding by specifying multiple files in the data_table like:
where JRA_psl_ym1.nc is data for previous model year
JRA_psl_yy1.nc is data for current model year
JRA_psl_yp1.nc is data for next model year
How Has This Been Tested?
This has been tested in ocean-ice models with JRA dataset and reproduces same answers.
Checklist:
make distcheck
passes