Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/issue 1365 #1367

Closed
wants to merge 8 commits into from
Closed

Feature/issue 1365 #1367

wants to merge 8 commits into from

Conversation

nlenssen2013
Copy link
Collaborator

Allow the code to check the root group and keep going into the layers of the file to find the lat/lon variables. Do not fail the test right away if the lat/lon variables are not found. Only fail until all variables are checked.
Verify that the cf.xarray get lat/lon functionality actually gets a variable that is lat or latitude. SNDR has fol_lon and lon variables. The original functionality was incorrectly getting the fol_lon variable.

Copy link
Contributor

github-actions bot commented Nov 2, 2023

Test Results for

0 tests   0 ✔️  0s ⏱️
0 suites  0 💤
0 files    0

Results for commit 8e5af3d.

♻️ This comment has been updated with latest results.

Comment on lines 324 to 342
ds = xarray.open_dataset(subsetted_filepath, group='')
lat_var_name = None
if len(ds.variables):
subsetted_ds = ds
lat_var_name, lon_var_name = get_lat_lon_var_names(subsetted_ds, collection_variables)
if lat_var_name:
ds.close()
pass
else:
for g in f.groups:
ds = xarray.open_dataset(subsetted_filepath, group=g)
if len(ds.variables):
group = g
subsetted_ds = ds
lat_var_name, lon_var_name = get_lat_lon_var_names(subsetted_ds, collection_variables)

else:
ds.close()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand the logic here. Is the idea that we should be looping through each group in the dataset trying to find lat/lon in each group and then stop when we find a group that has lat/lon variables?
Feels like it could be simplified a little bit, as it is there's a lot of if/else branching that is hard to follow.

Suggested change
ds = xarray.open_dataset(subsetted_filepath, group='')
lat_var_name = None
if len(ds.variables):
subsetted_ds = ds
lat_var_name, lon_var_name = get_lat_lon_var_names(subsetted_ds, collection_variables)
if lat_var_name:
ds.close()
pass
else:
for g in f.groups:
ds = xarray.open_dataset(subsetted_filepath, group=g)
if len(ds.variables):
group = g
subsetted_ds = ds
lat_var_name, lon_var_name = get_lat_lon_var_names(subsetted_ds, collection_variables)
else:
ds.close()
# Find the first group that has both latitude and longitude variables
lat_var_name, lon_var_name = None, None
for g in f.groups:
# Open group as dataset
ds_group = xarray.open_dataset(subsetted_filepath, group=g)
# If group has no variables, skip it
if len(ds_group.variables) == 0:
ds_group.close()
continue
lat_var_name, lon_var_name = get_lat_lon_var_names(ds_group, collection_variables)
if lat_var_name and lon_var_name:
# Found lat/lon, stop looking and leave group open for reading
group = g
subsetted_ds = ds_group
break
else:
# Group does not contain lat/lon. Close group and try next one
ds.close()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand the logic here. Is the idea that we should be looping through each group in the dataset trying to find lat/lon in each group and then stop when we find a group that has lat/lon variables? Feels like it could be simplified a little bit, as it is there's a lot of if/else branching that is hard to follow.

Yeah, I'm trying to keep going into the file if it's not found in the root group - and then to stop when it finds the lat and lon variables. Currently in the process of restructuring to make simpler - I'm open to ideas/suggestions

@@ -239,7 +240,19 @@ def get_lat_lon_var_names(dataset: xarray.Dataset, collection_variable_list: Lis

# If that doesn't work, try using cf-xarray to infer lat/lon variable names
try:
return dataset.cf.coordinates['latitude'][0], dataset.cf.coordinates['longitude'][0]
for lat in dataset.cf.coordinates['latitude']:
if lat in ['lat', 'latitude']:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're only going to consider a specific list of strings as valid lat/lon coordinate names, let's extract that list into a constant so it's obvious and easier to add to later. Something like this up at the top of the file:

VALID_LATITUDE_VARIABLE_NAMES = ['lat', 'latitude']
VALID_LONGITUDE_VARIABLE_NAMES = ['lon', 'longitude']

Alternatively we could keep a list of invalid names/variable names we want to ignore and invert the logic

IGNORED_LATITUDE_VARIABLE_NAMES = ['fol_lat']
IGNORED_LONGITUDE_VARIABLE_NAMES = ['fol_lon']

...

if lat not in IGNORED_LATITUDE_VARIABLE_NAMES:
    ...

if lon not in IGNORED_LONGITUDE_VARIABLE_NAMES:
    ...

@@ -267,8 +280,8 @@ def get_lat_lon_var_names(dataset: xarray.Dataset, collection_variable_list: Lis
logging.warning("Unable to find lat/lon vars using 'units' attribute")

# Out of options, fail the test because we couldn't determine lat/lon variables
pytest.fail(f"Unable to find latitude and longitude variables.")

logging.warning(f"Unable to find latitude and longitude variables in this group.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the group name as an optional parameter to the function so we can log the name of the group here to help with debugging?

@jamesfwood
Copy link
Contributor

jamesfwood commented Feb 8, 2024

Hi @nlenssen2013 Can you tell me the status of this? Are you still wanting to merge this? Seems it has some conflicts to fix before merging. Did you address all of Frank's concerns?

@nlenssen2013
Copy link
Collaborator Author

Hi @nlenssen2013 Can you tell me the status of this? Are you still wanting to merge this? Seems it has some conflicts to fix before merging. Did you address all of Frank's concerns?

I'm planning on closing this PR - the fix was a little different that this issue. Check #2168 and I'll close this one if you can.

@nlenssen2013 nlenssen2013 deleted the feature/issue-1365 branch February 12, 2024 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants