-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/issue 1365 #1367
Feature/issue 1365 #1367
Conversation
tests/verify_collection.py
Outdated
ds = xarray.open_dataset(subsetted_filepath, group='') | ||
lat_var_name = None | ||
if len(ds.variables): | ||
subsetted_ds = ds | ||
lat_var_name, lon_var_name = get_lat_lon_var_names(subsetted_ds, collection_variables) | ||
if lat_var_name: | ||
ds.close() | ||
pass | ||
else: | ||
for g in f.groups: | ||
ds = xarray.open_dataset(subsetted_filepath, group=g) | ||
if len(ds.variables): | ||
group = g | ||
subsetted_ds = ds | ||
lat_var_name, lon_var_name = get_lat_lon_var_names(subsetted_ds, collection_variables) | ||
|
||
else: | ||
ds.close() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to understand the logic here. Is the idea that we should be looping through each group in the dataset trying to find lat/lon in each group and then stop when we find a group that has lat/lon variables?
Feels like it could be simplified a little bit, as it is there's a lot of if/else branching that is hard to follow.
ds = xarray.open_dataset(subsetted_filepath, group='') | |
lat_var_name = None | |
if len(ds.variables): | |
subsetted_ds = ds | |
lat_var_name, lon_var_name = get_lat_lon_var_names(subsetted_ds, collection_variables) | |
if lat_var_name: | |
ds.close() | |
pass | |
else: | |
for g in f.groups: | |
ds = xarray.open_dataset(subsetted_filepath, group=g) | |
if len(ds.variables): | |
group = g | |
subsetted_ds = ds | |
lat_var_name, lon_var_name = get_lat_lon_var_names(subsetted_ds, collection_variables) | |
else: | |
ds.close() | |
# Find the first group that has both latitude and longitude variables | |
lat_var_name, lon_var_name = None, None | |
for g in f.groups: | |
# Open group as dataset | |
ds_group = xarray.open_dataset(subsetted_filepath, group=g) | |
# If group has no variables, skip it | |
if len(ds_group.variables) == 0: | |
ds_group.close() | |
continue | |
lat_var_name, lon_var_name = get_lat_lon_var_names(ds_group, collection_variables) | |
if lat_var_name and lon_var_name: | |
# Found lat/lon, stop looking and leave group open for reading | |
group = g | |
subsetted_ds = ds_group | |
break | |
else: | |
# Group does not contain lat/lon. Close group and try next one | |
ds.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to understand the logic here. Is the idea that we should be looping through each group in the dataset trying to find lat/lon in each group and then stop when we find a group that has lat/lon variables? Feels like it could be simplified a little bit, as it is there's a lot of if/else branching that is hard to follow.
Yeah, I'm trying to keep going into the file if it's not found in the root group - and then to stop when it finds the lat and lon variables. Currently in the process of restructuring to make simpler - I'm open to ideas/suggestions
tests/verify_collection.py
Outdated
@@ -239,7 +240,19 @@ def get_lat_lon_var_names(dataset: xarray.Dataset, collection_variable_list: Lis | |||
|
|||
# If that doesn't work, try using cf-xarray to infer lat/lon variable names | |||
try: | |||
return dataset.cf.coordinates['latitude'][0], dataset.cf.coordinates['longitude'][0] | |||
for lat in dataset.cf.coordinates['latitude']: | |||
if lat in ['lat', 'latitude']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're only going to consider a specific list of strings as valid lat/lon coordinate names, let's extract that list into a constant so it's obvious and easier to add to later. Something like this up at the top of the file:
VALID_LATITUDE_VARIABLE_NAMES = ['lat', 'latitude']
VALID_LONGITUDE_VARIABLE_NAMES = ['lon', 'longitude']
Alternatively we could keep a list of invalid names/variable names we want to ignore and invert the logic
IGNORED_LATITUDE_VARIABLE_NAMES = ['fol_lat']
IGNORED_LONGITUDE_VARIABLE_NAMES = ['fol_lon']
...
if lat not in IGNORED_LATITUDE_VARIABLE_NAMES:
...
if lon not in IGNORED_LONGITUDE_VARIABLE_NAMES:
...
tests/verify_collection.py
Outdated
@@ -267,8 +280,8 @@ def get_lat_lon_var_names(dataset: xarray.Dataset, collection_variable_list: Lis | |||
logging.warning("Unable to find lat/lon vars using 'units' attribute") | |||
|
|||
# Out of options, fail the test because we couldn't determine lat/lon variables | |||
pytest.fail(f"Unable to find latitude and longitude variables.") | |||
|
|||
logging.warning(f"Unable to find latitude and longitude variables in this group.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the group name as an optional parameter to the function so we can log the name of the group here to help with debugging?
Hi @nlenssen2013 Can you tell me the status of this? Are you still wanting to merge this? Seems it has some conflicts to fix before merging. Did you address all of Frank's concerns? |
I'm planning on closing this PR - the fix was a little different that this issue. Check #2168 and I'll close this one if you can. |
Allow the code to check the root group and keep going into the layers of the file to find the lat/lon variables. Do not fail the test right away if the lat/lon variables are not found. Only fail until all variables are checked.
Verify that the cf.xarray get lat/lon functionality actually gets a variable that is lat or latitude. SNDR has fol_lon and lon variables. The original functionality was incorrectly getting the fol_lon variable.