DataContainer Functions #10

Open
emilyhcliu opened this issue Jun 29, 2024 · 8 comments

@emilyhcliu
Collaborator

@rmclaren I need your suggestion for the following data conversion case:

For IASI data, there are three BUFR sources, each with its mapping file, but they all map to the same IODA variable fields.

container1  mtiasi bufr - dimensions: [Location1, Channel1] --- Channel1 = 616 (This is the main data set)
container2  esiasi bufr - dimensions: [Location2, Channel2] --- Channel2 = 500
container3  iasidb bufr - dimensions: [Location3, Channel3] --- Channel3 = 500

Three containers need to be combined into one container (container = container1 + container2 + container3)
The target container dimension for 2D variables is [Location1+Location2+Location3, 616]

So, the variables in container2 & container3, which have channels in the dimension, will need to be reorganized (from 500 to 616)
The variables that need to be modified are: variables/sensorChannelNumber and variables/spectralRadiance

For container2, I have modified these two variables so that their dimensions changed from [Location2, 500] to [Location2, 616]
For container3, I have modified these two variables so that their dimensions changed from [Location3, 500] to [Location3, 616]

I tried DataContainer.replace:

       container2.replace('variables/sensorChannelNumber', channum_new, [category])
       container2.replace('variables/radiance', radiance_new, [category])

I got the following error message:

Exception: 	Reason:	Python error: RuntimeError: Bad parameter: ERROR: Dimension mismatch.

At:
  bufr2ioda_iasi.py(78): create_obs_group

	source_column:	0
	source_filename:	/scratch1/NCEPDEV/da/Emily.Liu/JEDI-ioda/ioda-bundle/ioda/src/engines/ioda/src/ioda/Engines/Script/Script.cpp
	source_function:	ioda::ObsGroup ioda::Engines::Script::openFile(const ioda::Engines::Script::Script_Parameters &, ioda::Group)
	source_line:	278

The error message was expected since I was trying to add data with dimension [Location2, 616] to a data path with dimension [Location2, 500].
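For reference, the reshaping step itself (independent of how the result is stored) can be sketched with NumPy as below. This is only an illustration under assumptions: the helper name `expand_to_channel_grid` is hypothetical, the channel list is taken to be a 1D array shared by all locations, and NaN is used as a placeholder fill value rather than whatever missing value the real converter needs.

```python
import numpy as np


def expand_to_channel_grid(values, channels, target_channels, fill_value):
    """Place [nloc, nchan_src] values onto a [nloc, nchan_target] grid by channel number.

    Hypothetical helper for illustration; not part of the bufr library.
    """
    values = np.asarray(values)
    channels = np.asarray(channels)
    target_channels = np.asarray(target_channels)

    # Find the target column for every source channel number.
    sorter = np.argsort(target_channels)
    pos = np.searchsorted(target_channels, channels, sorter=sorter)
    pos = np.clip(pos, 0, target_channels.size - 1)
    cols = sorter[pos]

    # Source channels that do not exist in the target grid are dropped.
    found = target_channels[cols] == channels

    out = np.full((values.shape[0], target_channels.size), fill_value,
                  dtype=values.dtype)
    out[:, cols[found]] = values[:, found]
    return out


# Toy sizes standing in for 500 -> 616 channels.
target = np.array([1, 2, 3, 4, 5, 6])
src = np.array([2, 4, 5])
rad = np.array([[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]])

rad_new = expand_to_channel_grid(rad, src, target, fill_value=np.nan)
print(rad_new.shape)  # (2, 6)
```

Matching on channel number rather than on column position keeps each radiance in the column of its own channel, so the padded array can sit next to the 616-channel data.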

@rmclaren Do you have any suggestions?

@rmclaren
Collaborator

rmclaren commented Jul 1, 2024

@emilyhcliu This is a thornier issue than I thought originally. I'm trying to think of a good way to merge data like this in order to keep things consistent. We can talk about the details later.

@emilyhcliu
Collaborator Author

@rmclaren

One question about all_sub_categories()

yaml_path = './bufr2ioda_mtiasi_mapping.yaml'
input_path = './gdas.t00z.esiasi.tm00.bufr_d'
container = bufr.Parser(input_path, yaml_path).parse()
categories = container.all_sub_categories()
print(categories)

There are three categories for IASI: metop-a, metop-b, and metop-c

I expected the following output from print(categories)

[ 'metop-a', 'metop-b', 'metop-c']

But, I got the following:

[['metop-a'], ['metop-b'], ['metop-c']]

Why do we get lists inside of a list?

@rmclaren
Collaborator

rmclaren commented Jul 4, 2024

This is because you can categorize (split) on several parameters. For example, you could define two splits, satellite ID and hour, in which case you would get a list like
[['metop-a', '2'], ['metop-a', '3'], ['metop-a', '4'], ['metop-b', '2'], ['metop-b', '3'], ['metop-b', '4'], etc...]. So 'metop-a' and '2' are subcategories, and a set of subcategories such as ['metop-a', '2'] makes up a category.
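The shape of that result can be reproduced in plain Python. The sketch below only illustrates the structure: it assumes the categories are every combination of the subcategories from each split, and it says nothing about how the library builds or orders the list internally. The split values are made up.

```python
from itertools import product

# Hypothetical subcategory values for two splits.
satellites = ['metop-a', 'metop-b']
hours = ['2', '3', '4']

# Two splits: each category is one subcategory from each split.
categories = [list(combo) for combo in product(satellites, hours)]
print(categories[0])   # ['metop-a', '2']
print(len(categories))  # 6

# One split: each category is still a list, just with a single element.
single_split = [[sat] for sat in satellites]

# Flatten to plain names when only one split is defined.
flat = [category[0] for category in single_split]
print(flat)  # ['metop-a', 'metop-b']
```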

@emilyhcliu
Collaborator Author

@rmclaren
Can we add functionality so that users can request a sub-container from a container with categories?
For example, my mapping has categories (e.g. goes-16, goes-17, goes-18)
container.all_sub_categories() returns [['goes-16'], ['goes-17'], ['goes-18']]

Can we have some method to break down the container like the following:
container1 = container('goes-16')
container2 = container('goes-17')
container3 = container('goes-18')

@emilyhcliu
Collaborator Author

emilyhcliu commented Jul 7, 2024

Good news.
I tested creating multiple obs spaces from one bufr file for satwind using script backend and data cache.
It worked great!!

For the satwind case, the BUFR contains g16 and g17. But, my mapping file defines three categories (g16, g17, g18). So, the output g18 should be empty with headers only. This also worked!!

So, I am closing issue #6, which was about creating an empty data file with headers only for categories that are absent from the BUFR but defined in the mapping file. I now realize that creating an empty data file with headers only is a good thing under the circumstances described in that issue.

@rmclaren
Collaborator

rmclaren commented Jul 9, 2024

@emilyhcliu Working on extending the DataContainer with a function to get the category data as you suggested.

@rmclaren
Collaborator

rmclaren commented Jul 9, 2024

@emilyhcliu Added functions to feature/data_container_split branch. Here is a sample:

import numpy as np
import bufr


def test_highlevel_subcontainer():
    DATA_PATH = 'testinput/data/gdas.t12z.1bamua.tm00.bufr_d'
    YAML_PATH = 'testinput/bufrtest_amua_ta_mapping.yaml'

    container = bufr.Parser(DATA_PATH, YAML_PATH).parse()

    # Pull out everything that belongs to the ('metop-a',) category.
    subcontainer = container.get_sub_container(('metop-a',))

    # The sub-container should hold the same data as a category-filtered get().
    assert np.allclose(subcontainer.get('variables/antennaTemperature'),
                       container.get('variables/antennaTemperature', ('metop-a',)))
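
Building on that sample, splitting a container into one sub-container per category could look roughly like the sketch below. It relies only on the two calls shown in this thread (`all_sub_categories()` and `get_sub_container()`); converting each category to a tuple before passing it in is an assumption based on the sample above. `split_by_category` is a hypothetical helper, and `FakeContainer` is a stand-in (not part of the bufr library) so the sketch runs without BUFR input files.

```python
def split_by_category(container):
    """Return {category tuple: sub-container} for every category in the container."""
    return {tuple(category): container.get_sub_container(tuple(category))
            for category in container.all_sub_categories()}


class FakeContainer:
    """Hypothetical stand-in that mimics only the two calls used above."""

    def __init__(self, data):
        self._data = data  # {category tuple: payload}

    def all_sub_categories(self):
        # Same list-of-lists shape reported earlier in this thread.
        return [list(category) for category in self._data]

    def get_sub_container(self, category):
        return FakeContainer({category: self._data[category]})


parts = split_by_category(
    FakeContainer({('goes-16',): [1, 2], ('goes-17',): [3]}))
print(sorted(parts))  # [('goes-16',), ('goes-17',)]
```

With a real container, `parts[('goes-16',)]` would then play the role of `container1` in the request above.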

@rmclaren
Collaborator

The way to combine the different IASI data would be to create a new DataContainer and then add the data from the 3 sources manually. Basically, resize the smaller arrays (the ones with fewer channels) to the larger size, or trim the larger one down to the size of the smaller arrays. Make sure the channels line up when you do this, though: the wavelength of each channel has to match across the data sources.
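
The "trim" option can be sketched with NumPy as follows. This is a rough illustration, not tested against real IASI data: `stack_on_common_channels` is a hypothetical helper, each source is assumed to carry a 1D list of unique channel numbers, and equal channel numbers are assumed to mean equal wavelengths, which is exactly the thing that needs checking first.

```python
import numpy as np


def stack_on_common_channels(sources):
    """Trim every source to the channels they all share, then stack by location.

    sources: list of (channels[nchan], values[nloc, nchan]) pairs.
    Hypothetical helper for illustration; not part of the bufr library.
    """
    # Channels present in every source (sorted, unique).
    common = np.unique(sources[0][0])
    for channels, _ in sources[1:]:
        common = np.intersect1d(common, channels)

    trimmed = []
    for channels, values in sources:
        channels = np.asarray(channels)
        # Column of each common channel inside this source.
        order = np.argsort(channels)
        cols = order[np.searchsorted(channels, common, sorter=order)]
        trimmed.append(np.asarray(values)[:, cols])

    # Locations from all sources end up along axis 0.
    return common, np.concatenate(trimmed, axis=0)


# Toy data: three sources with different channel sets and orderings.
src_a = (np.array([1, 2, 3, 4]), np.array([[1, 2, 3, 4]]))
src_b = (np.array([2, 4]), np.array([[20, 40], [21, 41]]))
src_c = (np.array([4, 2, 5]), np.array([[400, 200, 500]]))

common, stacked = stack_on_common_channels([src_a, src_b, src_c])
print(common)         # [2 4]
print(stacked.shape)  # (4, 2)
```

The stacked array would then be added to the new DataContainer; how that is done depends on the DataContainer API and is not shown here.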
