smgdatatools stands for data tools from SantanderMetGroup.
Generate virtual datasets from climate data files. Supported formats are Kerchunk references, HDF5 virtual datasets (VDS) and NcML.
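The first example aggregates ERA5 variables stored as NetCDF files in the public era5-pds Amazon S3 bucket into a Kerchunk reference file. See a description of the dataset here.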
echo 'https://s3.amazonaws.com/era5-pds/2020/01/data/air_pressure_at_mean_sea_level.nc
https://s3.amazonaws.com/era5-pds/2020/02/data/air_pressure_at_mean_sea_level.nc
https://s3.amazonaws.com/era5-pds/2020/01/data/sea_surface_temperature.nc
https://s3.amazonaws.com/era5-pds/2020/02/data/sea_surface_temperature.nc' | \
etl.py --db test.sqlite --collector hdf5chunk --hdf5-driver ros3 --aggregations air_pressure_at_mean_sea_level sea_surface_temperature --etl jinja -t era5-s3.json.j2 --dest test.json
Before opening the result, remove the last comma from test.json: the generated file ends with a trailing comma, and JSON does not allow trailing commas.
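If you prefer not to edit test.json by hand, this step can be scripted. The following is a minimal sketch using only the standard library; drop_trailing_comma and fix_json_file are hypothetical helpers, and they assume the only defect in the file is a single comma followed by nothing but whitespace and closing brackets.

```python
import json
from pathlib import Path


def drop_trailing_comma(text: str) -> str:
    """Remove the last comma in text if only whitespace and closing brackets follow it."""
    idx = text.rfind(",")
    if idx == -1:
        return text
    tail = text[idx + 1:]
    # Leave the text alone when real content follows the last comma.
    if tail.strip(" \t\r\n}]"):
        return text
    return text[:idx] + tail


def fix_json_file(path: str) -> None:
    """Rewrite path in place without the trailing comma, checking that it parses."""
    p = Path(path)
    fixed = drop_trailing_comma(p.read_text())
    json.loads(fixed)  # raises json.JSONDecodeError if the file is still invalid
    p.write_text(fixed)
```

Call fix_json_file("test.json") after etl.py has finished. Validating with json.loads before writing means a file with some other problem is left untouched.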
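import xarray

# Open the Kerchunk references in test.json as a Zarr store; the data chunks
# themselves are read lazily from the public era5-pds S3 bucket.
ds = xarray.open_dataset("reference://", engine="zarr", backend_kwargs={
    "consolidated": False,
    "storage_options": {"fo": "test.json", "remote_protocol": "s3", "remote_options": {"anon": True}}
})
print(ds)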
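The second example aggregates the CMIP6 tas and pr Zarr stores of two ensemble members from the public cmip6 Google Cloud Storage bucket. See a description of the dataset here.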
echo 'gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/tas/gn/v20200226
gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r1i1p1f1/Amon/tas/gn/v20191120
gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r2i1p1f1/Amon/pr/gn/v20200226
gs://cmip6/CMIP6/CMIP/NCAR/CESM2-FV2/historical/r1i1p1f1/Amon/pr/gn/v20191120' | \
etl.py --db test.sqlite --collector zarr --aggregations tas pr --etl jinja -t gcs-cmip6.json.j2 --dest test.json
As before, remove the last (trailing) comma from test.json before opening it.
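import xarray

# Open the Kerchunk references in test.json as a Zarr store; the data chunks
# are read lazily from the public cmip6 GCS bucket (gcsfs takes token="anon"
# for anonymous access).
ds = xarray.open_dataset("reference://", engine="zarr", backend_kwargs={
    "consolidated": False,
    "storage_options": {"fo": "test.json", "remote_protocol": "gs", "remote_options": {"token": "anon"}}
})
print(ds)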
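Once the trailing comma has been removed, the reference file can also be checked directly, for example by counting chunk references per variable. This sketch assumes test.json follows the Kerchunk reference specification (version 1 nests the keys under "refs", version 0 is a flat mapping) with chunk keys of the form variable/chunk-index such as tas/0.0.0; count_chunk_refs is a hypothetical helper.

```python
import json
from collections import Counter


def count_chunk_refs(path: str) -> Counter:
    """Count chunk references per variable in a Kerchunk JSON reference file."""
    with open(path) as fh:
        refs = json.load(fh)
    # Version 1 files nest the references under "refs"; version 0 files are flat.
    refs = refs.get("refs", refs)
    counts = Counter()
    for key in refs:
        var, _, leaf = key.rpartition("/")
        # Skip top-level keys and Zarr metadata keys such as ".zarray" and ".zattrs".
        if var and not leaf.startswith("."):
            counts[var] += 1
    return counts
```

For example, print(count_chunk_refs("test.json")) shows how many chunks each aggregated variable references, which helps spot an ensemble member that contributed fewer chunks than expected.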
Be careful with the following:
- The number of chunks may differ between ensemble members of the same variable. Check this against the SQL database, e.g.:
  select count(*) from variable inner join chunk on variable.id = chunk.variable_id where variable.name = 'VARIABLE_NAME' group by variable.id;
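The check above can be scripted with Python's built-in sqlite3 module. This is a sketch: it assumes the variable and chunk tables contain at least the columns used in the quoted query (variable.id, variable.name, chunk.variable_id) and that each ensemble member of a variable is a separate row in variable, as the grouping by variable.id suggests. chunk_counts and counts_match are hypothetical helper names.

```python
import sqlite3

# The query from the note above, with the variable name passed as a parameter.
CHUNKS_PER_VARIABLE = """
select variable.id, count(*)
from variable inner join chunk on variable.id = chunk.variable_id
where variable.name = ?
group by variable.id
"""


def chunk_counts(conn: sqlite3.Connection, name: str) -> dict[int, int]:
    """Return {variable.id: number of chunks} for every variable row named name."""
    return dict(conn.execute(CHUNKS_PER_VARIABLE, (name,)).fetchall())


def counts_match(conn: sqlite3.Connection, name: str) -> bool:
    """True when all rows of variable name have the same number of chunks."""
    return len(set(chunk_counts(conn, name).values())) <= 1
```

Usage: open the database with conn = sqlite3.connect("test.sqlite") and call counts_match(conn, "tas"); when it returns False, chunk_counts(conn, "tas") shows which variable.id differs.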
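To build an HDF5 virtual dataset (VDS) from the local NetCDF files in test/data, skipping files whose paths contain fx, run:
find test/data -maxdepth 1 -type f -name '*.nc' | grep -v 'fx' | etl.py --db test.sqlite --collector nc --aggregations tas pr --etl new-common --dest test.h5 --coord-name variant_label --coord-values-attr variant_label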
Open the virtual dataset with xarray:
import xarray

# test.h5 opens like any other HDF5/netCDF file; reads are served from the source files.
ds = xarray.open_dataset("test.h5")
print(ds[["tas", "pr"]].mean())
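To see which source files the virtual dataset points at, h5py can list the mapping behind each virtual dataset. A minimal sketch, assuming h5py 2.9 or newer (which provides Dataset.is_virtual and Dataset.virtual_sources()); describe_vds is a hypothetical helper, and the layout of datasets inside test.h5 is not assumed beyond them being reachable by visititems.

```python
import h5py


def describe_vds(path: str) -> dict[str, list[tuple[str, str]]]:
    """Map each virtual dataset in path to its (source file, source dataset) pairs."""
    result = {}

    def visit(name, obj):
        # Only datasets created as HDF5 virtual datasets carry a source mapping.
        if isinstance(obj, h5py.Dataset) and obj.is_virtual:
            result[name] = [(m.file_name, m.dset_name) for m in obj.virtual_sources()]

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return result
```

For example, print(describe_vds("test.h5")) lists, for tas and pr, the NetCDF files found by the find command above. A missing or moved source file shows up here as a path that no longer exists.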
To generate an NcML aggregation of the same files instead, use the time-ensemble.ncml.j2 template:
find test/data -maxdepth 1 -type f -name '*.nc' | grep -v 'fx' | etl.py --db test.sqlite --collector nc --aggregations tas pr --etl jinja -t time-ensemble.ncml.j2 --dest test.ncml
Inspect the generated NcML (XML) file with your favourite editor, or open it with ToolsUI or climate4R.
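For a quick programmatic look at the same file, the standard library XML parser is enough. This is a minimal sketch assuming the output uses the standard NcML 2.2 namespace; the exact aggregation layout produced by time-ensemble.ncml.j2 is not assumed, and summarize_ncml is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Standard NcML 2.2 namespace (assumed to be the one the template emits).
NCML = "{http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2}"


def summarize_ncml(source):
    """Return the aggregations (type, dimName) and dataset locations in an NcML file.

    source is a filename or a file object, as accepted by ElementTree.parse.
    """
    root = ET.parse(source).getroot()
    aggregations = [
        (agg.get("type"), agg.get("dimName"))
        for agg in root.iter(NCML + "aggregation")
    ]
    # Nested <netcdf location="..."/> elements point at the aggregated files.
    locations = [
        nc.get("location")
        for nc in root.iter(NCML + "netcdf")
        if nc.get("location") is not None
    ]
    return aggregations, locations
```

Usage: aggs, files = summarize_ncml("test.ncml") returns the aggregation types in document order, so nested time and ensemble aggregations can be checked without reading the whole XML by hand.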