Is your feature request related to a problem? Please describe.
Currently we either use `curl -OL some_url` to download example data in the notebooks, or we use data from the tests/test_data directory. Neither is ideal, in particular if we want to add more datasets and also make access to some large ones easy.
Describe the solution you'd like
I suggest adding a module src/poligrain/data.py (or example_data.py, or maybe datasets.py) containing functions that could be used like this: get_example_data('AMS PWS small') or get_example_data('AMS PWS full').
As a first step, we would add very small example datasets (max. a few hundred KB) to the repo, to be accessed through this function. Later we could add a separate repo for small example data.
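For the small bundled datasets this could be as simple as a name-to-file registry plus a loader. Below is a minimal sketch, assuming NetCDF files shipped inside the package and an xarray return type; the registry keys and file names are placeholders, not an agreed design:

```python
from importlib.resources import files

import xarray as xr

# Hypothetical mapping from dataset name to a file shipped with the package.
_SMALL_DATASETS = {
    "AMS PWS small": "example_data/ams_pws_small.nc",  # placeholder file name
}


def get_example_data(name):
    """Load a small example dataset that ships with the package."""
    if name not in _SMALL_DATASETS:
        raise ValueError(
            f"Unknown dataset {name!r}, choose from {sorted(_SMALL_DATASETS)}"
        )
    # str() assumes a regular (non-zipped) install where this resolves to a path
    path = str(files("poligrain") / _SMALL_DATASETS[name])
    return xr.open_dataset(path)
```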
In addition, we could download full datasets directly from Zenodo. Some data transformation would be required, though. There was preliminary work on this, e.g. in the ragali_prototpy (which is, if I recall correctly, more or less a 100% copy-paste of the same code in the OPENSENSE software sandbox). Hence, I would leave access to full datasets for later.
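For the full datasets, a rough stdlib-only sketch of a caching download could look like the following; the URL, file name, and cache directory are purely illustrative, and checksum verification plus the required data transformation are left out on purpose:

```python
import urllib.request
from pathlib import Path


def download_full_dataset(url, fname, cache_dir="~/.cache/poligrain"):
    """Download a full example dataset once and return the local path."""
    target_dir = Path(cache_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / fname
    if not target.exists():
        urllib.request.urlretrieve(url, target)
    return target
```

A library like pooch could later replace this with hash-checked downloads, but that is a design decision for when we actually tackle the full datasets.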
Describe alternatives you've considered
Just continue adding data to the repo... ;-). But since we are building a package that should serve as the basis for other packages, it should not be bloated with larger binary data that gets packaged into the PyPI installs. And if the data is not packaged into the PyPI install, it does not help the user much, since our examples would not be easily executable.
Once the proposed implementation is done, the existing example data in the repo has to be consolidated. There is data in the tests dir, and I am about to add new example data in the notebooks dir in #41 because I do not want to hardcode the paths for getting data from the tests dir.