Add functionality to download small and large example datasets #56

Open
cchwala opened this issue Jun 12, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@cchwala
Member

cchwala commented Jun 12, 2024

Is your feature request related to a problem? Please describe.

Currently we either use curl -OL some_url to download example data in the notebooks, or we use data from the tests/test_data directory. Neither is ideal, particularly if we want to add more datasets and also make access to some large ones easy.

Describe the solution you'd like

I suggest adding a module src/poligrain/data.py (or example_data.py, or maybe datasets.py) containing functions that could be used like this: get_example_data('AMS PWS small') or get_example_data('AMS PWS full').

As a first step, we would add very small example datasets (at most a few hundred KB) to the repo, which would be accessed by the function. Later we could add a separate repo for small example data.
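A minimal sketch of what such an accessor could look like for the small in-repo datasets. The registry dict, the file name and the data_dir parameter are assumptions for illustration only. The return type is not specified above either, so this version simply returns the file path.

```python
from pathlib import Path

# Hypothetical registry: dataset name -> file name inside the data directory.
# The file name is a placeholder, not a file that exists in the repo.
_SMALL_DATASETS = {
    "AMS PWS small": "ams_pws_small.nc",
}


def get_example_data(name, data_dir):
    """Return the path of a small example dataset stored in data_dir."""
    try:
        filename = _SMALL_DATASETS[name]
    except KeyError:
        known = ", ".join(sorted(_SMALL_DATASETS))
        raise ValueError(f"Unknown dataset {name!r}. Available: {known}") from None

    path = Path(data_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"Example data file missing: {path}")
    return path
```

Keeping the names in one registry would let notebooks and tests ask for data by name instead of hardcoding paths.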

In addition, we could download full datasets directly from Zenodo, though some data transformation would be required. There was preliminary work on this, e.g. in the ragali_prototpy here (which is more or less a 100% copy-paste of the same code in the OPENSENSE software sandbox, if I recall correctly). Hence, I would leave access to full datasets for later.
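For the later step, a rough sketch of the download part only, using nothing but the standard library. The function name fetch_cached and its parameters are hypothetical, no real dataset URL is assumed, and the data transformation mentioned above is not covered.

```python
from pathlib import Path
from urllib.request import urlretrieve


def fetch_cached(url, cache_dir, filename):
    """Download url into cache_dir once and return the local path."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename

    if not target.exists():
        # Download to a temporary name first, so an interrupted download
        # does not leave a truncated file that looks like a valid cache entry.
        tmp = target.with_name(target.name + ".part")
        urlretrieve(url, str(tmp))
        tmp.replace(target)
    return target
```

A second call with the same cache_dir and filename returns the cached file without touching the network, which keeps repeated notebook runs cheap.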

Describe alternatives you've considered

Just continue adding data to the repo... ;-). But since we are building a package that should serve as the basis for other packages, it should not be bloated by larger binary data packaged into the PyPI installs. And if the data is not packaged into the PyPI install, it does not help the user much, since our examples are then not easily executable.

@cchwala cchwala added the enhancement New feature or request label Jun 12, 2024
@cchwala cchwala changed the title Allow to download small and large example datasets Add functionality to download small and large example datasets Jun 12, 2024
@cchwala
Member Author

cchwala commented Jun 13, 2024

When the proposed implementation is done, the existing example data in the repo has to be consolidated. There is data in the tests dir and I am about to add new example data in the notebooks dir in #41 because I do not want to hardcode the paths for getting data from the tests dir.
