Add functionality to download small and large example datasets #56

cchwala · 2024-06-12T08:31:00Z

Is your feature request related to a problem? Please describe.

Currently we either use curl -OL some_url to download example data in the notebooks or we use data from the tests/test_data directory. Neither is ideal, especially if we want to add more datasets and make access to some large ones easy.

Describe the solution you'd like

I suggest adding a module src/poligrain/data.py (or example_data.py, or maybe datasets.py) containing functions that could be used like this: get_example_data('AMS PWS small') or get_example_data('AMS PWS full').

As a first step, we would add very small example datasets (a few hundred KB at most) to the repo, and the function would access them from there. Later we could add a separate repo for small example data.
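To make the first step more concrete, here is a minimal sketch of how a name-based lookup for bundled files could work. Everything in it is an assumption for illustration: the registry EXAMPLE_DATASETS, the file name ams_pws_small.nc, the helper name get_example_data_path and the data_dir argument are hypothetical and not part of poligrain. The sketch only resolves a path; actually opening the data (for example with xarray) is left out.

```python
from pathlib import Path

# Hypothetical registry: dataset name -> file name inside the data directory.
# The file name is made up for this sketch.
EXAMPLE_DATASETS = {
    "AMS PWS small": "ams_pws_small.nc",
}


def get_example_data_path(name, data_dir):
    """Return the path of a small example dataset stored in data_dir.

    Raises KeyError for unknown dataset names and FileNotFoundError if the
    registered file is not present in data_dir.
    """
    try:
        filename = EXAMPLE_DATASETS[name]
    except KeyError:
        available = ", ".join(sorted(EXAMPLE_DATASETS))
        raise KeyError(
            f"Unknown example dataset {name!r}. Available: {available}"
        ) from None

    path = Path(data_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"Example dataset file is missing: {path}")
    return path
```

Keeping the registry in one dictionary would mean notebooks and tests refer to datasets by name only, so the files could later move to a separate repo without changing the calling code.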

In addition, we can download full datasets directly from Zenodo. Some data transformation would be required, though. There was preliminary work on this, e.g. in the ragali_prototpy here (which is more or less a 100% copy-paste of the same code in the OPENSENSE software sandbox, if I recall correctly). Hence, I would leave access to full datasets for later.
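For the later step of fetching full datasets, a download-once-and-cache helper is one possible building block. The sketch below uses only the standard library and is an assumption, not the code from the prototype mentioned above: the function name fetch_cached, its parameters and the optional checksum check are hypothetical, and no real Zenodo URL is given here because the issue does not name one. The data transformation step is not shown.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def fetch_cached(url, cache_dir, filename, sha256=None):
    """Download url to cache_dir/filename unless it is already there.

    If sha256 is given, the cached file is checked against it and a
    ValueError is raised on mismatch. Returns the local path.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename

    if not target.exists():
        # Write to a temporary name first so that an interrupted download
        # does not leave a truncated file under the final name.
        tmp = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)
        tmp.replace(target)

    if sha256 is not None:
        digest = hashlib.sha256(target.read_bytes()).hexdigest()
        if digest != sha256:
            raise ValueError(f"Checksum mismatch for {target}")
    return target
```

With a helper like this, a second call for the same filename would reuse the cached copy instead of downloading again, which matters most for the large datasets.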

Describe alternatives you've considered

Just continue adding data to the repo... ;-). But since we are building a package that should serve as the basis for other packages, it should not be bloated by larger binary data that is packaged into the PyPI installs. And if the data is not packaged into the PyPI install, it does not help the user much, since our examples are then not easily executable.


cchwala · 2024-06-13T07:23:15Z

When the proposed implementation is done, the existing example data in the repo has to be consolidated. There is data in the tests dir and I am about to add new example data in the notebooks dir in #41 because I do not want to hardcode the paths for getting data from the tests dir.

cchwala added the enhancement New feature or request label Jun 12, 2024

cchwala changed the title ~~Allow to download small and large example datasets~~ Add functionality to download small and large example datasets Jun 12, 2024

cchwala mentioned this issue Oct 18, 2024

Add function to download small example datasets #76

Merged

4 tasks
