add support for Preston provenance tracking / archives #52
Comments
@seltmann suggested taking https://zenodo.org/record/7194486 [1] and translating it into a "clonable" publication, so that you don't have to click a hundred times to download the different versions.

References

[1] Poelen, Jorrit H., Seltmann, Katja C., Campbell, Mariel, Orlofske, Sarah A., Light, Jessica E., Tucker, Erika M., Demboski, John R, McElrath, Tommy, Grinter, Christopher C, Diaz-Bastin, Rachel, Bush, Sarah E, Delapena, Robin, Cook, Joseph, Gall, Lawrence F., Whiting, Michael F, Clark, Shawn M, Cameron, Stephen L, Replogle, Charla R, Rund, Samuel S.C., … Bailey, Colin. (2022). Terrestrial Parasite Tracker indexed biotic interactions and review summary (0.7) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7194486
With a couple of additions, a first bridge from Elton to Preston is built.

Example 1 - Calculate size of dataset resources

yielding:
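As a rough illustration of the quantity in Example 1, and not of how Elton computes it, the total size of a set of resources that are already available as local files can be summed with standard tools. The function name `total_size` is hypothetical.

```shell
# Hypothetical helper (not part of Elton): print the combined byte size
# of all files passed as arguments.
total_size() {
  local total=0 size f
  for f in "$@"; do
    # reading from stdin makes wc print only the count, without the file name
    size=$(wc -c < "$f")
    total=$((total + size))
  done
  echo "$total"
}
```

For example, `total_size interactions.tsv globi.json` prints the combined size in bytes (file names hypothetical).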
Example 2 - Retrieve resource assets via Software Heritage

Many GloBI index configurations are kept as GitHub repositories. These repositories can be (and often are) archived by Software Heritage (https://softwareheritage.org). Preston supports Software Heritage as a content remote, so you can retrieve content logged by elton via Software Heritage, if available.

```shell
# clone a species interaction dataset
elton pull globalbioticinteractions/template-dataset

# log the resources associated with the species interaction dataset,
# select the globi.json resource,
# and retrieve it via Software Heritage
elton log globalbioticinteractions/template-dataset \
| grep globi.json \
| preston cat --remote https://softwareheritage.org
```

yields:
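The pipeline above uses `grep` to keep only the log lines that mention `globi.json`. Assuming those lines carry content ids in the `hash://sha256/<64 hex digits>` form (the same form as the hash in Example 3), a small filter can list the distinct ids instead of fetching the content. The function name is hypothetical and the exact log format is an assumption.

```shell
# Hypothetical filter: print each distinct hash://sha256/... content id
# found in the text read from stdin, one per line.
extract_hash_uris() {
  # -o prints only the matched part, -E enables the {64} interval
  grep -oE 'hash://sha256/[0-9a-f]{64}' | sort -u
}
```

If the log lines do contain such ids, the filter could take the place of `preston cat` at the end of the pipeline to list them rather than retrieve them.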
Example 3 - Package Elton dataset as a Preston archive

yields the attached archive, with `preston head` reporting:

hash://sha256/3cb3ac31bbb057d97090cfd067b531c343b7de77ba7a7226281072addf308a18
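Preston identifies content by its SHA-256 digest, written as a `hash://sha256/...` URI like the one above. A minimal sketch that computes such an id for a local file with standard tools (the function name `hash_uri` is hypothetical; this is not Preston's own code):

```shell
# Hypothetical helper: print a hash://sha256/<hex> content id for file $1.
hash_uri() {
  local digest
  if command -v sha256sum >/dev/null 2>&1; then
    # GNU coreutils; reading stdin keeps the file name out of the output
    digest=$(sha256sum < "$1" | cut -d ' ' -f 1)
  else
    # macOS and BSD ship shasum instead of sha256sum
    digest=$(shasum -a 256 < "$1" | cut -d ' ' -f 1)
  fi
  echo "hash://sha256/${digest}"
}
```

Because the id depends only on the bytes, two copies of the same file always get the same id, wherever they are stored.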
FYI @mielliott @zedomel - I just connected elton and preston with a creaky integration bridge via
…aths in prov logs; related to globalbioticinteractions/globalbioticinteractions#1030 #52
With v0.14.2, you can stream versions of the GIB corpus (GBIF, iDigBio, BioCASe) into elton without having to keep the entire thing (unless you want to). Also see https://github.com/globalbioticinteractions/elton/releases/tag/0.14.2 .

Example 1. Extract all interaction claims found in the GIB corpus (GBIF, iDigBio, BioCASe, see https://linker.bio#use-case-3-studying-pine-pests-caused-by-weevils-curculionoidea ) as seen on 2024-04-01 and described by https://linker.bio/hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2

Example 2. Review interaction claims found in the GIB corpus (GBIF, iDigBio, BioCASe, see https://linker.bio#use-case-3-studying-pine-pests-caused-by-weevils-curculionoidea ) as seen on 2024-04-01 and described by https://linker.bio/hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2

Note that both Example 1 and Example 2 stream content provided by https://linker.bio/ . If you'd like to keep the content (many GiB), remove the --no-cache option: after an initial "sync/pull" from https://linker.bio/ , you'll have a copy of a large corpus of biodiversity data available for reproducible offline processing.
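The corpus description above is addressed as `https://linker.bio/` followed by a `hash://sha256/...` id. Since that id is the SHA-256 digest of the content, a copy obtained from any source can be checked against it. A minimal sketch with hypothetical function names, assuming the content has already been saved to a local file:

```shell
# Hypothetical helper: build a linker.bio URL for a content id,
# following the URL pattern used above.
linker_url() {
  echo "https://linker.bio/$1"
}

# Hypothetical helper: succeed (exit status 0) only if the SHA-256 digest
# of file $2 matches the hash://sha256/... id given as $1.
verify_content() {
  local expected=${1#hash://sha256/} file=$2 actual
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum < "$file" | cut -d ' ' -f 1)
  else
    # macOS and BSD ship shasum instead of sha256sum
    actual=$(shasum -a 256 < "$file" | cut -d ' ' -f 1)
  fi
  [ "$actual" = "$expected" ]
}
```

For example, assuming the linker.bio URL resolves to the content bytes: `curl -sL "$(linker_url hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2)" -o description && verify_content hash://sha256/37bdd8ddb12df4ee02978ca59b695afd651f94398c0fe2e1f8b182849a876bb2 description && echo ok`.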
Currently, Elton uses built-in provenance tracking for biodiversity datasets.
This issue suggests adding support for Preston archives / provenance tracking.